KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Eric Engestrom	2c67457e5e	util/list: rename LIST_ENTRY() to list_entry() This follows the Linux kernel convention, and avoids a collision with a macOS header macro. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6751 Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6840 Cc: mesa-stable Signed-off-by: Eric Engestrom <eric@igalia.com> Acked-by: David Heidelberg <david.heidelberg@collabora.com> Reviewed-by: Yonggang Luo <luoyonggang@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17772>	2022-07-28 10:10:44 +00:00
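The lower-case name mirrors the kernel's container_of idiom: given a pointer to a list link embedded in a struct, recover the enclosing struct. The sketch below is a minimal, self-contained illustration; `struct my_link` and `struct widget` are hypothetical stand-ins, not the types in Mesa's util/list.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical link type standing in for Mesa's list node. */
struct my_link {
   struct my_link *prev, *next;
};

/* Kernel-style container_of: step back from a pointer to the embedded
 * member to the start of the enclosing struct. */
#define list_entry(item, type, member) \
   ((type *)((char *)(item) - offsetof(type, member)))

/* Hypothetical payload with an embedded link. */
struct widget {
   int id;
   struct my_link link;
};
```

For example, `list_entry(&w.link, struct widget, link)` yields `&w`. Because the macro is plain pointer arithmetic it is cheap and type-generic, at the cost of not checking that `item` really points at that member.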
Icecream95	a8dbf61b46	panfrost: Add a debug option for checking overflows on pool uploads PAN_MESA_DEBUG=overflow will place objects as close as possible to a protected region at the end of the buffer, so that overflows segfault. Caught the bugs in all four of the preceding commits. v2: memset the BO to 0xbb to catch code expecting zeroed allocations. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17447>	2022-07-23 00:56:10 +00:00
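The placement idea, putting each upload as close as possible to the protected region so that even a small overrun faults, can be sketched as picking the highest suitably aligned offset that still fits. A minimal sketch, assuming a power-of-two alignment; `overflow_offset` and `poison` are hypothetical helpers, and the protected region itself (for example an inaccessible page after the buffer) is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the largest offset, aligned to `align` (a power of
 * two), at which `size` bytes still fit in a buffer of `buf_size` bytes.
 * The object then ends as close as alignment allows to the protected
 * region following the buffer, so writing past its end tends to fault. */
static size_t overflow_offset(size_t buf_size, size_t size, size_t align)
{
   assert(align != 0 && (align & (align - 1)) == 0);
   assert(size <= buf_size);
   return (buf_size - size) & ~(align - 1);
}

/* Fill a fresh allocation with a recognisable non-zero byte (0xbb, as in
 * the commit) so code that wrongly expects zeroed memory misbehaves. */
static void poison(void *ptr, size_t size)
{
   memset(ptr, 0xbb, size);
}
```

Alignment padding can still leave up to `align - 1` bytes of slack before the protected region, so overruns smaller than that may go undetected.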
Icecream95	379ae6d823	panfrost: Emit the correct number of attributes create_vertex_elements_state is sometimes called with a num_elements argument that is too large (for example by util_blitter), which causes a buffer overflow. Nothing documents this practice as forbidden, so don't rely on so->num_elements being correct; instead, use the vertex shader attribute count, which matches the value used to allocate the descriptors. Use attributes_read_count rather than attribute_count because the latter also includes images and PAN_VERTEX_ID/PAN_INSTANCE_ID. Fixes: `76de3e691c` ("panfrost: Merge attribute packing routines") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17447>	2022-07-23 00:56:10 +00:00
Jason Ekstrand	b510ee0d22	Use vk_foreach_struct_const where needed We're about to make it so that the compiler warns/errors if you use the wrong iterator macro. Fix up a bunch of places where someone used the wrong one before we break anything. Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17630>	2022-07-19 19:55:17 +00:00
Alyssa Rosenzweig	3a0a8688d3	panfrost: Use early-ZS helpers Remove the previous compile-time early-ZS implementation and replace it with the decoupled early-ZS implementation. This uses more efficient settings in some cases (depth/stencil tests always pass or do not write), and fixes the settings used in another case (alpha-to-coverage enabled with an otherwise early-ZS shader.) Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Closes: #6206 Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17428>	2022-07-13 21:05:35 +00:00
Alyssa Rosenzweig	fe875c0144	panfrost: Unit test early-ZS helpers The new early-ZS helpers are pure functions, leaf nodes of the call graph, and implemented with a different algorithm from the "oracle" table of correct values for various combinations of states. Further, incorrect settings often still pass CTS while causing game bugs or inefficiencies. That combination makes the helpers an excellent candidate for unit tests. Add some. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17428>	2022-07-13 21:05:35 +00:00
Alyssa Rosenzweig	e96292bc07	panfrost: Add decoupled early-ZS helpers Bifrost (and Valhall) separate early-ZS configuration into two fields: when does the depth/stencil buffer update happen? and when are pixels killed by the depth/stencil tests? The driver separately configures these to occur early (before the shader executes) or late (after the ATEST instruction executes at the end of the shader). Early tests are generally more efficient, but various combinations of API state and fragment shader properties can require late updates and/or late kills for correctness. Determining how to configure these fields is nontrivial. Our current implementation (on Bifrost) configures these fields at fragment shader compile time and bakes the settings into the RSD. This is both wrong (using early testing when late testing is required) and suboptimal (using late testing when early testing would suffice). We need to defer this configuration until draw time, when we know rasterizer and Z/S state. Reclassifying at draw time (as we currently do on Valhall) would be expensive, especially with the extra terms added in here. To cope, decouple the shader classification from the draw-time configuration. Since there are only a few bits of draw state involved, this implementation just calculates all possible states. Then the draw time classification is just indexing into a lookup table. The actual algorithm used to classify is written with correctness and clarity in mind. Unlike the current classification algorithm (which tries to match what the DDK does, poorly), this algorithm embeds its proofs of correctness. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17428>	2022-07-13 21:05:35 +00:00
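The decoupling can be sketched as: classify once per shader for every combination of the few relevant draw-state bits, then do a single table lookup at draw time. Everything below is hypothetical and simplified; the state bits, shader properties, and the rules in `classify` are placeholders chosen for illustration, not the rules Mesa's early-ZS helpers implement.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical draw-state bits that affect early-ZS configuration. */
enum {
   STATE_ZS_WRITES         = 1 << 0,
   STATE_ALPHA_TO_COVERAGE = 1 << 1,
   STATE_COUNT             = 1 << 2, /* number of combinations */
};

enum zs_stage { ZS_EARLY, ZS_LATE };

struct zs_config {
   enum zs_stage update; /* when the depth/stencil buffer is written */
   enum zs_stage kill;   /* when pixels failing the tests are killed */
};

/* Hypothetical shader properties known at compile time. */
struct shader_props {
   bool writes_depth;
   bool can_discard;
};

/* Placeholder classifier. It may be slow and explicit: it runs once per
 * state combination when the shader is compiled, never per draw. */
static struct zs_config classify(const struct shader_props *s, unsigned state)
{
   bool zs_writes = state & STATE_ZS_WRITES;
   bool may_discard = s->can_discard || (state & STATE_ALPHA_TO_COVERAGE);
   struct zs_config c = { ZS_EARLY, ZS_EARLY };

   if (s->writes_depth) {
      /* The tested depth is only known after the shader runs. */
      c.update = ZS_LATE;
      c.kill = ZS_LATE;
   } else if (may_discard && zs_writes) {
      /* Don't write Z/S for pixels the shader may still discard. Killing
       * failing pixels early stays safe: discard only removes more. */
      c.update = ZS_LATE;
   }
   return c;
}

/* Compile time: precompute every combination of draw state. */
static void build_table(const struct shader_props *s,
                        struct zs_config table[STATE_COUNT])
{
   for (unsigned state = 0; state < STATE_COUNT; ++state)
      table[state] = classify(s, state);
}

/* Draw time: a single table lookup. */
static struct zs_config lookup(const struct zs_config table[STATE_COUNT],
                               unsigned state)
{
   assert(state < STATE_COUNT);
   return table[state];
}
```

With two state bits the table has four entries; each extra bit doubles it, which stays cheap only while a few bits of draw state are involved.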
Alyssa Rosenzweig	29c33f75d3	pan/va: Stall after ATEST In theory this wait is required for correct behaviour of discarded threads with ATEST. Mesa usually waits before the instruction after ATEST, so this wait will get optimized out by va_merge_flow, but as our scheduler gets more sophisticated this could become an issue. Let's stay on the safe side and insert the recommended wait. No shader-db changes. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17428>	2022-07-13 21:05:35 +00:00
Alyssa Rosenzweig	db2bdc1dc3	pan/bi: Require ATEST coverage mask input in R60 In theory, ATEST can take any combination of registers for inputs. Experimentally, however, ATEST requires the coverage mask in R60. This avoids regressing the following dEQP tests, which write their coverage mask with pixel-frequency-shading but without writing to the depth/stencil buffer. dEQP-GLES31.functional.shaders.sample_variables.sample_mask.discard_half_per_pixel.* This issue is known to affect both Mali-G52 (v7) and Mali-G57 (v9). I am unsure if this is a silicon bug or just an obscure implementation detail. No shader-db changes. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17428>	2022-07-13 21:05:35 +00:00
Jason Ekstrand	1b3777ee0f	panfrost: Simplify sample_shading Now that glsl_to_nir is setting sample_shading_enable whenever FB fetch is used, we don't need to duplicate it here. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14020>	2022-07-13 20:28:42 +00:00
Alyssa Rosenzweig	cc980ee0ed	panfrost: Protect pandecode by a mutex Pandecode is not thread-safe (for a large number of reasons) and does not even try to be. This is a problem when tracing (or just using PAN_MESA_DEBUG=sync) multithreaded applications. The most common symptom of the problem is assertion failures deep in the red-black tree implementation, which is not thread-safe. Just protect the whole thing by an "in pandecode?" mutex, since this is not performance-sensitive code and we don't really care about the extra serialization incurred. As pandecode does not recurse into itself, we may simply lock at the beginning and unlock at the end of each entrypoint in pandecode, which is thread-safe regardless of how pandecode is used. A few entrypoints are refactored to avoid early returns to keep the lock/unlock calls in obvious visual pairs. Fixes flakes when running the CL CTS with PAN_MESA_DEBUG=sync like we would in CI (e.g. events.event_flush) Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Tested-by: Icecream95 <ixn@disroot.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17409>	2022-07-13 19:15:13 +00:00
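The locking discipline described above, one global lock taken at the top of every entrypoint and released at the bottom with no early returns in between, can be sketched with POSIX threads. `decode_jobs` and `decoded_job_count` are hypothetical entrypoints; the counter stands in for pandecode's non-thread-safe internal state.

```c
#include <assert.h>
#include <pthread.h>

/* One global lock for the whole (non-thread-safe) decoder. */
static pthread_mutex_t decode_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for shared decoder state, such as a red-black tree. */
static unsigned decoded_jobs;

/* Hypothetical entrypoint: lock first, unlock last, and no early return
 * in between, so the lock/unlock calls stay in an obvious pair. */
static void decode_jobs(unsigned count)
{
   pthread_mutex_lock(&decode_lock);

   if (count > 0) {
      /* ... walk and print job descriptors ... */
      decoded_jobs += count;
   }

   pthread_mutex_unlock(&decode_lock);
}

/* Every entrypoint that touches shared state takes the same lock. */
static unsigned decoded_job_count(void)
{
   pthread_mutex_lock(&decode_lock);
   unsigned n = decoded_jobs;
   pthread_mutex_unlock(&decode_lock);
   return n;
}
```

A plain (non-recursive) mutex is enough only because no entrypoint calls another; if one did, the second lock attempt would deadlock.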
Alyssa Rosenzweig	96d65b47c7	panfrost: Use implementation-specific tile size The physical tile buffer size (and hence the maximum available tilebuffer size) is implementation-defined. Track this information on the device so we can correctly select tile sizes, instead of hardcoding the value for Midgard. Implementation values are pulled from the "Tile bits/pixel" row of the public Mali data sheet [1]. That row lists the maximum number of bits available for a pixel given the maximum tile size and pipelining. For currently supported hardware (v9 and older), that maximum tile size is 16x16. So those values should be multiplied by (16 * 16 * 2) / 8 to get the physical size in bytes. This may improve Bifrost/Valhall performance on workloads using multiple render targets. It also gets us ready for the dazzling array of tile sizes available with v10. [1] https://developer.arm.com/documentation/102849/latest/ Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17432>	2022-07-13 19:00:41 +00:00
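Worked through, the conversion multiplies the data-sheet figure by (16 * 16 * 2) / 8 = 64. A minimal sketch; the 256 bits/pixel figure used below is an illustrative input, not a value quoted from the data sheet.

```c
#include <assert.h>

/* Convert a "tile bits/pixel" figure into a physical tile buffer size in
 * bytes, assuming (as stated for v9 and older) a 16x16 maximum tile with
 * two tiles in flight for pipelining. */
static unsigned tile_buffer_bytes(unsigned tile_bits_per_pixel)
{
   const unsigned pixels_per_tile = 16 * 16;
   const unsigned pipelined_tiles = 2;
   return tile_bits_per_pixel * pixels_per_tile * pipelined_tiles / 8;
}
```

For example, `tile_buffer_bytes(256)` gives 256 * 64 = 16384 bytes.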
Alyssa Rosenzweig	d67681c4ea	panfrost: Make pan_select_max_tile_size O(1) Separate out "calculating the size of each pixel", "selecting a tile size", and "calculating the colour buffer allocation". Then implement the middle (selecting a tile size) with a simple constant time expression, rather than a loop. There's a bit of related clean up in here. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17432>	2022-07-13 19:00:41 +00:00
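A constant-time tile selection can be sketched as: compute how many pixels fit in the tile buffer, round down to a power of two with a single highest-set-bit operation, and clamp to 16x16. This is a hypothetical formulation for illustration (the actual expression in pan_select_max_tile_size is not shown here), and it relies on the GCC/Clang builtin `__builtin_clz` with a 32-bit `unsigned`.

```c
#include <assert.h>

/* Index of the highest set bit; x must be non-zero (GCC/Clang builtin). */
static unsigned log2_floor(unsigned x)
{
   assert(x != 0);
   return 31 - __builtin_clz(x);
}

/* Hypothetical O(1) selection: the largest power-of-two tile area (in
 * pixels) whose colour data fits in the tile buffer, capped at 16x16. */
static unsigned select_tile_pixels(unsigned tile_buffer_bytes,
                                   unsigned bytes_per_pixel)
{
   const unsigned max_tile_pixels = 16 * 16;

   assert(bytes_per_pixel > 0);
   unsigned fit = tile_buffer_bytes / bytes_per_pixel;
   assert(fit > 0);

   unsigned pow2 = 1u << log2_floor(fit);
   return pow2 < max_tile_pixels ? pow2 : max_tile_pixels;
}
```

For example, an 8192-byte buffer at 48 bytes per pixel fits 170 pixels, which rounds down to a 128-pixel tile; no loop over candidate sizes is needed.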
Alyssa Rosenzweig	d458384883	pan/va: Handle BIFROST_MESA_DEBUG=nosb For debugging flakes that might be caused by incorrect scoreboarding. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17223>	2022-07-13 18:46:15 +00:00
Alyssa Rosenzweig	2171412c66	pan/va: Print instructions with pack assert fails va_pack asserts a large number of invariants about the instruction being packed. If one of these fails (due to an invalid instruction), it's helpful to inspect the failing instruction, as it may not be apparent in a large shader. Pass the instruction through with all the assertions in va_pack for easier debugging. Now assertion failures when packing are easier to debug: Invalid invariant lo.type == hi.type: = STORE.i32.flow2.wls br5, br2, wls_ptr[1], byte_offset:0 Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17224>	2022-07-13 18:01:56 +00:00
Alyssa Rosenzweig	cfeafef755	pan/va: Use invalid_instruction in more places By passing the instruction pointer through the packer, we can get better error messages with invalid_instruction. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17224>	2022-07-13 18:01:56 +00:00
Alyssa Rosenzweig	cc94409d70	pan/va: Dump unencodable instructions When we assert out due to certain invalid encoding, it's helpful to know what instruction is causing the failure, since it may not be obvious from the assembly for large shaders. Now we get nice errors when failing: Invalid opcode: br0 = VAR_TEX.f32.flow8.store.skip.lod_mode.center , texture_index:0, varying_index:0 Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17224>	2022-07-13 18:01:56 +00:00
Adam Jackson	768238fdc0	glx: Fix drawable refcounting for naked Windows driFetchDrawable is only ever called from the MakeCurrent path, which means it has to handle the case of pre-GLX-1.3 Windows being named as the drawable. When it finds the drawable in the hash, it increments its refcount before returning it, so for a GLXWindow it would be 2 on first return, one from glXCreateWindow and one from glXMakeCurrent. But when it does not find the drawable and creates one for the naked Window, the reference count on first return would only be 1. As a result, if this context was then ever bound to a different drawable, the old Window's DRI drawable state (like the back buffer) would be destroyed. Fixes piglit's glx-multi-window-single-context and glx-make-current for a variety of drivers. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6713 Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17479>	2022-07-13 12:25:30 -04:00
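The invariant the fix restores, that both branches hand back a drawable with the same reference count, can be sketched with a toy cache. All names here (`struct drawable`, `fetch_drawable`, the fixed-size table) are hypothetical simplifications, not the GLX code's actual structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified DRI drawable keyed by X drawable ID. */
struct drawable {
   unsigned long xid;
   int refcount;
};

#define MAX_DRAWABLES 16
static struct drawable *drawable_table[MAX_DRAWABLES];

static struct drawable *table_find(unsigned long xid)
{
   for (int i = 0; i < MAX_DRAWABLES; ++i) {
      if (drawable_table[i] && drawable_table[i]->xid == xid)
         return drawable_table[i];
   }
   return NULL;
}

static void table_insert(struct drawable *d)
{
   for (int i = 0; i < MAX_DRAWABLES; ++i) {
      if (!drawable_table[i]) {
         drawable_table[i] = d;
         return;
      }
   }
   assert(!"drawable table full");
}

/* Called from the MakeCurrent path. Both branches must return the same
 * count: one reference for the drawable's lifetime plus one for the caller. */
static struct drawable *fetch_drawable(unsigned long xid)
{
   struct drawable *d = table_find(xid);

   if (!d) {
      /* Naked pre-GLX-1.3 Window: nobody called glXCreateWindow, so take
       * the lifetime reference here instead. */
      d = calloc(1, sizeof(*d));
      if (!d)
         return NULL;
      d->xid = xid;
      d->refcount = 1;
      table_insert(d);
   }

   /* Reference owned by the caller (the current context). */
   d->refcount++;
   return d;
}
```

Without the lifetime reference in the creation branch, the first fetch would return a count of 1, and unbinding the context would free state (such as the back buffer) that the Window still needs.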
Eric Engestrom	8fa577340c	panvk: use updated tokens from vk.xml Signed-off-by: Eric Engestrom <eric@igalia.com> Acked-by: Jason Ekstrand <jason.ekstrand@collabora.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17342>	2022-07-12 15:53:11 +00:00
Alyssa Rosenzweig	e0e2294f47	panfrost/ci: Disable T760 jobs These keep timing out due to abusive jobs. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17433>	2022-07-08 21:33:19 +00:00
Alyssa Rosenzweig	35a80418a1	panfrost/ci: Disable 0ad trace on T860 The last few frames of the trace are expensive (in terms of GPU time) and are close to hitting the timeout. With the next commit, they do hit the timeout due to using a larger batch. Nevertheless, the next commit should be an overall perf improvement on average, so remove this trace to unblock CI. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Suggested-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17112>	2022-07-08 01:50:03 +00:00
Icecream95	91d9a34925	pan/decode: Change indent when decoding resources Make the separation between entries in the resource table more obvious. Increase the indent by two levels to keep descriptors distinct from the resource entry itself. Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17371>	2022-07-08 01:30:23 +00:00
Icecream95	e05889c8c9	pan/decode: Use tag bits for resource entry count Fixes crashes when decoding the blob, which sometimes uses fewer than 9 entries. Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17371>	2022-07-08 01:30:23 +00:00
Icecream95	f7da4eade4	pan/decode: fflush buffers after dumping and before aborts Otherwise trace files or other files being written (dEQP TestResults?) might be truncated. Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17371>	2022-07-08 01:30:23 +00:00
Icecream95	bcd85a74e8	pan/va: Use the _safe iterator when adding blend shader calls Otherwise, the list's 'next' pointer changing during iteration triggers the assertion in list_for_each_entry. This was not hit before because list_assert is defined for debug builds but not debugoptimized. Fixes: `5067a26f44` ("pan/bi: Use flow control lowering on Valhall") Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17371>	2022-07-08 01:30:23 +00:00
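The difference between the plain and `_safe` iterators is when the successor is read. A minimal sketch, assuming a circular doubly linked list with a sentinel head; `foreach_safe`, `add_calls`, and the node `kind` values are hypothetical, not Mesa's list API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node: kind 1 needs a call inserted after it, kind 2 is a call. */
struct node {
   int kind;
   struct node *prev, *next;
};

/* Circular doubly linked list with a sentinel head. */
static void list_init(struct node *head)
{
   head->prev = head->next = head;
}

static void insert_after(struct node *pos, struct node *n)
{
   n->prev = pos;
   n->next = pos->next;
   pos->next->prev = n;
   pos->next = n;
}

/* Safe iteration: read the successor into `tmp` before running the body,
 * so the body may insert nodes after `it` without disturbing the walk. */
#define foreach_safe(it, tmp, head)                        \
   for (struct node *it = (head)->next, *tmp = it->next;   \
        it != (head); it = tmp, tmp = it->next)

/* Insert a call node (taken from `pool`) after every kind-1 node.
 * Returns the number of nodes inserted. */
static int add_calls(struct node *head, struct node *pool, int pool_size)
{
   int used = 0;

   foreach_safe(it, tmp, head) {
      if (it->kind == 1) {
         assert(used < pool_size);
         pool[used].kind = 2;
         insert_after(it, &pool[used++]);
      }
   }
   return used;
}
```

Because `tmp` is captured before the body runs, each inserted call is skipped rather than visited, and the walk does not depend on the current node's `next` pointer staying unchanged, which is the property the non-safe iterator asserts in debug builds.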
Alyssa Rosenzweig	fbe430fae9	panfrost: Move bifrost_lanes_per_warp to common Whereas the compiler needs to know the warp size for lowering divergent indirects, the driver needs to know it to report the subgroup size. Move the Bifrost-specific helper to common and add the trivial implementation for Midgard. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17265>	2022-07-08 01:14:55 +00:00
Alyssa Rosenzweig	6f3eea5ddb	panfrost: Separate core ID range from core count To query the core count, the hardware has a SHADERS_PRESENT register containing a mask of shader cores connected. The core count equals the number of 1-bits, regardless of placement. This value is useful for public consumption (like in clinfo). However, internally we are interested in the range of core IDs. We usually query core count to determine how many cores to allocate various per-core buffers for (performance counters, occlusion queries, and the stack). In each case, the hardware writes at the index of its core ID, so we have to allocate enough for the entire range of core IDs. If the core mask is discontiguous, this necessarily overallocates. Rename the existing core_count to core_id_range, better reflecting its definition and purpose, and repurpose core_count for the actual core count. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17265>	2022-07-08 01:14:55 +00:00
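The two quantities differ only when the mask has holes. A minimal sketch, assuming the SHADERS_PRESENT value fits in a `uint64_t`; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of shader cores present: the number of 1-bits in the mask.
 * Each iteration clears the lowest set bit. */
static unsigned core_count(uint64_t shader_present)
{
   unsigned n = 0;
   for (; shader_present; shader_present &= shader_present - 1)
      n++;
   return n;
}

/* Range of core IDs: one past the highest set bit. Per-core buffers indexed
 * by core ID must have this many slots, even if the mask has holes. */
static unsigned core_id_range(uint64_t shader_present)
{
   unsigned range = 0;
   while (shader_present) {
      range++;
      shader_present >>= 1;
   }
   return range;
}
```

For a mask of 0b1011 (cores 0, 1 and 3), `core_count` is 3 while `core_id_range` is 4, so per-core buffers need four slots.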
Alyssa Rosenzweig	5aa740bc8e	pan/bi: Implement f2f16{_rtz, _rtne} Float conversions with explicit rounding modes are required for OpenCL, as well as for Vulkan with the VK_KHR_16bit_storage extension (mandatory in Vulkan 1.1). Since the hardware conversion instructions allow configuring the round mode, this is easy to support :-) Fixes test_half.vstore_half_rtz. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17262>	2022-07-08 00:57:18 +00:00
Alyssa Rosenzweig	5f599fdef6	pan/va: Add missing <roundmode/> to V2F32_TO_V2F16 So we can implement f2f16_rtz. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17262>	2022-07-08 00:57:18 +00:00
Alyssa Rosenzweig	9bd7570e96	pan/bi: Fix unpack_32_2x16 definition This got messed up when scalarizing the IR. Fix the definition of the opcode to return (instead of break, asserting out) and to respect the swizzle (instead of failing validation). Noticed when bringing up OpenCL on Valhall. Fixes: `5febeae58e` ("pan/bi: Emit collect and split") Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17222>	2022-07-07 23:16:39 +00:00
Jason Ekstrand	642283a2c1	panfrost,asahi: Use util_sign_extend for unpacking Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17214>	2022-07-06 11:23:18 +00:00
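Mesa's helper for this is util_sign_extend. The sketch below is an independent illustration of the same operation, deliberately named `sign_extend` so it is not mistaken for Mesa's code; it assumes two's-complement integers and an arithmetic right shift on signed values, which C leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `width` bits of `val`: shift the field to the top of
 * the word, reinterpret as signed, then arithmetic-shift it back down. */
static int64_t sign_extend(uint64_t val, unsigned width)
{
   assert(width >= 1 && width <= 64);
   unsigned shift = 64 - width;
   return (int64_t)(val << shift) >> shift;
}
```

Unpacking a 4-bit field holding 0xF this way yields -1 rather than 15, which is what signed hardware fields need.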
Alyssa Rosenzweig	154929d731	pan/va: Handle terminal barriers If a shader ends with a workgroup barrier, it must wait for slot #7 at the end to finish the barrier. After inserting flow control, we get: BARRIER NOP.wait NOP.end Currently, the flow control pass assumes that .end implies all other control flow, and will merge this down to BARRIER.end However, this is incorrect. Slot #7 is no longer waited on. In theory, this cannot affect the correctness of the shader. In practice, the hardware checks that all barriers are reached. Terminating without waiting on slot #7 first raises an INSTR_BARRIER_FAULT. We need to weaken the flow control merging slightly to avoid this incorrect merge, instead emitting: BARRIER.wait NOP.end Of course, all of these cases are inefficient: terminal barriers shouldn't be emitted in the first place. I wrote out an optimization for this. We can merge it if we find a workload that it actually helps. Fixes test_half.vstore_half. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17264>	2022-07-05 14:48:09 +00:00
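The corrected merge can be sketched with a toy model in which each instruction carries exactly one flow-control field. The model, the `merge_flow` name, and the single FLOW_WAIT_SLOT7 value are hypothetical simplifications of va_merge_flow, shown only to illustrate why BARRIER, NOP.wait, NOP.end becomes BARRIER.wait, NOP.end rather than BARRIER.end.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model: each instruction has exactly one flow-control field. */
enum flow { FLOW_NONE, FLOW_WAIT_SLOT7, FLOW_END };

struct ins {
   const char *op;
   enum flow flow;
};

/* Fold each NOP's flow control into the preceding instruction when that
 * instruction's flow field is free; otherwise keep the NOP. A slot #7 wait
 * is never dropped on the assumption that a later .end covers it, since .end
 * does not wait for the barrier slot. Compacts `code` in place and returns
 * the new instruction count. */
static int merge_flow(struct ins *code, int n)
{
   int out = 0;

   for (int i = 0; i < n; ++i) {
      struct ins cur = code[i];
      bool is_nop = strcmp(cur.op, "NOP") == 0;

      if (is_nop && out > 0 && code[out - 1].flow == FLOW_NONE) {
         code[out - 1].flow = cur.flow;
         continue;
      }
      code[out++] = cur;
   }
   return out;
}
```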
Jason Ekstrand	d06335ed76	vulkan: Depend on vk_pipeline_layout in vk_cmd_enqueue Now that we have a common pipeline layout with reference counting, we don't need these driver hooks for reference counting anymore. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17286>	2022-06-29 20:31:58 +00:00
Jason Ekstrand	73eecffabd	panvk: Use the vk_pipeline_layout base struct Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17286>	2022-06-29 20:31:58 +00:00
Jason Ekstrand	f66f37a99e	panvk: Use the vk_descriptor_set_layout base struct Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17286>	2022-06-29 20:31:58 +00:00
David Heidelberg	6c4cc0abc6	ci: traces: switch to brotli compressed traces virgl: Also drop old pre-trim glxgears trace (cached). Acked-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Acked-by: Emma Anholt <emma@anholt.net> Signed-off-by: David Heidelberg <david.heidelberg@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17280>	2022-06-29 00:58:28 +00:00
Jason Ekstrand	b8882718b7	panvk: Use the new border color helpers Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15359>	2022-06-23 00:01:41 +00:00
Alyssa Rosenzweig	3fedf22b60	pan/bi: Tune lower_vars_to_scratch Increase the threshold to lower indirect indexing of arrays to scratch memory all the way up to 256 bytes, which was the lowest power-of-two threshold for which enabling the pass on Mali-G57 was a win in shader-db. It's difficult to tell what threshold is optimal here. The shader-db stats are based on a rough cycle model that assumes a 16:1 ratio between CVT and load/store on Valhall, and a 24:1 ratio between arithmetic and load/store on Bifrost. Those ratios are at most rules of thumb, as the number of cycles required by a load/store instruction will vary tremendously based on caching and the memory controller. However, they may well be lower bounds (if those are the upper bounds on instruction issuing in the Mali shader cores). As such, a large threshold seems well motivated. shader-db results on Mali-G52 follow; results on Mali-G57 were similar. Note that the shader that is hurt for spills/fills is helped for load/store overall.
cycles helped: 129 -> 98 (-24.03%) (spills: 17 -> 20 (17.65%); fills: 34 -> 40 (17.65%))
ldst helped: 129 -> 98 (-24.03%) (spills: 17 -> 20 (17.65%); fills: 34 -> 40 (17.65%))

total instructions in shared programs: 2415410 -> 2415372 (<.01%)
instructions in affected programs: 1041 -> 1003 (-3.65%)
helped: 3 HURT: 0
helped stats (abs) min: 2.0 max: 31.0 x̄: 12.67 x̃: 5
helped stats (rel) min: 2.08% max: 6.02% x̄: 3.90% x̃: 3.60%

total tuples in shared programs: 1928558 -> 1928527 (<.01%)
tuples in affected programs: 826 -> 795 (-3.75%)
helped: 2 HURT: 1
helped stats (abs) min: 6.0 max: 26.0 x̄: 16.00 x̃: 16
helped stats (rel) min: 3.72% max: 9.68% x̄: 6.70% x̃: 6.70%
HURT stats (abs) min: 1.0 max: 1.0 x̄: 1.00 x̃: 1
HURT stats (rel) min: 1.54% max: 1.54% x̄: 1.54% x̃: 1.54%

total clauses in shared programs: 355013 -> 354981 (<.01%)
clauses in affected programs: 220 -> 188 (-14.55%)
helped: 3 HURT: 0
helped stats (abs) min: 2.0 max: 27.0 x̄: 10.67 x̃: 3
helped stats (rel) min: 13.99% max: 21.43% x̄: 16.93% x̃: 15.38%

total cycles in shared programs: 166610.27 -> 166574.90 (-0.02%)
cycles in affected programs: 138 -> 102.62 (-25.63%)
helped: 3 HURT: 0
helped stats (abs) min: 0.4583330000000001 max: 31.0 x̄: 11.79 x̃: 3
helped stats (rel) min: 15.28% max: 65.28% x̄: 34.86% x̃: 24.03%

total arith in shared programs: 73690.13 -> 73690.58 (<.01%)
arith in affected programs: 29.71 -> 30.17 (1.54%)
helped: 1 HURT: 2
helped stats (abs) min: 0.0833339999999998 max: 0.0833339999999998 x̄: 0.08 x̃: 0
helped stats (rel) min: 3.85% max: 3.85% x̄: 3.85% x̃: 3.85%
HURT stats (abs) min: 0.125 max: 0.4166659999999993 x̄: 0.27 x̃: 0
HURT stats (rel) min: 1.66% max: 5.17% x̄: 3.42% x̃: 3.42%

total ldst in shared programs: 135611 -> 135571 (-0.03%)
ldst in affected programs: 138 -> 98 (-28.99%)
helped: 3 HURT: 0
helped stats (abs) min: 3.0 max: 31.0 x̄: 13.33 x̃: 6
helped stats (rel) min: 24.03% max: 100.00% x̄: 74.68% x̃: 100.00%

total quadwords in shared programs: 1674599 -> 1674523 (<.01%)
quadwords in affected programs: 838 -> 762 (-9.07%)
helped: 3 HURT: 0
helped stats (abs) min: 2.0 max: 65.0 x̄: 25.33 x̃: 9
helped stats (rel) min: 3.39% max: 15.00% x̄: 9.14% x̃: 9.04%

total spills in shared programs: 37 -> 40 (8.11%)
spills in affected programs: 17 -> 20 (17.65%)
helped: 0 HURT: 1

total fills in shared programs: 190 -> 196 (3.16%)
fills in affected programs: 34 -> 40 (17.65%)
helped: 0 HURT: 1

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17101>	2022-06-21 22:42:34 +00:00
Alyssa Rosenzweig	fd021a618f	pan/va: Replace MKVEC.v4i8 with MKVEC.v2i8 This is the instruction that the hardware actually supports. Do the rename, use the more accurate model in the IR, and rework the Valhall texturing code to emit MKVEC.v2i8 instead of MKVEC.v4i8. Will fix: dEQP-GLES31.functional.texture.gather.offset_dynamic.implementation_offset.* Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17101>	2022-06-21 22:42:34 +00:00
Alyssa Rosenzweig	c570693c19	pan/va: Pack MKVEC.v2i8 byte lanes They are in a different place, but the encoding is otherwise as usual. This will be required for texture gathers with dynamic offsets. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17101>	2022-06-21 22:42:34 +00:00
Alyssa Rosenzweig	10301885ab	pan/bi: Constant fold MKVEC.v2i8 Constant MKVEC.v2i8 will be generated during texturing on Valhall, just like constant MKVEC.v4i8 is currently generated. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17101>	2022-06-21 22:42:34 +00:00
Alyssa Rosenzweig	2833d0472a	pan/bi: Model MKVEC.v2i8 Valhall does not have Bifrost's 4-source MKVEC.v4i8. Instead, it has a (somewhat limited) 3-source MKVEC.v2i8. The full MKVEC.v4i8 may be lowered to a pair of MKVEC.v2i8 instructions. For good code quality on both Bifrost and Valhall, we need to model both instructions in their full generality. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17101>	2022-06-21 22:42:34 +00:00
Alyssa Rosenzweig	6792b15971	pan/bi: Remove FRSCALE from IR It's just LDEXP in different clothing. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17101>	2022-06-21 22:42:34 +00:00
Alyssa Rosenzweig	21bedd2c97	pan/va: Rename RSCALE to LDEXP This avoids needless variation from Bifrost. While at it, fix the opcode definition: there are no abs/neg/swizzle modifiers on the signed integer source, and there's no clamp. However, there are round and infinity modes, like on Bifrost. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17101>	2022-06-21 22:42:34 +00:00
Alyssa Rosenzweig	0da28ee2c7	pan/va: Implement sample positions FAU packing This will fix: dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_offset.at_sample_position.default_framebuffer Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17101>	2022-06-21 22:42:34 +00:00
Alyssa Rosenzweig	9dd0bc92b5	pan/va: Lower FADD_RSCALE.f32 to FMA_RSCALE.f32 We generate FADD_RSCALE.f32 in our sample variables implementations. Valhall doesn't have a dedicated FADD_RSCALE.f32 implementation; it should be aliased to FMA_RSCALE.f32. Handle that alias in isel lowering. This will fix: dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_offset.* Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17101>	2022-06-21 22:42:34 +00:00
Alyssa Rosenzweig	1a882ecdab	pan/bi: Align accesses with packed TLS When lowering vars to scratch, we need to be careful with alignment on Valhall, where packed TLS access must not straddle a 16-byte boundary. Fixes regressions when enabling indirect access to temps on Valhall. Fixes: `6761dbf891` ("panfrost: Use packed TLS on Valhall") Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17101>	2022-06-21 22:42:34 +00:00
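The constraint can be checked by comparing the 16-byte chunk holding an access's first byte with the chunk holding its last byte. A minimal sketch; `straddles_16` and `place_access` are hypothetical, and bumping offsets is just one way to honour the rule (the log does not show how the actual fix adjusts alignment).

```c
#include <assert.h>
#include <stdbool.h>

/* True if an access of `size` bytes at byte `offset` crosses a 16-byte
 * boundary, i.e. its first and last bytes lie in different 16-byte chunks. */
static bool straddles_16(unsigned offset, unsigned size)
{
   assert(size > 0 && size <= 16);
   return (offset / 16) != ((offset + size - 1) / 16);
}

/* Hypothetical layout fix-up: move a straddling access up to the next
 * 16-byte boundary. */
static unsigned place_access(unsigned offset, unsigned size)
{
   if (straddles_16(offset, size))
      offset = (offset + 15) & ~15u;
   return offset;
}
```

For example, an 8-byte access at offset 12 would span bytes 12 to 19 and cross the boundary at 16, so it is moved to offset 16.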
Alyssa Rosenzweig	5ee1179c94	pan/bi: Fix LD_BUFFER.i16 definition This was missing the message, breaking UBO-to-push and who-knows-what-else, when enabling fp16 const buffers. Fixes: `3dc2095b07` ("pan/bi: Model LD_BUFFER instructions") Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17101>	2022-06-21 22:42:34 +00:00
Alyssa Rosenzweig	40accfd3b7	pan/va: Unit test va_mark_last This pass is super easy to unit test, so we have no excuse not to test thoroughly. va_mark_last only inserts annotations in a shader without any annotations, so our test cases are simply annotated shaders. The CASE macro just has to compare the case against the case with the annotations stripped and added back with va_mark_last. In retrospect, I should have used that technique for the flow control insertion tests too. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17091>	2022-06-21 22:19:59 +00:00
Alyssa Rosenzweig	4b7e337b45	pan/va: Mark last register reads On Valhall, register reads may be marked as "last" [1]. Setting the last flag promises the hardware that the value of the register is no longer required. This may enable hardware optimizations. In particular, it may permit the hardware to avoid register file writes if a write to the marked register is still in the forwarding buffer. This may improve power efficiency. In principle, this is trivial: run liveness analysis and mark killed sources, like we would in an SSA-based register allocator. In practice, there are a few wrinkles to avoid hazards around staging registers and 64-bit register pairs, requiring some additional data flow analysis and fix ups. However, nothing here is particularly "hard", and all the ideas are already in use for the Bifrost scheduler and the Bifrost/Valhall scoreboard analyses. [1] In Mesa's compiler, this is called discard for historical reasons. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17091>	2022-06-21 22:19:59 +00:00
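The liveness-based marking can be sketched for a single basic block: walk backwards, and flag a source as a last read when its register is not live after the instruction. This hypothetical sketch tracks 64 registers in a bitmask and ignores the staging-register and 64-bit-pair hazards the commit mentions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SRCS 3

/* Hypothetical instruction: one destination and up to three sources.
 * Registers are numbered 0..63; -1 means "unused". */
struct ins {
   int dest;
   int src[MAX_SRCS];
   bool last[MAX_SRCS]; /* output: this read is the value's final use */
};

/* `live_out` is the set of registers live at the end of the block. */
static void mark_last(struct ins *code, int n, uint64_t live_out)
{
   uint64_t live = live_out;

   for (int i = n - 1; i >= 0; --i) {
      struct ins *I = &code[i];

      /* The destination is overwritten here, so its old value is dead
       * after this instruction. */
      if (I->dest >= 0)
         live &= ~(UINT64_C(1) << I->dest);

      for (int s = 0; s < MAX_SRCS; ++s) {
         I->last[s] = false;
         if (I->src[s] < 0)
            continue;

         uint64_t bit = UINT64_C(1) << I->src[s];

         /* Last read if nothing later needs the register. Marking the bit
          * live right away means a register read twice by the same
          * instruction is flagged on only one of those reads. */
         I->last[s] = !(live & bit);
         live |= bit;
      }
   }
}
```

For `r2 = r0 + r1; r3 = r2 + r0` with only r3 live out, both reads in the second instruction are last reads, while in the first instruction only the read of r1 is.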


4240 Commits