KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Icecream95	ec70291da9	panfrost: Stop using sparse_array for batch BOs Iterating over a util_sparse_array is very expensive; replace this with a standard dynarray. Using the sparse 'nodearray' datastructure instead was tested, but found to be slower in some cases. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16988>	2022-06-14 23:44:02 +00:00
David Heidelberg	96f0944a69	ci/panfrost: add Blender, Warzone2100, Freedoom and Unvanquished traces Signed-off-by: David Heidelberg <david.heidelberg@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16909>	2022-06-14 11:52:45 +00:00
Alyssa Rosenzweig	9bdd0854ea	panvk: Use common CmdBeginRenderPass The runtime already handles this. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16950>	2022-06-13 17:27:36 +00:00
Alyssa Rosenzweig	f00e0bfd8a	panvk: Simplify depth clear preload condition Easier to understand and equivalent in practice. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16950>	2022-06-13 17:27:36 +00:00
Alyssa Rosenzweig	ec2bf34d97	panvk: Fix stencil clears of combined ZS images If we have a combined Z/S image, the image has depth, so we proceed down the depth path, which does not set clear.s even though there's also a stencil component. Unify the control flow to fix this. Fixes (among others): dEQP-VK.api.image_clearing.core.clear_depth_stencil_image.single_layer.d24_unorm_s8_uint_multiple_subresourcerange Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16950>	2022-06-13 17:27:36 +00:00
Alyssa Rosenzweig	588ee38351	panvk: Clear Z/S attachments without a shader Rather than generating shaders to clear depth and stencil attachments, run the rasterizer without a shader and configure the depth/stencil hardware to do the clear. These settings are known to be efficient on Valhall, presumably the depth/stencil pipeline on Bifrost is similar enough that it is also the efficient way there. It's certainly much simpler. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Acked-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16950>	2022-06-13 17:27:36 +00:00
Alyssa Rosenzweig	03c34a8887	panvk: Remove unused pushmaps These were removed in an earlier series containing `ae77c207e0` ("panvk: Use push constants for copy shaders"), but the unused variables hung around. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16950>	2022-06-13 17:27:36 +00:00
Alyssa Rosenzweig	2136643a51	panvk: Don't specialize clear shaders for RT On Bifrost and newer, blend descriptors are decoupled from render target. That means we can always use a clear shader reading from blend_descriptor_0 and specify the desired render target in the sole blend descriptor we pass. Likewise on Bifrost and newer we don't need blend descriptors when we don't blend, which is the case for the Z/S clears. This reduces the number of shaders compiled on startup from 468 to 426. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16950>	2022-06-13 17:27:36 +00:00
Alyssa Rosenzweig	44223e5f28	panfrost: Disable CRC at <16x16 tile sizes The hardware writes one CRC per (effective) tile, the tile size of the CRC buffer is the same as the configured effective tile size. However, all our CRC infrastructure assumes 16x16 tiles. In case CRC is used with smaller tiles, buffer overflows and incorrect rendering are all possible. Don't use CRC at smaller tile sizes. Note disabling CRC correctly invalidates any bound CRC buffers. Fixes: `2e97d7c835` ("panfrost: Transaction elimination support") Closes: #6332 Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16983>	2022-06-13 15:46:12 +00:00
Alyssa Rosenzweig	cac0578ee5	panfrost: Inline pan_fbd_has_zs_crc_ext It has a single user -- in a section of code that only runs for MFBD GPUs and that has already decided whether to use CRCs -- so inlining it simplifies its definition greatly and may avoid redeciding the CRC setting. [Note for mesa-stable maintainers: This is not a bug fix but is marked for backport so the next patch applies cleanly.] Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16983>	2022-06-13 15:46:12 +00:00
Denis Pauk	79b88852c8	panvk: Return VK_ERROR_INCOMPATIBLE_DRIVER for Midgard Midgard is unsupported after merge of !16915 Signed-off-by: Denis Pauk <pauk.denis@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16991>	2022-06-13 14:44:16 +00:00
Alyssa Rosenzweig	c43882ad54	panfrost: Allow pixels using discard to be killed info.fs.sidefx considers discard() to be a side effect. That definition is... dubious at best. It certainly isn't the definition needed for forward pixel kill. The only reason pixels couldn't be killed by FPK is if the shader has side effects in the sense of writing to memory. Use that more precise condition so FPK works more often. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Closes: #5607 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16984>	2022-06-13 14:23:55 +00:00
Jason Ekstrand	a09e08ae95	panvk: Use the common AcquireNextImage implementation The only reason for the wrapper was so that we could dummy signal the semaphore and fence. Now that the WSI code always does this for us, we can drop our wrapper. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4037>	2022-06-10 01:33:12 +00:00
Alyssa Rosenzweig	c0d6539827	panvk: Drop support for Midgard We've discussed this at length and have agreed that Midgard + Vulkan is DOA, but have let the code linger. Now it's getting in the way of forward progress for PanVK... That means it's time to drop the code paths and commit to not supporting it. Midgard is only barely Vulkan 1.0 capable, Arm's driver was mainly experimental. Today, there are no known workloads for hardware of that class, given the relatively weak CPU and GPU, Linux, and arm64. Even with a perfect Vulkan driver, FEX + DXVK on RK3399 won't be performant. There is a risk here: in the future, 2D workloads (like desktop compositors) might hard depend on Vulkan. It seems this is bound to happen but about a decade out. I worry about contributing to hardware obsolescence due to missing Vulkan drivers, however such a change would obsolete far more than Midgard v5... There's plenty of GL2 hardware that's still alive and well, for one. It doesn't look like Utgard will be going anywhere, even then. For the record: I think depending on Vulkan for 2D workloads is a bad idea. It's unfortunately on brand for some compositors. Getting conformant Vulkan 1.0 on Midgard would be a massive amount of work on top of conformant Bifrost/Valhall PanVK, and the performance would make it useless for interesting 3D workloads -- especially by 2025 standards. If there's a retrocomputing urge in the future to build a Midgard + Vulkan driver, that could happen later. But it would be a lot more work than reverting this commit. The compiler would need significant work to be appropriate for anything newer than OpenGL ES 3.0, even dEQP-GLES31 tortures it pretty bad. Support for non-32bit types is lacklustre. Piles of basic shader features in Vulkan 1.0 are missing or broken in the Midgard compiler. Even if you got everything working, basic extensions like subgroup ops are architecturally impossible to implement. 
On the core driver side, we would need support for indirect draws -- on Vulkan, stalling and doing it on the CPU is not an option. In fact, the indirect draw code is needed for plain indexed draws in Vulkan, meaning Zink + PanVK can be expected to have terrible performance on anything older than Valhall. (As far as workloads to justify building a Vulkan driver, Zink/ANGLE are the worst examples. The existing GL driver works well and is not much work to maintain. If it were, sticking it in Amber branch would still be less work than trying to build a competent Vulkan driver for that hardware.) Where does PanVK fit in? Android, for one. High end Valhall devices might run FEX + DXVK acceptably. For whatever it's worth, Valhall is the first Mali hardware that can support Vulkan properly, even Bifrost Vulkan is a slow mess that you wouldn't want to use for anything if you had another option. In theory Arm ships Vulkan drivers for this class of hardware. In practice, Arm's drivers have long sucked on Linux, assuming you could get your hands on a build. It didn't take much for Panfrost to win the Linux/Mali market. The highest end Midgard getting wide use with Panfrost is the RK3399 with the Mali-T860, as in the Pinebook Pro. Even by today's standards, RK3399 is showing its limits. It seems unlikely that its users in 10 years from now will also be using Vulkan-required 2030 desktop environment eye candy. Graphically, the nicest experience on RK3399 is sway or weston, with GLES2 renderers. Realistically, sway won't go Vulkan-only for a long time. Making ourselves crazy trying to support Midgard poorly in PanVK seems like letting perfect (Vulkan support) be the enemy of good (Vulkan support). 
In that light, future developers making core 2D software Vulkan-only (forcing software rasterization instead of using the hardware OpenGL) are doing a lot more e-wasting than us simply not providing Midgard Vulkan drivers because we don't have the resources to do so, and keeping the broken code in-tree will just get in the way of forward progress for shipping PanVK at all. There are good reasons, after all, that turnip starts with a6xx. (If proper Vulkan support only began with Valhall, will we support Bifrost long term? Unclear. There are some good arguments on both sides here.) Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Acked-by: Jason Ekstrand <jason.ekstrand@collabora.com> Acked-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16915>	2022-06-08 18:43:06 +00:00
Alyssa Rosenzweig	10a2406232	pan/perf: Fix performance counters on G57 The performance counter layout depends on the number of L2 blocks and the number of shader cores. It doesn't make a ton of sense to hardcode these into the XML files. Instead, let's make the counter offsets in the XML files relative to the categories (blocks), so we can calculate the offsets of the categories themselves at runtime based on the computed layout. This fixes performance counters on Mali-G57 as implemented on MT8192. There is little code change here, mainly churn from changing the XML definition. Postprocessing for the XML to make it suitable for Mesa uses Antonio Caggiano's https://gitlab.freedesktop.org/panfrost/hwc-helper tool. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Antonio Caggiano <antonio.caggiano@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16803>	2022-06-08 13:57:18 +00:00
Alyssa Rosenzweig	0ecbfcc892	panfrost: Add panfrost_query_l2_slices helper The number of L2 performance counter blocks equals the number of L2 slices, so add a query to get this. This information isn't needed by the Mesa driver, so don't get it in the default device initialization path. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Antonio Caggiano <antonio.caggiano@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16803>	2022-06-08 13:57:18 +00:00
Alyssa Rosenzweig	58b408611f	panfrost: Remove is_64b assignments These are redundant with GenXML defaults, they're just noise. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16916>	2022-06-08 13:42:42 +00:00
Alyssa Rosenzweig	ae4841c105	panfrost: Remove redundant first_tag access This already happens in the common prepare_rsd call. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16916>	2022-06-08 13:42:42 +00:00
Alyssa Rosenzweig	c999a9daa8	panfrost: Deduplicate indirect dispatch structs The input is specified in two identical structs, tear that apart. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16916>	2022-06-08 13:42:42 +00:00
Alyssa Rosenzweig	ae77c207e0	panvk: Use push constants for copy shaders Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16916>	2022-06-08 13:42:42 +00:00
Alyssa Rosenzweig	1a0217e3fb	panvk: Use push constants for clear Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16916>	2022-06-08 13:42:42 +00:00
Alyssa Rosenzweig	f227fb6da2	panfrost: Use push constants for indirect draws Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16916>	2022-06-08 13:42:42 +00:00
Alyssa Rosenzweig	43884a9b09	panfrost: Use push constants for indirect dispatch Much simpler than creating a UBO and relying on it getting optimized to a push constant, with possible reordering. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16916>	2022-06-08 13:42:42 +00:00
Alyssa Rosenzweig	90beea75f6	pan/bi: Don't reorder push with no_ubo_to_push Otherwise, load_push_constant won't work properly. This could probably be made to work if we tried hard enough, but we still don't want reordering for internal (meta) shaders which are laid out deliberately. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16916>	2022-06-08 13:42:42 +00:00
Alyssa Rosenzweig	17ea1642e2	pan/bi: Implement load_push_constant Bifrost supports "fast access uniforms" loaded from a single contiguous buffer. This maps directly to Vulkan push constants, with some caveats: * No indirect access. Indirects need to be lowered to a UBO pull. * Strict alignment requirements. These will be met in practice. Implement the NIR intrinsic and map it to the native hardware construct. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16916>	2022-06-08 13:42:42 +00:00
Jason Ekstrand	420717b2ce	panvk: Use vk_image_buffer_copy_layout Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16873>	2022-06-07 17:57:42 +00:00
Emma Anholt	464b32c030	glsl: Drop the div-to-mul-rcp lowering for floats. NIR has fdiv, and all the NIR backends have to have lower_fdiv set appropriately already since various passes (format conversions, tgsi_to_nir, nir_fast_normalize(), etc.) might generate one. This causes softpipe and llvmpipe to now do actual divides, since lower_fdiv is not set there. Note that llvmpipe's rcp implementation is a divide of 1.0 by x, so now we're going to be just doing div(x, y) instead of mul(x, div(1.0, y)). Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16823>	2022-06-07 02:38:42 +00:00
Alyssa Rosenzweig	feb9020039	panfrost: Enable Mali-G57 Everything required for conformant OpenGL ES 3.1 support on Valhall (v9) is now upstream -- all that's left is to enable implementations! Add the GPU ID for the Mali-G57 implemented in the MediaTek MT8192 system-on-chip. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16890>	2022-06-06 19:30:15 +00:00
Alyssa Rosenzweig	28801cfba0	pan/va: Unit test constant lowering pass Like other optimizations, breaking this pass may not affect functional correctness. It's also dead simple to unit test the pass, so we have no excuse not to. Add unit tests for the functionality we currently support, since we just extended it and want to make sure everything still works. This includes tests for use of modifiers to get more small constants. There are lots of subtle gotchas there, so let's add lots of unit tests to make sure we got it right. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16862>	2022-06-06 18:10:24 +00:00
Alyssa Rosenzweig	9cfafbb09b	pan/va: Try widening small constants Many small integers are available as small constants, but the table of small constants is tightly packed. Zero and sign extensions are usually required to access small integers. When packing constants, try zero/sign extension for unsigned/signed integer instructions respectively. total instructions in shared programs: 2716912 -> 2707795 (-0.34%) instructions in affected programs: 1045609 -> 1036492 (-0.87%) helped: 4460 HURT: 125 helped stats (abs) min: 1.0 max: 58.0 x̄: 2.14 x̃: 1 helped stats (rel) min: 0.14% max: 23.85% x̄: 1.35% x̃: 0.88% HURT stats (abs) min: 1.0 max: 68.0 x̄: 3.41 x̃: 1 HURT stats (rel) min: 0.34% max: 3.88% x̄: 0.93% x̃: 0.70% 95% mean confidence interval for instructions value: -2.09 -1.89 95% mean confidence interval for instructions %-change: -1.33% -1.25% Instructions are helped. total cycles in shared programs: 141984.06 -> 141932.42 (-0.04%) cycles in affected programs: 552.08 -> 500.44 (-9.35%) helped: 18 HURT: 0 helped stats (abs) min: 0.015625 max: 11.0 x̄: 2.87 x̃: 0 helped stats (rel) min: 0.50% max: 19.64% x̄: 5.36% x̃: 1.53% 95% mean confidence interval for cycles value: -5.17 -0.56 95% mean confidence interval for cycles %-change: -9.28% -1.44% Cycles are helped. total cvt in shared programs: 13805.05 -> 13663.39 (-1.03%) cvt in affected programs: 6127.45 -> 5985.80 (-2.31%) helped: 4460 HURT: 125 helped stats (abs) min: 0.015625 max: 0.90625 x̄: 0.03 x̃: 0 helped stats (rel) min: 0.35% max: 50.00% x̄: 5.19% x̃: 4.00% HURT stats (abs) min: 0.015625 max: 1.0625 x̄: 0.05 x̃: 0 HURT stats (rel) min: 0.77% max: 9.30% x̄: 3.40% x̃: 2.78% 95% mean confidence interval for cvt value: -0.03 -0.03 95% mean confidence interval for cvt %-change: -5.10% -4.81% Cvt are helped. 
total ls in shared programs: 129545 -> 129494 (-0.04%) ls in affected programs: 495 -> 444 (-10.30%) helped: 6 HURT: 0 helped stats (abs) min: 2.0 max: 11.0 x̄: 8.50 x̃: 11 helped stats (rel) min: 1.49% max: 19.64% x̄: 13.95% x̃: 19.64% 95% mean confidence interval for ls value: -12.68 -4.32 95% mean confidence interval for ls %-change: -23.23% -4.67% Ls are helped. total quadwords in shared programs: 1476416 -> 1469824 (-0.45%) quadwords in affected programs: 121208 -> 114616 (-5.44%) helped: 820 HURT: 16 helped stats (abs) min: 8.0 max: 32.0 x̄: 8.28 x̃: 8 helped stats (rel) min: 1.39% max: 50.00% x̄: 11.00% x̃: 10.00% HURT stats (abs) min: 8.0 max: 32.0 x̄: 12.50 x̃: 8 HURT stats (rel) min: 1.38% max: 10.00% x̄: 6.19% x̃: 7.14% 95% mean confidence interval for quadwords value: -8.14 -7.63 95% mean confidence interval for quadwords %-change: -11.20% -10.15% Quadwords are helped. total threads in shared programs: 53633 -> 53663 (0.06%) threads in affected programs: 39 -> 69 (76.92%) helped: 33 HURT: 3 helped stats (abs) min: 1.0 max: 1.0 x̄: 1.00 x̃: 1 helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00% HURT stats (abs) min: 1.0 max: 1.0 x̄: 1.00 x̃: 1 HURT stats (rel) min: 50.00% max: 50.00% x̄: 50.00% x̃: 50.00% 95% mean confidence interval for threads value: 0.64 1.02 95% mean confidence interval for threads %-change: 73.27% 101.73% Threads are helped. total spills in shared programs: 154 -> 103 (-33.12%) spills in affected programs: 75 -> 24 (-68.00%) helped: 6 HURT: 0 total fills in shared programs: 656 -> 656 (0.00%) fills in affected programs: 148 -> 148 (0.00%) helped: 2 HURT: 4 Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16862>	2022-06-06 18:10:23 +00:00
Alyssa Rosenzweig	72146051d5	pan/va: Try negating small constants when lowering If a constant is used with a floating point instruction with a floating-point negate modifier, we can use the modifier to negate constants in the table for free. Each floating point in the table is positive, so this is required for negative small constants. total instructions in shared programs: 2728438 -> 2716912 (-0.42%) instructions in affected programs: 1418220 -> 1406694 (-0.81%) helped: 6053 HURT: 94 helped stats (abs) min: 1.0 max: 43.0 x̄: 1.94 x̃: 1 helped stats (rel) min: 0.06% max: 18.18% x̄: 1.34% x̃: 0.84% HURT stats (abs) min: 1.0 max: 5.0 x̄: 2.34 x̃: 2 HURT stats (rel) min: 0.09% max: 21.43% x̄: 1.87% x̃: 0.91% 95% mean confidence interval for instructions value: -1.93 -1.82 95% mean confidence interval for instructions %-change: -1.34% -1.25% Instructions are helped. total cycles in shared programs: 142103 -> 141984.06 (-0.08%) cycles in affected programs: 766.70 -> 647.77 (-15.51%) helped: 97 HURT: 0 helped stats (abs) min: 0.015625 max: 40.0 x̄: 1.23 x̃: 0 helped stats (rel) min: 0.27% max: 41.24% x̄: 3.63% x̃: 2.08% 95% mean confidence interval for cycles value: -2.41 -0.04 95% mean confidence interval for cycles %-change: -4.68% -2.57% Cycles are helped. total cvt in shared programs: 13983.34 -> 13805.05 (-1.28%) cvt in affected programs: 7952.45 -> 7774.16 (-2.24%) helped: 6049 HURT: 98 helped stats (abs) min: 0.015625 max: 0.359375 x̄: 0.03 x̃: 0 helped stats (rel) min: 0.25% max: 100.00% x̄: 4.74% x̃: 2.52% HURT stats (abs) min: 0.015625 max: 0.078125 x̄: 0.04 x̃: 0 HURT stats (rel) min: 0.17% max: 100.00% x̄: 5.48% x̃: 2.54% 95% mean confidence interval for cvt value: -0.03 -0.03 95% mean confidence interval for cvt %-change: -4.83% -4.32% Cvt are helped. 
total ls in shared programs: 129660 -> 129545 (-0.09%) ls in affected programs: 601 -> 486 (-19.13%) helped: 7 HURT: 0 helped stats (abs) min: 3.0 max: 40.0 x̄: 16.43 x̃: 8 helped stats (rel) min: 2.88% max: 41.24% x̄: 17.41% x̃: 12.50% 95% mean confidence interval for ls value: -31.42 -1.44 95% mean confidence interval for ls %-change: -29.25% -5.58% Ls are helped. total quadwords in shared programs: 1482728 -> 1476416 (-0.43%) quadwords in affected programs: 131200 -> 124888 (-4.81%) helped: 798 HURT: 15 helped stats (abs) min: 8.0 max: 24.0 x̄: 8.06 x̃: 8 helped stats (rel) min: 0.34% max: 50.00% x̄: 10.15% x̃: 6.67% HURT stats (abs) min: 8.0 max: 8.0 x̄: 8.00 x̃: 8 HURT stats (rel) min: 1.49% max: 100.00% x̄: 11.25% x̃: 2.78% 95% mean confidence interval for quadwords value: -7.92 -7.60 95% mean confidence interval for quadwords %-change: -10.52% -8.99% Quadwords are helped. total threads in shared programs: 53585 -> 53633 (0.09%) threads in affected programs: 51 -> 99 (94.12%) helped: 49 HURT: 1 helped stats (abs) min: 1.0 max: 1.0 x̄: 1.00 x̃: 1 helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00% HURT stats (abs) min: 1.0 max: 1.0 x̄: 1.00 x̃: 1 HURT stats (rel) min: 50.00% max: 50.00% x̄: 50.00% x̃: 50.00% 95% mean confidence interval for threads value: 0.88 1.04 95% mean confidence interval for threads %-change: 90.97% 103.03% Threads are helped. total spills in shared programs: 125 -> 154 (23.20%) spills in affected programs: 75 -> 104 (38.67%) helped: 3 HURT: 4 total fills in shared programs: 800 -> 656 (-18.00%) fills in affected programs: 476 -> 332 (-30.25%) helped: 7 HURT: 0 Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16862>	2022-06-06 18:10:23 +00:00
Alyssa Rosenzweig	cecfa0c44a	pan/va: Record which instructions are signed We need to distinguish signed integer instructions from unsigned integer instructions, to distinguish sign-extension and zero-extension of sources. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16862>	2022-06-06 18:10:23 +00:00
Alyssa Rosenzweig	e57dfed419	pan/bi: Implement b2i with MUX The result_type modifier propagation looks for MUX instructions, so using this canonical b2i implementation allows the sequence b2i(cmp) to be fused. It's also faster on its own: on Valhall, MUX may be implemented as CSEL on the CVT unit, while AND may only be implemented on the SFU unit. So in case this doesn't get fused, we expect 4x better throughput for b2i with this implementation. Similarly, on Bifrost, MUX may be scheduled to either unit (as CSEL on FMA or MUX on ADD), whereas AND may only be scheduled to FMA. Results on Mali-G52: total instructions in shared programs: 2419171 -> 2414814 (-0.18%) instructions in affected programs: 272203 -> 267846 (-1.60%) helped: 767 HURT: 0 helped stats (abs) min: 1.0 max: 138.0 x̄: 5.68 x̃: 2 helped stats (rel) min: 0.12% max: 15.57% x̄: 2.09% x̃: 0.68% 95% mean confidence interval for instructions value: -6.68 -4.68 95% mean confidence interval for instructions %-change: -2.37% -1.82% Instructions are helped. total tuples in shared programs: 1932822 -> 1929234 (-0.19%) tuples in affected programs: 76485 -> 72897 (-4.69%) helped: 380 HURT: 3 helped stats (abs) min: 1.0 max: 138.0 x̄: 9.46 x̃: 1 helped stats (rel) min: 0.14% max: 15.96% x̄: 3.81% x̃: 0.92% HURT stats (abs) min: 1.0 max: 6.0 x̄: 2.67 x̃: 1 HURT stats (rel) min: 0.38% max: 8.57% x̄: 3.80% x̃: 2.44% 95% mean confidence interval for tuples value: -11.30 -7.44 95% mean confidence interval for tuples %-change: -4.27% -3.22% Tuples are helped. total clauses in shared programs: 356094 -> 355992 (-0.03%) clauses in affected programs: 3264 -> 3162 (-3.12%) helped: 80 HURT: 0 helped stats (abs) min: 1.0 max: 9.0 x̄: 1.27 x̃: 1 helped stats (rel) min: 0.81% max: 50.00% x̄: 4.83% x̃: 3.39% 95% mean confidence interval for clauses value: -1.49 -1.06 95% mean confidence interval for clauses %-change: -6.23% -3.43% Clauses are helped. 
total cycles in shared programs: 167337.10 -> 167329.19 (<.01%) cycles in affected programs: 510.08 -> 502.17 (-1.55%) helped: 80 HURT: 2 helped stats (abs) min: 0.041665999999999315 max: 0.7916659999999993 x̄: 0.10 x̃: 0 helped stats (rel) min: 0.51% max: 13.64% x̄: 2.12% x̃: 1.34% HURT stats (abs) min: 0.041665999999999315 max: 0.0416669999999999 x̄: 0.04 x̃: 0 HURT stats (rel) min: 0.39% max: 2.78% x̄: 1.58% x̃: 1.58% 95% mean confidence interval for cycles value: -0.12 -0.07 95% mean confidence interval for cycles %-change: -2.59% -1.48% Cycles are helped. total arith in shared programs: 73819.54 -> 73669.25 (-0.20%) arith in affected programs: 2840.54 -> 2690.25 (-5.29%) helped: 383 HURT: 3 helped stats (abs) min: 0.041665999999999315 max: 5.75 x̄: 0.39 x̃: 0 helped stats (rel) min: 0.33% max: 18.81% x̄: 4.39% x̃: 0.98% HURT stats (abs) min: 0.041665999999999315 max: 0.25 x̄: 0.11 x̃: 0 HURT stats (rel) min: 0.39% max: 8.96% x̄: 4.04% x̃: 2.78% 95% mean confidence interval for arith value: -0.47 -0.31 95% mean confidence interval for arith %-change: -4.93% -3.71% Arith are helped. total quadwords in shared programs: 1679798 -> 1676259 (-0.21%) quadwords in affected programs: 72826 -> 69287 (-4.86%) helped: 381 HURT: 15 helped stats (abs) min: 1.0 max: 142.0 x̄: 9.35 x̃: 1 helped stats (rel) min: 0.25% max: 18.87% x̄: 4.33% x̃: 1.13% HURT stats (abs) min: 1.0 max: 6.0 x̄: 1.47 x̃: 1 HURT stats (rel) min: 0.30% max: 6.25% x̄: 0.77% x̃: 0.35% 95% mean confidence interval for quadwords value: -10.76 -7.11 95% mean confidence interval for quadwords %-change: -4.71% -3.56% Quadwords are helped. 
Results on Mali-G57: total instructions in shared programs: 2704193 -> 2699317 (-0.18%) instructions in affected programs: 293366 -> 288490 (-1.66%) helped: 758 HURT: 5 helped stats (abs) min: 1.0 max: 151.0 x̄: 6.45 x̃: 2 helped stats (rel) min: 0.11% max: 22.22% x̄: 2.05% x̃: 0.64% HURT stats (abs) min: 1.0 max: 7.0 x̄: 2.20 x̃: 1 HURT stats (rel) min: 0.22% max: 1.69% x̄: 0.87% x̃: 1.08% 95% mean confidence interval for instructions value: -7.42 -5.36 95% mean confidence interval for instructions %-change: -2.27% -1.79% Instructions are helped. total cycles in shared programs: 141711.73 -> 141711.84 (<.01%) cycles in affected programs: 214.36 -> 214.47 (0.05%) helped: 4 HURT: 42 helped stats (abs) min: 0.015625 max: 0.359375 x̄: 0.20 x̃: 0 helped stats (rel) min: 1.85% max: 12.78% x̄: 9.12% x̃: 10.93% HURT stats (abs) min: 0.015625 max: 0.09375 x̄: 0.02 x̃: 0 HURT stats (rel) min: 0.17% max: 17.65% x̄: 0.84% x̃: 0.34% 95% mean confidence interval for cycles value: -0.02 0.03 95% mean confidence interval for cycles %-change: -1.23% 1.17% Inconclusive result (value mean confidence interval includes 0). total cvt in shared programs: 14479.14 -> 14474.19 (-0.03%) cvt in affected programs: 2877.05 -> 2872.09 (-0.17%) helped: 508 HURT: 209 helped stats (abs) min: 0.015625 max: 0.453125 x̄: 0.02 x̃: 0 helped stats (rel) min: 0.25% max: 16.67% x̄: 1.23% x̃: 0.37% HURT stats (abs) min: 0.015625 max: 0.296875 x̄: 0.03 x̃: 0 HURT stats (rel) min: 0.15% max: 18.18% x̄: 1.70% x̃: 0.34% 95% mean confidence interval for cvt value: -0.01 -0.00 95% mean confidence interval for cvt %-change: -0.57% -0.18% Cvt are helped. 
total sfu in shared programs: 7875.69 -> 7590.75 (-3.62%) sfu in affected programs: 1567.38 -> 1282.44 (-18.18%) helped: 906 HURT: 0 helped stats (abs) min: 0.0625 max: 8.625 x̄: 0.31 x̃: 0 helped stats (rel) min: 2.38% max: 100.00% x̄: 16.80% x̃: 5.63% 95% mean confidence interval for sfu value: -0.37 -0.26 95% mean confidence interval for sfu %-change: -18.43% -15.17% Sfu are helped. total quadwords in shared programs: 1468152 -> 1465800 (-0.16%) quadwords in affected programs: 37104 -> 34752 (-6.34%) helped: 161 HURT: 2 helped stats (abs) min: 8.0 max: 80.0 x̄: 14.71 x̃: 8 helped stats (rel) min: 1.67% max: 20.00% x̄: 8.05% x̃: 7.69% HURT stats (abs) min: 8.0 max: 8.0 x̄: 8.00 x̃: 8 HURT stats (rel) min: 3.57% max: 3.85% x̄: 3.71% x̃: 3.71% 95% mean confidence interval for quadwords value: -16.29 -12.57 95% mean confidence interval for quadwords %-change: -8.58% -7.22% Quadwords are helped. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16857>	2022-06-06 16:08:25 +00:00
Alyssa Rosenzweig	8f3b62f87e	pan/va: Add MUX lowering tests Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16857>	2022-06-06 16:08:25 +00:00
Alyssa Rosenzweig	677a66b3eb	pan/va: Lower MUX to CSEL where possible CSEL executes on the conversion unit (CVT), while MUX executes on the special function unit (SFU). Throughput on CVT is 4x higher than SFU, so this is (almost) always an optimization. The "real" MUX is still used for unusual cases, like 8-bit and bitselect. Note that it's easier for us to use MUX everywhere for the IR. This is an easy fixup to get better codegen on Valhall without touching the core Bifrost code. shader-db is a bit of a toss up: register pressure and instruction count are hurt in some cases due to restrictions on FAU access. In particular, a shader that muxes between two uniforms needs an extra move due to extra constant (zero). However, in terms of throughput this is still a win: 2 CVT instructions (MOV + CSEL) have 2x throughput to 1 SFU instruction (MUX). The MOV has opportunities for CSE, but that can hurt pressure in turn. Overall, cycles are helped substantially. total instructions in shared programs: 2728438 -> 2731597 (0.12%) instructions in affected programs: 414391 -> 417550 (0.76%) helped: 87 HURT: 1063 helped stats (abs) min: 1.0 max: 6.0 x̄: 5.17 x̃: 6 helped stats (rel) min: 0.19% max: 15.79% x̄: 4.12% x̃: 4.11% HURT stats (abs) min: 1.0 max: 56.0 x̄: 3.40 x̃: 2 HURT stats (rel) min: 0.11% max: 23.43% x̄: 1.15% x̃: 0.63% 95% mean confidence interval for instructions value: 2.47 3.03 95% mean confidence interval for instructions %-change: 0.61% 0.90% Instructions are HURT. 
total cycles in shared programs: 142103 -> 142015.75 (-0.06%) cycles in affected programs: 1263.45 -> 1176.20 (-6.91%) helped: 281 HURT: 176 helped stats (abs) min: 0.015625 max: 2.234375 x̄: 0.50 x̃: 0 helped stats (rel) min: 0.71% max: 54.17% x̄: 16.93% x̃: 15.31% HURT stats (abs) min: 0.015625 max: 30.0 x̄: 0.30 x̃: 0 HURT stats (rel) min: 0.84% max: 120.00% x̄: 7.16% x̃: 5.00% 95% mean confidence interval for cycles value: -0.33 -0.05 95% mean confidence interval for cycles %-change: -9.08% -6.22% Cycles are helped. total cvt in shared programs: 13983.34 -> 14891.70 (6.50%) cvt in affected programs: 7498.36 -> 8406.72 (12.11%) helped: 71 HURT: 4711 helped stats (abs) min: 0.0625 max: 0.0625 x̄: 0.06 x̃: 0 helped stats (rel) min: 5.41% max: 40.00% x̄: 10.23% x̃: 9.30% HURT stats (abs) min: 0.015625 max: 2.640625 x̄: 0.19 x̃: 0 HURT stats (rel) min: 0.18% max: 141.18% x̄: 16.21% x̃: 9.52% 95% mean confidence interval for cvt value: 0.18 0.20 95% mean confidence interval for cvt %-change: 15.21% 16.42% Cvt are HURT. total sfu in shared programs: 11320.44 -> 7882.56 (-30.37%) sfu in affected programs: 7618.50 -> 4180.62 (-45.13%) helped: 4782 HURT: 0 helped stats (abs) min: 0.0625 max: 10.5625 x̄: 0.72 x̃: 0 helped stats (rel) min: 1.34% max: 100.00% x̄: 41.91% x̃: 37.50% 95% mean confidence interval for sfu value: -0.75 -0.68 95% mean confidence interval for sfu %-change: -42.68% -41.14% Sfu are helped. 
total ls in shared programs: 129660 -> 129690 (0.02%) ls in affected programs: 25 -> 55 (120.00%) helped: 0 HURT: 1 total quadwords in shared programs: 1482728 -> 1484128 (0.09%) quadwords in affected programs: 58624 -> 60024 (2.39%) helped: 24 HURT: 195 helped stats (abs) min: 8.0 max: 8.0 x̄: 8.00 x̃: 8 helped stats (rel) min: 3.70% max: 20.00% x̄: 10.34% x̃: 10.00% HURT stats (abs) min: 8.0 max: 24.0 x̄: 8.16 x̃: 8 HURT stats (rel) min: 1.41% max: 50.00% x̄: 4.84% x̃: 2.56% 95% mean confidence interval for quadwords value: 5.70 7.09 95% mean confidence interval for quadwords %-change: 2.22% 4.14% Quadwords are HURT. total spills in shared programs: 125 -> 127 (1.60%) spills in affected programs: 0 -> 2 helped: 0 HURT: 1 total fills in shared programs: 800 -> 828 (3.50%) fills in affected programs: 0 -> 28 helped: 0 HURT: 1 Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16857>	2022-06-06 16:08:25 +00:00
Alyssa Rosenzweig	3741606b25	pan/va: Implement more lanes Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16857>	2022-06-06 16:08:25 +00:00
Alyssa Rosenzweig	1768afa5b9	pan/bi: Extract MUX to CSEL optimization It's portable, and useful to both Bifrost and Valhall, in the clause scheduler and in instruction selection, respectively. Move it from the Bifrost clause scheduler to common code so we can share the benefits. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16857>	2022-06-06 16:08:25 +00:00
Alyssa Rosenzweig	e1fb182d90	pan/va: Do not insert NOPs into empty shaders It's unnecessary and breaks the empty shader optimizations. Noticed while inspecting a trace from dEQP-GLES3.functional.color_clear.masked_scissored_rgb, which does not produce any varyings other than gl_Position in its vertex shader and hence should omit the varying shader. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16868>	2022-06-06 14:28:59 +00:00
Alyssa Rosenzweig	3b3cd59fb8	panfrost: Launch transform feedback shaders We now have infrastructure in place to generate variants of vertex shaders specialized for transform feedback. All that's left is launching these compute-like kernels before the IDVS job, implementing both the transform feedback and the regular rasterization pipeline. This implements transform feedback on Valhall, passing the relevant GLES3.1 tests. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15720>	2022-06-04 14:35:56 +00:00
Alyssa Rosenzweig	ed5a5a9d6d	panfrost: Wire up transform feedback sysvals Wire the Gallium interface for transform feedback up to the system values that will be fed into our lowering code. This is based on our existing transform feedback implementation for Midgard. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15720>	2022-06-04 14:35:56 +00:00
Alyssa Rosenzweig	4e341e70d8	pan/bi: Handle transform feedback intrinsics Translate the intrinsics we introduced to lower away transform feedback into Panfrost system values which the GL driver can handle. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15720>	2022-06-04 14:35:56 +00:00
Alyssa Rosenzweig	ae3fa6cc1d	pan/bi: Add transform feedback lowering pass Add a simple NIR-based implementation of transform feedback, appropriate for OpenGL ES 3.1 class hardware (compute but no geometry or tessellation shaders). Stores to varyings that will be captured are replaced by stores to transform feedback buffers and some addressing math. This allows implementing the semantic of transform feedback in a compute-like stage. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15720>	2022-06-04 14:35:56 +00:00
Alyssa Rosenzweig	ed4bd8738d	panfrost/ci: Mark draw_buffers_indexed.* as flakes These keep flaking. Icecream95 observes the issue relates to AFBC in the discussion of the flake in issue 6604. Until the root cause can be identified and fixed, mark the tests as known flakes for CI. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16855>	2022-06-03 21:05:22 +00:00
Alyssa Rosenzweig	7535362204	pan/bi: Fix clper_xor on Mali-G31 Mali-G31 has the old CLPER instruction, not the new one, which means we don't get to specify a custom lane op. But the clper_xor helper incorrectly checked the arch, not the implementation quirk. Fixes: `c00e7b729f` ("pan/bi: Optimize abs(derivative)") Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reported-by: Icecream95 <ixn@disroot.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16846>	2022-06-02 20:32:43 -04:00
Alyssa Rosenzweig	ad5c84999b	pan/bi: Rework Valhall register alignment Because we lower SPLIT and COLLECT before RA, we need to consider offsets when determining the dimensions of vectors, in order to align properly. Lowering COLLECT post-RA would avoid this special case. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16780>	2022-06-02 17:13:16 +00:00
Alyssa Rosenzweig	0770e7a90c	pan/bi: Align 64-bit register sources Similar idea to aligning staging register sources. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16780>	2022-06-02 17:13:16 +00:00
Alyssa Rosenzweig	8553dd97ad	pan/bi: Allow vec6 for collects Hit for some Valhall texturing instructions. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16780>	2022-06-02 17:13:16 +00:00
Icecream95	1bfff407b9	pan/bi: Use nodearrays for linear constraints Speeds up compiling shaders/skia/781.shader_test in shader-db by 8x (Icecream95). ...At least it did before I extended to support register allocation of vec8. On Valhall, texture instructions require up to 8 consecutive registers. To handle this, provide for vec8 register allocation. Liveness was already (accidentally?) vec8. The increased memory requirement is acceptable given that the interference matrix is now stored sparsely (Alyssa). Icecream95 reports the vec8 changes hurt RA performance by about 1% on average. I consider this acceptable for now. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16780>	2022-06-02 17:13:16 +00:00
Icecream95	c70daa74f0	pan/bi: Add nodearray datastructure This is an array which can either be sparse or dense, and was designed to be used to track liveness and interference information. Either a sparse array with sorted indices or dense array is used. Other data structures were tried, such as red-black trees or hash tables, but they were slower. When used for storing constraints, the indices do not have to be sorted as duplicating elements is okay, but the speedup from that was not enough to justify the extra complexity. v2: Add a comment about how to potentially speed it up. But it seems fast enough even without this change. v3: Use a custom struct rather than relying on util_dynarray. v4: Split out functions only used for liveness analysis, rather than the simpler data structure needed for the register interference matrix. If we need to optimize liveness, that can follow on after. Also make it for vec8 (Alyssa). Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16780>	2022-06-02 17:13:16 +00:00
Icecream95	c24b78cceb	pan/bi: Reverse linear constraint bits This will make it simpler to implement parallel RA where multiple possible registers for a node are tested at once. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16780>	2022-06-02 17:13:16 +00:00
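
The nodearray commits above (c70daa74f0, 1bfff407b9) describe an array that is stored either sparsely with sorted indices or densely. The following is a minimal sketch of that general idea only, not Mesa's implementation: every name (sketch_array, sketch_orr, SKETCH_SPARSE_MAX), the 8-bit value type, and the promotion threshold are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; this is NOT Mesa's nodearray API. */
#define SKETCH_SPARSE_MAX 4 /* assumed threshold before switching to dense */

struct sketch_entry {
   uint32_t key;
   uint8_t value; /* assumed: a small bitmask stored per key */
};

struct sketch_array {
   int dense;      /* 0: sorted sparse entries, 1: dense byte array */
   unsigned count; /* sparse mode: number of entries in use */
   struct sketch_entry sparse[SKETCH_SPARSE_MAX];
   uint8_t *bytes; /* dense mode: one value per key, max_keys long */
   unsigned max_keys;
};

static void sketch_init(struct sketch_array *a, unsigned max_keys)
{
   memset(a, 0, sizeof(*a));
   a->max_keys = max_keys;
}

static void sketch_fini(struct sketch_array *a)
{
   free(a->bytes); /* free(NULL) is a no-op in sparse mode */
}

/* Binary search: index of the first sparse entry whose key is >= key. */
static unsigned sketch_lower_bound(const struct sketch_array *a, uint32_t key)
{
   unsigned lo = 0, hi = a->count;
   while (lo < hi) {
      unsigned mid = lo + (hi - lo) / 2;
      if (a->sparse[mid].key < key)
         lo = mid + 1;
      else
         hi = mid;
   }
   return lo;
}

/* Read the value at key; keys that were never written read as 0. */
static uint8_t sketch_get(const struct sketch_array *a, uint32_t key)
{
   assert(key < a->max_keys);
   if (a->dense)
      return a->bytes[key];
   unsigned i = sketch_lower_bound(a, key);
   return (i < a->count && a->sparse[i].key == key) ? a->sparse[i].value : 0;
}

/* OR bits into the value at key, promoting to dense when sparse is full. */
static void sketch_orr(struct sketch_array *a, uint32_t key, uint8_t bits)
{
   assert(key < a->max_keys);
   if (!a->dense) {
      unsigned i = sketch_lower_bound(a, key);
      if (i < a->count && a->sparse[i].key == key) {
         a->sparse[i].value |= bits;
         return;
      }
      if (a->count < SKETCH_SPARSE_MAX) {
         /* Shift the tail up by one slot to keep the entries sorted. */
         memmove(&a->sparse[i + 1], &a->sparse[i],
                 (a->count - i) * sizeof(a->sparse[0]));
         a->sparse[i] = (struct sketch_entry){ key, bits };
         a->count++;
         return;
      }
      /* Sparse storage is full: copy every entry into a dense array. */
      a->bytes = calloc(a->max_keys, 1);
      if (!a->bytes)
         abort();
      for (unsigned j = 0; j < a->count; j++)
         a->bytes[a->sparse[j].key] = a->sparse[j].value;
      a->dense = 1;
   }
   a->bytes[key] |= bits;
}
```

In this sketch a lookup costs a binary search while the array is sparse and a single index once it is dense. The commit message reports that red-black trees and hash tables were tried and found slower; the sketch does not attempt to reproduce that comparison.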

4179 Commits