KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Alyssa Rosenzweig	d4600c4340	pan/mdg: Remove nir_alu_src_index Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4792>	2020-04-29 15:18:38 +00:00
Alyssa Rosenzweig	fbbe3d4b75	pan/bi: Use common IR indices Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4792>	2020-04-29 15:18:38 +00:00
Alyssa Rosenzweig	5860b18665	panfrost: Move Bifrost IR indexing to common Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4792>	2020-04-29 15:18:38 +00:00
Alyssa Rosenzweig	e3062edff4	panfrost: Fix BO reference counting Typo. Fixes: `3283c7f4da` ("panfrost: Inline reference counting routines") Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4792>	2020-04-29 15:18:38 +00:00
Marek Olšák	22a4cb4937	ac: enable displayable DCC on Navi12 & Navi14 Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4697>	2020-04-29 14:53:25 +00:00
Marek Olšák	3b45631d7a	ac/surface: validate that DCC is enabled correctly on gfx9+ Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4697>	2020-04-29 14:53:25 +00:00
Marek Olšák	5e31e4b697	ac/surface: add code for gfx10 displayable DCC Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4697>	2020-04-29 14:53:25 +00:00
Marek Olšák	e2fbba7720	ac/surface: move non-displayable DCC to the end of the buffer Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4697>	2020-04-29 14:53:25 +00:00
Marek Olšák	a3dc7fffbb	ac/surface: don't compute DCC if it's unsupported by DCN on gfx9+ Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4697>	2020-04-29 14:53:25 +00:00
Marek Olšák	5d785f99b7	ac/surface: match get_display_flag() with expectations for is_displayable Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4697>	2020-04-29 14:53:25 +00:00
Marek Olšák	3dc2ccc14c	ac/surface: replace RADEON_SURF_OPTIMIZE_FOR_SPACE with !FORCE_SWIZZLE_MODE Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4697>	2020-04-29 14:53:25 +00:00
Marek Olšák	f6d87ec8a9	ac/surface: remove RADEON_SURF_TC_COMPATIBLE_HTILE and assume it's always set So that drivers can enable it without worrying how the texture was allocated. v2: reworked the mechanism, hopefully fixes now added Bas Nieuwenhuizen's diff to fix radv Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> (v1) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4697>	2020-04-29 14:53:25 +00:00
Marek Olšák	25d3cc293e	ac/surface: rename micro tile mode enums like gfx10 uses them Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4697>	2020-04-29 14:53:25 +00:00
Thomas Hellstrom	298e247776	winsys/svga: Optionally avoid caching buffer maps Mapping of graphics kernel buffers is quite costly. Therefore the svga drm winsys caches all kernel buffer maps. However, that may lead to less testing coverage of the unmap paths and (possibly) processes running out of virtual memory space. Introduce a possibility to avoid that caching by setting the environment variable SVGA_FORCE_KERNEL_UNMAPS to 1. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Matthew McClure <mcclurem@vmware.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4804>	2020-04-29 13:45:12 +00:00
Thomas Hellstrom	422148de52	gallium/pipebuffer: Use persistent maps for slabs Instead of the ugly practice of relying on the provider caching maps, introduce and use persistent pipebuffer maps. Providers that can't handle persistent maps can't use the slab manager. The only current user is the svga drm winsys which always maps persistently. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Charmaine Lee <charmainel@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4804>	2020-04-29 13:45:12 +00:00
Timur Kristóf	e4e1a0ac13	radv: Use smaller esgs_itemsize for ACO. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>	2020-04-29 11:51:04 +00:00
Timur Kristóf	ee5f04c9c9	aco: Use new default driver locations. The way the new locations are set up has much fewer gaps between each I/O slot, so this results in a massive reduction in the LDS usage of tessellation shaders. Totals (GFX10): VGPRS: 3976792 -> 3974864 (-0.05 %) Code Size: 260552784 -> 260532860 (-0.01 %) bytes LDS: 48723 -> 30179 (-38.06 %) blocks Max Waves: 1053407 -> 1053583 (0.02 %) Totals from affected shaders (1407 shaders on GFX10): SGPRS: 59144 -> 59216 (0.12 %) VGPRS: 63024 -> 61096 (-3.06 %) Code Size: 2695508 -> 2675584 (-0.74 %) bytes LDS: 47109 -> 28565 (-39.36 %) blocks Max Waves: 12999 -> 13175 (1.35 %) Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>	2020-04-29 11:51:04 +00:00
Timur Kristóf	efa4976709	radv: Use new linking helper to set default driver locations. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>	2020-04-29 11:51:04 +00:00
Timur Kristóf	7aa61c84fe	nir: Add new linking helper to set linked driver locations. This commit introduces a new function nir_assign_linked_io_var_locations which is intended to help with assigning driver locations to shaders during linking, primarily aimed at the VS->TCS->TES->GS stages. It ensures that the linked shaders have the same driver locations, and it also packs these as close to each other as possible. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>	2020-04-29 11:51:04 +00:00
Timur Kristóf	7056714f50	aco: Set config->lds_size when TES or VS is running on HW ESGS. This doesn't fix anything, just reports the LDS size used by merged ESGS shaders, such as vertex_geometry_gs and tess_eval_geometry_gs. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>	2020-04-29 11:51:04 +00:00
Timur Kristóf	baa46878d4	aco: Calculate workgroup size of legacy GS. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>	2020-04-29 11:51:04 +00:00
Timur Kristóf	fdbb296853	aco: Remember VS/TCS output driver locations. Instead of relying on calling shader_io_get_unique_index repeatedly, remember the which output driver location corresponds to which varying slot. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>	2020-04-29 11:51:04 +00:00
Timur Kristóf	ab07c4ea70	aco: Use context variables instead of calculating TCS inputs/outputs. VS needs the number of TCS inputs, and TES needs the number of TCS outputs. It is error-prone to repeat those calculations in both instruction selection and setup. Just set them in one place instead. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>	2020-04-29 11:51:04 +00:00
Timur Kristóf	fd0248c37b	radv: Refactor calculate_tess_lds_size and get_tcs_num_patches. Previously these functions needed the bit mask of the TCS outputs and patch outputs written, and concluded the number of outputs from that. Now, they take the number of outputs and patch outputs instead. This will allow the backend compiler to better optimize the LDS layout. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>	2020-04-29 11:51:04 +00:00
Rhys Perry	9392ddab43	aco: consider blocks unreachable if they are in the logical cfg unreachable was true if the last block is unreachable in the linear cfg, but it should also be true if it is unreachable in the logical cfg. Fixes dEQP-VK.graphicsfuzz.for-with-ifs-and-return Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Fixes: `8d8c864beb` ('aco: improve check for unreachable loop continue blocks') Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4764>	2020-04-29 11:07:09 +00:00
Christopher James Halse Rogers	98675d34c1	egl/wayland: Fix zwp_linux_dmabuf usage There's no guarantee that the formats advertised by wl_drm and the formats advertised by zwp_linux_dmabuf_v1 are the same. get_back_bo() handles this by falling back from createImageWithModifiers() to createImage() when there's a wl_drm format but no corresponding linux_dmabuf format, but create_wl_buffer() unconditionally tries to create a linux_dmabuf buffer unless DRIimage has DRM_FORMAT_MOD_INVALID. Fix this by always checking if the DRIimage modifier has been advertised by zwp_linux_dmabuf_v1, and falling back to wl_drm if not. If DRM_FORMAT_MOD_INVALID has been advertised then we trust the client has allocated something appropriate and treat any modifier as matching. Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2220 Signed-off-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com> Reviewed-by: Daniel Stone <daniels@collabora.com> Reviewed-by: Simon Ser <contact@emersion.fr> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4294>	2020-04-29 11:29:40 +01:00
Danylo Piliaiev	8f0d387441	iris/bufmgr: Check if iris_bo_gem_mmap failed After refactoring of iris_bo_map_cpu and iris_bo_map_wc - immediate return of NULL on failure to mmap a buffer was lost. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2855 Fixes: `5bc3f52dd8` Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4786>	2020-04-29 08:51:33 +00:00
Tapani Pälli	1a33358b27	anv: remove assert from GetImageMemoryRequirements[2] This assert is actually correct but due to how android hardware buffer support is implemented we should remove it, otherwise debug build of mesa hits the assert with Android CTS tests. Test creates VkImage with non-external format and sets up VkExternalMemoryImageCreateInfo to indicate that image may be used with Android hardwarebuffer handle. Then test attempts to get image memory requirements. Problem with this is that we setup all android supporting images as having external format and thus hit the assert as the size has not been set yet. This is not a problem in practice since android will bind ahw memory with the image later on. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2807 Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4762>	2020-04-29 08:30:42 +00:00
Samuel Pitoiset	2f6648dc3c	gitlab-ci: add a list of expected failures for FIJI with ACO Timur has this chip now. The depth stencil resolve failures are somehow unexpected. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4805>	2020-04-29 09:53:48 +02:00
Samuel Pitoiset	0e6afbbe56	radv: advertise VK_EXT_robustness2 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4775>	2020-04-29 07:29:54 +00:00
Samuel Pitoiset	0f1ead7b53	radv: handle NULL vertex bindings With VK_EXT_robustness2, an element of pBuffers can be NULL. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4775>	2020-04-29 07:29:54 +00:00
Samuel Pitoiset	c1ef225d18	radv: handle NULL descriptors All fields must be zero, otherwise the HW hangs. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4775>	2020-04-29 07:29:54 +00:00
Samuel Pitoiset	60cc065c7d	aco: fix adjusting the sample index with FMASK if value is negative The SPIR-V spec doesn't say explicitly that the sample index must be an unsigned integer. This fixes crashes with some new VK_EXT_robustness2 tests. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4775>	2020-04-29 07:29:54 +00:00
Samuel Pitoiset	a112ec4c11	aco: fix nir_texop_texture_samples with NULL descriptors With VK_EXT_robustness2, descriptors can be NULL and the number of samples returned by nir_texop_texture_samples should be 0. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4775>	2020-04-29 07:29:54 +00:00
Samuel Pitoiset	aa94213781	ac/llvm: fix nir_texop_texture_samples with NULL descriptors With VK_EXT_robustness2, descriptors can be NULL and the number of samples returned by nir_texop_texture_samples should be 0. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4775>	2020-04-29 07:29:54 +00:00
Caio Marcelo de Oliveira Filho	a3cba3c771	intel/fs: Only stall after sending all memory fence messages In Gen11+, when emitting a fence for both L3 and SLM, the generated code would look like SEND, MOV (for stall), SEND, MOV (for stall) This commit change that so two SENDs are emitted before the MOVs for stall. This is similar to the approach used in Ivy Bridge for the render fence. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3278>	2020-04-29 07:17:27 +00:00
Caio Marcelo de Oliveira Filho	f858fa26b4	intel/fs,vec4: Pull stall logic for memory fences up into the IR Instead of emitting the stall MOV "inside" the SHADER_OPCODE_MEMORY_FENCE generation, use the scheduling fences when creating the IR. For IvyBridge, every (data cache) fence is accompained by a render cache fence, that now is explicit in the IR, two SHADER_OPCODE_MEMORY_FENCEs are emitted (with different SFIDs). Because Begin and End interlock intrinsics are effectively memory barriers, move its handling alongside the other memory barrier intrinsics. The SHADER_OPCODE_INTERLOCK is still used to distinguish if we are going to use a SENDC (for Begin) or regular SEND (for End). This change is a preparation to allow emitting both SENDs in Gen11+ before we can stall on them. Shader-db results for IVB (i965): total instructions in shared programs: 11971190 -> 11971200 (<.01%) instructions in affected programs: 11482 -> 11492 (0.09%) helped: 0 HURT: 8 HURT stats (abs) min: 1 max: 3 x̄: 1.25 x̃: 1 HURT stats (rel) min: 0.03% max: 0.50% x̄: 0.14% x̃: 0.10% 95% mean confidence interval for instructions value: 0.66 1.84 95% mean confidence interval for instructions %-change: 0.01% 0.27% Instructions are HURT. Unlike the previous code, that used the `mov g1 g2` trick to force both `g1` and `g2` to stall, the scheduling fence will generate `mov null g1` and `mov null g2`. During review it was decided it was not worth keeping the special codepath for the small effect will have. Shader-db results for HSW (i965), BDW and SKL don't have a change on instruction count, but do report changes in cycles count, showing SKL results below total cycles in shared programs: 341738444 -> 341710570 (<.01%) cycles in affected programs: 7240002 -> 7212128 (-0.38%) helped: 46 HURT: 5 helped stats (abs) min: 14 max: 1940 x̄: 676.22 x̃: 154 helped stats (rel) min: <.01% max: 2.62% x̄: 1.28% x̃: 0.95% HURT stats (abs) min: 2 max: 1768 x̄: 646.40 x̃: 362 HURT stats (rel) min: <.01% max: 0.83% x̄: 0.28% x̃: 0.08% 95% mean confidence interval for cycles value: -777.71 -315.38 95% mean confidence interval for cycles %-change: -1.42% -0.83% Cycles are helped. This seems to be the effect of allocating two registers separatedly instead of a single one with size 2, which causes different register allocation, affecting the cycle estimates. while ICL also has not change on instruction count but report changes negative changes in cycles total cycles in shared programs: 352665369 -> 352707484 (0.01%) cycles in affected programs: 9608288 -> 9650403 (0.44%) helped: 4 HURT: 104 helped stats (abs) min: 24 max: 128 x̄: 88.50 x̃: 101 helped stats (rel) min: <.01% max: 0.85% x̄: 0.46% x̃: 0.49% HURT stats (abs) min: 2 max: 2016 x̄: 408.36 x̃: 48 HURT stats (rel) min: <.01% max: 3.31% x̄: 0.88% x̃: 0.45% 95% mean confidence interval for cycles value: 256.67 523.24 95% mean confidence interval for cycles %-change: 0.63% 1.03% Cycles are HURT. AFAICT this is the result of the case above. Shader-db results for TGL have similar cycles result as ICL, but also affect instructions total instructions in shared programs: 17690586 -> 17690597 (<.01%) instructions in affected programs: 64617 -> 64628 (0.02%) helped: 55 HURT: 32 helped stats (abs) min: 1 max: 16 x̄: 4.13 x̃: 3 helped stats (rel) min: 0.05% max: 2.78% x̄: 0.86% x̃: 0.74% HURT stats (abs) min: 1 max: 65 x̄: 7.44 x̃: 2 HURT stats (rel) min: 0.05% max: 4.58% x̄: 1.13% x̃: 0.69% 95% mean confidence interval for instructions value: -2.03 2.28 95% mean confidence interval for instructions %-change: -0.41% 0.15% Inconclusive result (value mean confidence interval includes 0). Now that more is done in the IR, more dependencies are visible and more SWSB annotations are emitted. Mixed with different register allocation decisions like above, some shaders will see more `sync nops` while others able to avoid them. Most of the new `sync nops` are also redundant and could be dropped, which will be fixed in a separate change. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3278>	2020-04-29 07:17:27 +00:00
Caio Marcelo de Oliveira Filho	0e96b0d6dd	intel/fs: Allow FS_OPCODE_SCHEDULING_FENCE stall on registers It will generate the MOVs (or SYNC_NOP in Gen12+) needed for stall. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3278>	2020-04-29 07:17:27 +00:00
Bas Nieuwenhuizen	9248b04528	radv: Expose 4G element texel buffers. Old value seems to be copied from anv. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4787>	2020-04-29 07:05:04 +00:00
Kenneth Graunke	506414e837	iris: Fix downcast of bound_vertex_buffers from uint64_t to int This is the wrong data type, the original field - and the values we're adding in - are both 64-bit unsigned. Keep the original data type. Thanks to Dave Airlie for finding this while reading the code. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4802>	2020-04-29 06:50:54 +00:00
Francisco Jerez	5e2a7e11b4	intel/ir: Remove scheduling-based cycle count estimates. The cycle count estimation logic part of the scheduler is now redundant with the shader performance modeling pass, and the estimates can be consolidated into the brw::performance analysis result object instead of being part of the CFG, which guarantees that the estimates cannot be accessed without previously calling the performance_analysis::require() method, which makes sure that the right analysis pass is executed at the right time if we don't already have up-to-date cached results. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:01:27 -07:00
Francisco Jerez	486f3b04a5	intel/ir: Pass block cycle count information explicitly to disassembler. So we can eventually remove the cycle count estimates from the CFG data structure and consolidate performance information in the brw::performance object. It would be cleaner to pass the brw::performance object directly to the disassembler but that isn't straightforward since the disassembler is built as a plain C file unlike the rest of the compiler back-end. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:01:27 -07:00
Francisco Jerez	6579f562c3	intel/ir: Use brw::performance object instead of CFG cycle counts for codegen stats. These should be more accurate than the current cycle counts, since among other things they consider the effect of post-scheduling passes like the software scoreboard on TGL. In addition it will enable us to clean up some of the now redundant cycle-count estimation functionality in the instruction scheduler. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:01:27 -07:00
Francisco Jerez	65342be3ae	intel/fs: Add INTEL_DEBUG=no32 debugging flag. This is useful in order to identify codegen issues caused by SIMD32. It doesn't currently have any effect on compute shaders since SIMD32 dispatch is only enabled for CS when it's strictly necessary to do so in order to support the workgroup size requested for the shader -- That might change in the future though when we hook up the SIMD32 heuristic to CS compilation. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:01:27 -07:00
Francisco Jerez	14f0a5cf64	intel/fs: Implement performance analysis-based SIMD32 heuristic for fragment shaders. The heuristic enables the SIMD32 fragment shader based on whether the IR performance modeling pass predicts it to have greater throughput than the SIMD16 and SIMD8 variants of the same shader. It would be straightforward to do the same thing in order to control whether SIMD16 dispatch is enabled, but it's pending additional performance evaluation. The INTEL_DEBUG=do32 option is left around in order to force the SIMD32 shader to be used regardless of the result of the heuristic, since it's useful as a debugging aid e.g. in order to identify SIMD32-specific codegen issues which may be masked by the SIMD32 heuristic, or cases where the heuristic is incorrectly disabling SIMD32 shaders that offer a performance advantage. Currently this is only enabled on Gen6+, since SIMD32 codegen support is incomplete on earlier platforms. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:01:27 -07:00
Francisco Jerez	d6aa0c261f	intel/fs: Heap-allocate fs_visitors in brw_compile_fs(). This makes brw_compile_fs() look a bit more similar to brw_compile_cs(). It saves us three v*_shader_stats local variables, and will save us additional triplicated declarations as we start tracking IR performance analysis results. The triplicated cfg pointers are left around because they're set to NULL to mark specific dispatch modes as disabled (e.g. in order to enforce hardware restrictions). Doing the same thing with the visitor pointers would cause data leaks. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:01:27 -07:00
Francisco Jerez	188a3659ae	intel/ir: Import shader performance analysis pass. This introduces an analysis pass intended to estimate several performance statistics of the shader, including cycle count latency and throughput values, based on static modeling. It has instruction performance information more comprehensive than the current scheduling pass for all platforms between Gen4-11, and works on both the FS and VEC4 back-end. The most immediate purpose of this pass is to implement a heuristic meant to determine whether using SIMD32 dispatch for a fragment shader can be expected to help more than it hurts. In addition this will allow the effect of passes run after scheduling (e.g. the TGL software scoreboard pass and the VEC4 dependency control pass) to be visible in shader-db statistics. But that isn't the end of the story, other potential applications of this pass (not part of this MR) I've been playing around with are: - Implement a similar SIMD16 heuristic allowing the identification of inefficient SIMD16 fragment shaders. - Implement similar SIMD16 and SIMD32 heuristics for the compute shader stage -- Currently compute shader builds always use the SIMD16 shader if available and never use the SIMD32 shader unless strictly necessary, which is suboptimal under certain conditions. - Hook up to the instruction scheduler in order to improve the accuracy of its timing information. - Use as heuristic in order to drive the selection of scheduling modes (Matt was experimenting with that). - Plug to the TGL software scoreboard pass in order to implement a more effective SBID token allocation algorithm, since in general the optimal token allocation depends on the timings of all instructions in the program. - Use its bottleneck detection functionality in order to implement a heuristic computing a more optimal bound for the number of fragment shader threads executed in parallel (by adjusting the MaximumNumberofThreadsPerPSD control of 3DSTATE_PS). As a follow-up I'm planning to submit updated timing information for Gen12 platforms -- Everything else required to support Gen12 like SWSB handling is already included in this patch, but there were some IP concerns regarding the TGL timing parameters since they cannot currently be obtained with the documentation and hardware which is publicly available. The timing parameters for any previous Gen7-11 platforms can be obtained by anyone by sampling the timestamp register using e.g. shader_time, though I have some more convenient instrumentation coming up. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:01:03 -07:00
Francisco Jerez	c8ce1cfc9c	intel/vec4: Fix constness of vec4_instruction::reads_flag() and ::writes_flag(). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:00:29 -07:00
Francisco Jerez	bda1d72dd9	intel/fs: Replace fs_visitor::bank_conflict_cycles() with stand-alone function. This will be re-usable by the IR performance analysis pass. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:00:29 -07:00
Francisco Jerez	d2ed740795	intel/fs: Fix constness of argument of fs_instruction_scheduler::is_compressed(). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:00:29 -07:00

1 2 3 4 5 ...

123171 Commits All Branches Search

123171 Commits

All Branches