KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Jason Ekstrand	52056206e1	anv: Don't assert that stencil HiZ clears are single-slice It's true for depth HiZ clears because we only have HiZ on single-slice images right now. However, for stencil-only clears there is no such restriction. Tested-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-21 13:54:11 -08:00
Jason Ekstrand	7dd0f73fe1	anv: Only copy clear dwords if we're rendering to the first slice Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2018-02-21 12:47:17 -08:00
Jason Ekstrand	0fa040e6f5	anv/cmd_buffer: Delete some assert-only variables Checking the sample count is almost as good as aux usage in this case. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-02-20 13:57:16 -08:00
Jason Ekstrand	e10a62662b	anv/cmd_buffer: Use layout_to_* helpers in compute_aux_usage Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-02-20 13:57:14 -08:00
Jason Ekstrand	7ea8131aa0	anv/cmd_buffer: Simplify transition_depth_buffer If we don't have HiZ, then anv_layout_to_aux_usage will return NONE for both layouts. If the two layouts are the same, they will get the aux usage. In either case, the code below will give us ISL_AUX_OP_NONE and we'll return without doing anything. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-02-20 13:57:09 -08:00
Jason Ekstrand	87e86ee2e6	anv/cmd_buffer: Do subpass image transitions in begin/end_subpass Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-20 13:49:25 -08:00
Jason Ekstrand	7d5f6b6088	anv/cmd_buffer: Mark depth/stencil surfaces written in begin_subpass Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-20 13:49:25 -08:00
Jason Ekstrand	8a3f086a42	anv/cmd_buffer: Sync clear values in begin_subpass This is quite a bit cleaner because we now sync the clear values at the same time as we do the fast clear. For loading the clear values into the surface state, we now do it once when we handle the LOAD_OP_LOAD instead of every subpass. Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-20 13:49:25 -08:00
Jason Ekstrand	a4136b8c1a	anv/pass: Store usage in each subpass attachment This requires us to ditch the VkAttachmentReference struct in favor of an anv-specific struct. However, we can now easily identify from just the subpass attachment what kind of an attachment it is. This will make iteration over anv_subpass::attachments a little easier in some case. Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-20 13:49:25 -08:00
Jason Ekstrand	bd356e1bcf	anv/cmd_buffer: Add a concept of pending load aspects These are the same as pending clear aspects only for the "load" operation. Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-20 13:49:25 -08:00
Jason Ekstrand	e526d49edd	anv/cmd_buffer: Iterate all subpass attachments when clearing This unifies things a bit because we now handle depth and stencil at the same time. Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-20 13:49:25 -08:00
Jason Ekstrand	2cc3445eb2	anv/cmd_buffer: Decide whether or not to HiZ clear up-front This moves the decision out of begin_subpass and into BeginRenderPass like the decision for color clears. We use a similar name for the function for depth/stencil as for color even though no aux usage is really getting computed. v2 (Jason Ekstrand): - Don't always disable HiZ clears by accident - Use the initial layout to decide whether to do fast clears Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-20 13:49:24 -08:00
Jason Ekstrand	6fc8555610	anv/cmd_buffer: Move the rest of clear_subpass into begin_subpass Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-20 13:49:24 -08:00
Jason Ekstrand	1900dd76d0	anv/cmd_buffer: Move the color portion of clear_subpass into begin_subpass This doesn't really change much now but it will give us more/better control over clears in the future. The one interesting functional change here is that we are now re-emitting 3DSTATE_DEPTH_BUFFERS and friends for each clear. However, this only happens at begin_subpass time so it shouldn't be substantially more expensive. Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-20 13:49:24 -08:00
Jason Ekstrand	6fb9d6c6f5	anv/cmd_buffer: Pass a subpass id into begin_subpass This is a bit less awkward than passing in the subpass because it means we don't have to extract the subpass id from the subpass. Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-20 13:49:24 -08:00
Jason Ekstrand	01223b8199	anv/cmd_buffer: Add begin/end_subpass helpers Having begin/end_subpass is a bit nicer than the begin/next/end hooks that Vulkan gives us. Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-20 13:49:24 -08:00
Jason Ekstrand	b5bd3fb4e4	anv/cmd_buffer: Apply subpass flushes before set_subpass This seems slightly more correct because it means that the flushes happen before any clears or resolves implied by the subpass transition. Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-20 13:49:24 -08:00
Jason Ekstrand	869448a8ab	anv: Use framebuffer layers for implicit subpass transitions Fixes: `de3be61801` "anv/cmd_buffer: Rework aux tracking" Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-20 13:49:24 -08:00
Jason Ekstrand	85d0bec961	anv: Be more careful about fast-clear colors Previously, we just used all the channels regardless of the format. This is less than ideal because some channels may have undefined values and this should be ok from the client's perspective. Even though the driver should do the correct thing regardless of what is in the undefined value, it makes things less deterministic. In particular, the driver may choose to fast-clear or not based on undefined values. This level of nondeterminism is bad. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-20 13:49:24 -08:00
Anuj Phogat	7b283544dc	anv/icl: Add render target flush after uploading binding table The PIPE_CONTROL command description says: "Whenever a Binding Table Index (BTI) used by a Render Taget Message points to a different RENDER_SURFACE_STATE, SW must issue a Render Target Cache Flush by enabling this bit. When render target flush is set due to new association of BTI, PS Scoreboard Stall bit must be set in this packet." Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-02-16 11:10:32 -08:00
Jason Ekstrand	f37bd726c7	anv: Don't resolve or ambiguate non-existent layers The previous code was trying to avoid non-existent layers by taking a MAX with anv_image_aux_layers. Unfortunately, it wasn't taking into account that layer_count starts at base_layer which may not be zero. Instead, we need to subtract base_layer from anv_image_aux_layers with a guard against roll-over. Fixes: `de3be61801` "anv/cmd_buffer: Rework aux tracking" Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-12 15:14:57 -08:00
Jason Ekstrand	1810f965c8	anv: Allow fast-clearing the first slice of a multi-slice image Now that we're tracking aux properly per-slice, we can enable this for applications which actually care. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-08 16:35:31 -08:00
Jason Ekstrand	de3be61801	anv/cmd_buffer: Rework aux tracking This commit completely reworks aux tracking. This includes a number of somewhat distinct changes: 1) Since we are no longer fast-clearing multiple slices, we only need to track one fast clear color and one fast clear type. 2) We store two bits for fast clear instead of one to let us distinguish between zero and non-zero fast clear colors. This is needed so that we can do full resolves when transitioning to PRESENT_SRC_KHR with gen9 CCS images where we allow zero clear values in all sorts of places we wouldn't normally. 3) We now track compression state as a boolean separate from fast clear type and this is tracked on a per-slice granularity. The previous scheme had some issues when it came to individual slices of a multi-LOD images. In particular, we only tracked "needs resolve" per-LOD but you could do a vkCmdPipelineBarrier that would only resolve a portion of the image and would set "needs resolve" to false anyway. Also, any transition from an undefined layout would reset the clear color for the entire LOD regardless of whether or not there was some clear color on some other slice. As far as full/partial resolves go, he assumptions of the previous scheme held because the one case where we do need a full resolve when CCS_E is enabled is for window-system images. Since we only ever allowed X-tiled window-system images, CCS was entirely disabled on gen9+ and we never got CCS_E. With the advent of Y-tiled window-system buffers, we now need to properly support doing a full resolve of images marked CCS_E. v2 (Jason Ekstrand): - Fix an bug in the compressed flag offset calculation - Treat 3D images as multi-slice for the purposes of resolve tracking v3 (Jason Ekstrand): - Set the compressed flag whenever we fast-clear - Simplify the resolve predicate computation logic Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-08 16:35:31 -08:00
Jason Ekstrand	2cbfcb205e	anv/cmd_buffer: Move the mi_alu helper higher up Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-08 16:35:31 -08:00
Jason Ekstrand	f0523f70ef	anv: Use blorp_ccs_ambiguate instead of fast-clears Even though the blorp pass looks a bit on the sketchy side, the end result in the Vulkan driver is very nice. Instead of having this weird case where you do a fast clear and then maybe have to resolve, we just do the ambiguate and are done with it. The ambiguate does exactly what we want of setting all the CCS values to 0 which puts it into the pass-through state. This should also improve performance a bit in certain cases. For instance, if we did a transition from UNDEFINED to GENERAL for a surface that doesn't have CCS enabled all the time, we would end up doing a fast-clear and then a full resolve which ends up touching every byte in the main surface as well as the CCS. With the ambiguate pass, that transition only touches the CCS. Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-08 16:35:31 -08:00
Jason Ekstrand	84fd2ebfbc	anv/cmd_buffer: Re-arrange the logic around UNDEFINED fast-clears Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-08 16:35:31 -08:00
Jason Ekstrand	3ef8c4b2f5	anv/cmd_buffer: Pull the undefined layout condition into the if Now that this isn't a multi-case if and it's just the one case, it's a bit clearer if the condition is just part of the if instead of being pulled out into a boolean variable. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-08 16:35:31 -08:00
Jason Ekstrand	13b621d6fd	anv: Only fast clear single-slice images The current strategy we use for managing resolves has an issues where we track clear colors and the need for resolves per-LOD but we still allow resolves of only a subset of the slices in any given LOD and doing so sets the "needs resolve" flag for that LOD to false while leaving the remaining layers unresolved. This patch is only the first step and does not, by itself fix anything. However, it's fairly self-contained and splitting it out means any performance regressions should bisect to this nice obvious commit rather than to the giant "rework aux tracking" commit. Nanley and I did some testing and none of the applications we tested even tried to fast-clear anything other than the first slice of an image. The test was done by adding a printf right before we call blorp_fast_clear if we were every going to touch any slice other than the first with a fast-clear. Due to the way the original code was structured, this would not have included applications which only cleared a subset of layers. The applications tested were: * All Sascha Willems demos * Aztec Ruins * Dota 2 * The Talos Principle * Mad Max * Warhammer 40,000: Dawn of War III * Serious Sam Fusion 2017: BFE While not the full list of shipping applications, it's a pretty good spread and covers most of the engines we've seen running on our driver. If this is ever shown to be a performance problem in the future, we can reconsider our strategy. Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-08 16:35:31 -08:00
Jason Ekstrand	571ed588ac	anv/cmd_buffer: Add a mark_image_written helper Currently, this helper does nothing but we call it every place where an image is written through the render pipeline. This will allow us to properly mark the aux state so that we can handle resolves correctly. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-08 16:35:31 -08:00
Jason Ekstrand	ab7543b13d	anv/cmd_buffer: Generalize transition_color_buffer This moves it to being based on layout_to_aux_usage instead of being hard-coded based on bits of a priori knowledge of how transitions interact with layouts. This conceptually simplifies things because we're now using layout_to_aux_usage and layout_supports_fast_clear to make resolve decisions so changes to those functions will do what one expects. There is a potential bug with window system integration on gen9+ where we wouldn't do a resolve when transitioning to the PRESENT_SRC layout because we just assume that everything that handles CCS_E can handle it all the time. When handing a CCS_E image off to the window system, we may need to do a full resolve if the window system does not support the CCS_E modifier. The only reason why this hasn't been a problem yet is because we don't support modifiers in Vulkan WSI and so we always get X tiling which implies no CCS on gen9+. This patch doesn't actually fix that bug yet but it takes us the first step in that direction by making us actually pick the correct resolve op. In order to handle all of the cases, we need more detailed aux tracking. v2 (Jason Ekstrand): - Make a few more things const - Use the anv_fast_clear_support enum v3 (Jason Ekstrand): - Move an assert and add a better comment Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-08 16:35:31 -08:00
Jason Ekstrand	151771b390	anv/cmd_buffer: Recurse in transition_color_buffer instead of falling through Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-08 16:35:31 -08:00
Jason Ekstrand	5c38ab8f07	anv/blorp: Rework HiZ ops to look like MCS and CCS Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-08 16:35:31 -08:00
Jason Ekstrand	42f1668a54	anv/blorp: Rework image clear/resolve helpers This replaces image_fast_clear and ccs_resolve with two new helpers that simply perform an isl_aux_op whatever that may be on CCS or MCS. This is a bit cleaner as it separates performing the aux operation from which blorp helper we have to call to do it. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-08 16:35:31 -08:00
Jason Ekstrand	97938dac36	anv/cmd_buffer: Re-emit the pipeline at every subpass If we ever hit this edge-case, it can theoretically cause problem for CNL because we could end up changing render targets without re-emitting 3DSTATE_MULTISAMPLE which is part of the pipeline. Just get rid of the edge case. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2018-01-30 17:16:33 -08:00
Rafael Antognolli	fa21ddf7b1	anv/cmd_buffer: Emit PIPE_CONTROL with ISP bit on older platforms. Emit it on all platforms since gen7. Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-01-29 14:52:07 -08:00
Rafael Antognolli	20578f81a6	anv/gen10: Emit CS stall and mark push constants dirty. I got reviews and fixed the patches locally, but ended up merging the ones that I sent originally to the list. This patch fixes those mistakes. Fixes: `78c125af39` Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Cc: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-01-26 11:59:17 -08:00
Rafael Antognolli	78c125af39	anv/gen10: Ignore push constant packets during context restore. Similar to the GL driver, ignore 3DSTATE_CONSTANT_* packets when doing a context restore. Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Cc: Jason Ekstrand <jason@jlekstrand.net> Cc: "18.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-01-26 10:07:40 -08:00
Iago Toral Quiroga	75a4802060	anv/cmd_buffer: add the pipeline layout to the pipeline state We need to access the pipeline layout to compute correct dynamic offsets for dyamic UBO/SSBO descriptors when we emit draw commands. Instead of taking it from the pipeline object, store the layout in the command buffer pipeline state. Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-01-26 14:06:47 +01:00
Jason Ekstrand	38ec78049f	anv/cmd_buffer: Move num_workgroups to compute state While we're here, make it an anv_address. Tested-by: Józef Kucia <joseph.kucia@gmail.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Cc: "18.0" <mesa-stable@lists.freedesktop.org>	2018-01-23 21:10:44 -08:00
Jason Ekstrand	8bd5ec5b86	anv/cmd_buffer: Move vb_dirty bits into anv_cmd_graphics_state Vertex buffers are entirely a graphics pipeline thing. Tested-by: Józef Kucia <joseph.kucia@gmail.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Cc: "18.0" <mesa-stable@lists.freedesktop.org>	2018-01-23 21:10:39 -08:00
Jason Ekstrand	e85aaec148	anv/cmd_buffer: Move dirty bits into anv_cmd_*_state Tested-by: Józef Kucia <joseph.kucia@gmail.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Cc: "18.0" <mesa-stable@lists.freedesktop.org>	2018-01-23 21:10:36 -08:00
Jason Ekstrand	97f96610c8	anv: Separate compute and graphics descriptor sets The Vulkan spec says: "pipelineBindPoint is a VkPipelineBindPoint indicating whether the descriptors will be used by graphics pipelines or compute pipelines. There is a separate set of bind points for each of graphics and compute, so binding one does not disturb the other." Up until now, we've been ignoring the pipeline bind point and had just one bind point for everything. This commit separates things out into separate bind points. Tested-by: Józef Kucia <joseph.kucia@gmail.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102897 Cc: "18.0" <mesa-stable@lists.freedesktop.org>	2018-01-23 21:10:33 -08:00
Jason Ekstrand	31b2144c83	anv/cmd_buffer: Use anv_descriptor_for_binding for samplers Tested-by: Józef Kucia <joseph.kucia@gmail.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Cc: "18.0" <mesa-stable@lists.freedesktop.org>	2018-01-23 21:10:31 -08:00
Jason Ekstrand	9af5379228	anv/cmd_buffer: Add substructs to anv_cmd_state for graphics and compute Initially, these just contain the pipeline in a base struct. Tested-by: Józef Kucia <joseph.kucia@gmail.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Cc: "18.0" <mesa-stable@lists.freedesktop.org>	2018-01-23 21:10:25 -08:00
Jason Ekstrand	ddc2d28548	anv/cmd_buffer: Use some pre-existing pipeline temporaries There are several places where we'd already saved the pipeline off to a temporary variable but, due to an artifact of history, weren't actually using that temporary everywhere. No functional change. Tested-by: Józef Kucia <joseph.kucia@gmail.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Cc: "18.0" <mesa-stable@lists.freedesktop.org>	2018-01-23 21:10:24 -08:00
Alex Smith	4fd85617c1	anv: Make sure state on primary is correct after CmdExecuteCommands After executing a secondary command buffer, we need to update certain state on the primary command buffer to reflect changes by the secondary. Otherwise subsequent commands may not have the correct state set. This fixes various issues (rendering errors, GPU hangs) seen after executing secondary command buffers in some cases. v2 (Jason Ekstrand): - Reset to invalid values instead of pulling from the secondary - Change the comment to be more descriptive Signed-off-by: Alex Smith <asmith@feralinteractive.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Cc: mesa-stable@lists.freedesktop.org	2018-01-11 18:11:08 +00:00
Kenneth Graunke	8eadc2fb8f	intel: Apply Geminilake "Barrier Mode" workaround. Apparently, Geminilake requires you to whack a chicken bit to select either compute or tessellation mode for barriers. The recommendation is to switch between them at PIPELINE_SELECT time. We may not need to do this all the time, but I don't know that it hurts either. PIPELINE_SELECT is already a pretty giant stall. This appears to fix hangs in tessellation control shaders with barriers on Geminilake. Note that this requires a corresponding kernel change, drm/i915: Whitelist SLICE_COMMON_ECO_CHICKEN1 on Geminilake. in order for the register write to actually happen. Without an updated kernel, this register write will be noop'd and the fix will not work. Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2018-01-09 10:13:33 -08:00
Alex Smith	0d8b9c529c	anv: Allow PMA optimization to be enabled in secondary command buffers This was never enabled in secondary buffers because hiz_enabled was never set to true for those. If the app provides a framebuffer in the inheritance info when beginning a secondary buffer, we can determine if HiZ is enabled and therefore allow the PMA optimization to be enabled within the command buffer. This improves performance by ~13% on an internal benchmark on Skylake. v2: Use anv_cmd_buffer_get_depth_stencil_view(). Signed-off-by: Alex Smith <asmith@feralinteractive.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-01-08 09:31:17 +00:00
Jason Ekstrand	2f9eb045f3	anv/cmd_buffer: Add support for pushing UBO ranges In order to do this we have to modify push constant set up to handle ranges. We also have to tweak the way we handle dirty bits a bit so that we re-push whenever a descriptor set changes. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2017-12-08 15:43:25 -08:00
Jason Ekstrand	0c879b62b0	anv/cmd_buffer: Add some stage asserts There are several places where we look up opcodes in an array of stages. Assert that the we don't end up going out-of-bounds. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2017-12-08 15:43:25 -08:00

1 2 3 4 5

249 Commits