KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Jason Ekstrand	85d0bec961	anv: Be more careful about fast-clear colors Previously, we just used all the channels regardless of the format. This is less than ideal because some channels may have undefined values and this should be ok from the client's perspective. Even though the driver should do the correct thing regardless of what is in the undefined value, it makes things less deterministic. In particular, the driver may choose to fast-clear or not based on undefined values. This level of nondeterminism is bad. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-20 13:49:24 -08:00
Jason Ekstrand	4796025ba5	intel/isl: Add an isl_color_value_is_zero helper Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-20 13:49:24 -08:00
Jason Ekstrand	116e818ef1	anv/gpu_memcpy: CS Stall before a MI memcpy on gen7 This fixes a pile of hangs caused by the recent shuffling of resolves and transitions. The particularly problematic case is when you have at least three attachments with load ops of CLEAR, LOAD, CLEAR. In this case, we execute the first CLEAR followed by a MI memcpy to copy the clear values over for the LOAD followed by a second CLEAR. The MI commands cause the first CLEAR to hang which causes us to get stuck on the 3DSTATE_MULTISAMPLE in the second CLEAR. We also add guards for BLORP to fix the same issue. These shouldn't actually do anything right now because the only use of indirect clears in BLORP today is for resolves which are already guarded by a render cache flush and CS stall. However, this will guard us against potential issues in the future. Acked-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-20 13:49:19 -08:00
Iago Toral Quiroga	af5f2322d0	anv/entrypoints: make vkGetDeviceProcAddr return NULL for instance commands Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-02-20 08:12:32 +01:00
Anuj Phogat	7b283544dc	anv/icl: Add render target flush after uploading binding table The PIPE_CONTROL command description says: "Whenever a Binding Table Index (BTI) used by a Render Target Message points to a different RENDER_SURFACE_STATE, SW must issue a Render Target Cache Flush by enabling this bit. When render target flush is set due to new association of BTI, PS Scoreboard Stall bit must be set in this packet." Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-02-16 11:10:32 -08:00
Anuj Phogat	136f583a24	anv/icl: Enable float blend optimization Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-02-16 11:10:32 -08:00
Anuj Phogat	cd7102972f	anv/icl: Use gen11 functions Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-02-16 11:10:32 -08:00
Anuj Phogat	9673c21d4f	anv/icl: Build anv libs for gen11 Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-02-16 11:10:32 -08:00
Anuj Phogat	1f108b436b	anv/icl: Generate gen11 entry point functions Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-02-16 11:10:32 -08:00
Anuj Phogat	a86c0a08df	anv/icl: Don't use DISPATCH_MODE_SIMD4X2 Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-02-16 11:10:32 -08:00
Anuj Phogat	cd5fc634a8	anv/icl: Don't use SingleVertexDispatch Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-02-16 11:10:32 -08:00
Anuj Phogat	6e3940b3cf	anv/icl: Don't set ResetGatewayTimer Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-02-16 11:10:32 -08:00
Anuj Phogat	41a4c2c8e8	anv/icl: Add #define genX Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-02-16 11:10:31 -08:00
Anuj Phogat	413d475b44	anv/icl: Add gen11 mocs defines Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-02-16 11:10:31 -08:00
Anuj Phogat	ba3cbee6c5	intel/common/icl: Add has_sample_with_hiz flag in gen_device_info Sampling from hiz is enabled in i965 for GEN9+ but this feature has been removed from gen11. So, this new flag will be useful to turn the feature on/off for different gen h/w. It will be used later in a patch adding device info for gen11. Suggested-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-02-15 16:14:56 -08:00
Anuj Phogat	9c144dc81e	i965/icl: Add assertions to check dispatch mode is SIMD8 SIMD4x2 dispatch mode has been removed in GEN11. We're not using it anyway in Mesa. Add a few asserts to make it explicit. Use GEN_GEN macro in place of devinfo->gen (Ken) Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-02-15 16:14:56 -08:00
Anuj Phogat	e9ad5c9a5d	i965/icl: Update the comment for maximum number of threads per PSD Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-02-15 16:14:56 -08:00
Anuj Phogat	772a75be46	intel/icl: Do StateCacheInvalidation for indirect clear color StateCacheInvalidation is required on all gen7+ platforms. We don't need to update this check for every new gen h/w unless this requirement is changed. So, dropping the check for latest gen h/w. Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-02-15 16:14:55 -08:00
Anuj Phogat	bff24e2173	intel/isl/icl: Build and use gen11 surface state emit functions Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-02-15 16:14:55 -08:00
Anuj Phogat	0427bd4954	intel/isl/icl: Add the maximum surface size limit Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-02-15 16:14:55 -08:00
Anuj Phogat	c68ede0be7	intel/genxml/icl: Update genx_bits header Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-02-15 16:14:55 -08:00
Anuj Phogat	165a68b05a	intel/genxml/icl: Generate packing headers Move build system changes in to one patch (Ken, Emil) Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-02-15 16:14:55 -08:00
Anuj Phogat	7ed27d8cbf	intel/genxml/icl: Add gen11.xml Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-02-15 16:14:55 -08:00
Dylan Baker	7d0e342af2	meson: add convenience variable for anv_extensions.py dependency Signed-off-by: Dylan Baker <dylan.c.baker@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-02-15 09:46:07 -08:00
Dylan Baker	0e617c04f1	meson: use depend_files for adding extra file dependencies cc: Jason Ekstrand <jason.ekstrand@intel.com> Fixes: `dd088d4bec` ("anv/extensions: Generate a header file with extension tables") Signed-off-by: Dylan Baker <dylan.c.baker@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-02-15 09:46:04 -08:00
Dylan Baker	b03969a5ad	meson: use depend_files to track extra file dependencies cc: Jason Ekstrand <jason.ekstrand@intel.com> Fixes: `f939940809` ("anv: Split anv_extensions.py into two files") Signed-off-by: Dylan Baker <dylan.c.baker@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-02-15 09:45:56 -08:00
Dylan Baker	384bff13e0	Revert "anv/meson: Make anv_entrypoints_gen.py depend on anv_extensions.py" This reverts commit `10d1b0be8e`. This is unnecessary, the depend_files argument is for adding dependencies on files that are not part of the input, which is already done. cc: Jason Ekstrand <jason.ekstrand@intel.com> Fixes: `10d1b0be8e` Signed-off-by: Dylan Baker <dylan.c.baker@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-02-15 09:45:40 -08:00
Anuj Phogat	0cd37f9178	isl: Don't use surface format R32_FLOAT for typed atomic integer operations From Skylake PRM Surface Formats section: "The surface format for the typed atomic integer operations must be R32_UINT or R32_SINT." Fixes an error and a piglit GPU hang in simulation environment. Piglit test: gl45-imageAtomicExchange-float.shader_test Suggested-by: Francisco Jerez <currojerez@riseup.net> Signed-off-by: Anuj Phogat <anuj.phogat@gmail.co Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Cc: "18.0 17.3" <mesa-stable@lists.freedesktop.org>	2018-02-14 16:30:05 -08:00
Jason Ekstrand	8534af44e4	intel/aubinator: Correctly decode INTERFACE_DESCRIPTOR_DATA Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-02-14 13:17:26 -08:00
Rafael Antognolli	fcae3d1a9a	anv/gen10: Remove warning message. Gen10 seems pretty stable so far, remove "alpha support" message. Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Cc: Jason Ekstrand <jason@jlekstrand.net> Cc: "18.0" mesa-stable@lists.freedesktop.org Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-02-14 10:11:01 -08:00
Iago Toral Quiroga	cb9dbd6dec	i965/compiler: clean up nir_intrinsic_load_input for vertex shaders This code to re-set the type of the source and destination is not necessary since we never manipulate the types. Looks like a left over from a time where we had to retype to float temporarily to handle 64-bit inputs. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-02-14 12:00:14 +01:00
Iago Toral Quiroga	4917d38321	intel/compiler: fix first_component for 64-bit types on vertex inputs Divide it by two as we do for other stages. This is because the component layout qualifier is always in 32-bit units. Fixes issues in a new CTS test (still WIP): KHR-GL45.enhanced_layouts.varying_double_components Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-02-14 12:00:14 +01:00
Jason Ekstrand	4c77e21c81	anv: Move setting current_pipeline to cmd_state_init We were setting current_pipeline to UINT32_MAX and then calling cmd_cmd_state_reset which memsets the entire state struct to 0 which implicitly resets current_pipeline to 3D. I have no idea how this hasn't caused everything to explode. Fixes: `cd3feea745` "anv/cmd_buffer: Rework anv_cmd_state_reset" cc: mesa-stable@lists.freedesktop.org Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-02-12 15:18:23 -08:00
Jason Ekstrand	f37bd726c7	anv: Don't resolve or ambiguate non-existent layers The previous code was trying to avoid non-existent layers by taking a MAX with anv_image_aux_layers. Unfortunately, it wasn't taking into account that layer_count starts at base_layer which may not be zero. Instead, we need to subtract base_layer from anv_image_aux_layers with a guard against roll-over. Fixes: `de3be61801` "anv/cmd_buffer: Rework aux tracking" Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-12 15:14:57 -08:00
Kenneth Graunke	bd87bd178c	anv: Drop I915_EXEC_CONSTANTS_REL_GENERAL from execbuf. The kernel used to have execbuf parameters to program the INSTPM bit for whether 3DSTATE_CONSTANT_* should be relative to dynamic state base address or an absolute address. However, they never worked in the presence of hardware contexts, so I deleted them a while back. It doesn't make sense to set this flag, as it doesn't exist anymore. It also never did anything anyway - the flag is zero, so |'ing it in did nothing. The default is relative anyway. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-02-12 07:00:41 -08:00
Grazvydas Ignotas	9b9a89cd79	intel/compiler: fix 64bit value prints on 32bit Fix the following: warning: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 3 has type ‘uint64_t {aka long long unsigned int}’. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-02-10 17:59:02 +02:00
Jason Ekstrand	8f20cf166e	intel/blorp: Use isl_aux_op instead of blorp_hiz_op Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-08 16:35:31 -08:00
Jason Ekstrand	1e941a0528	intel/blorp: Use isl_aux_op instead of blorp_fast_clear_op Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-08 16:35:31 -08:00
Jason Ekstrand	1810f965c8	anv: Allow fast-clearing the first slice of a multi-slice image Now that we're tracking aux properly per-slice, we can enable this for applications which actually care. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-08 16:35:31 -08:00
Jason Ekstrand	de3be61801	anv/cmd_buffer: Rework aux tracking This commit completely reworks aux tracking. This includes a number of somewhat distinct changes: 1) Since we are no longer fast-clearing multiple slices, we only need to track one fast clear color and one fast clear type. 2) We store two bits for fast clear instead of one to let us distinguish between zero and non-zero fast clear colors. This is needed so that we can do full resolves when transitioning to PRESENT_SRC_KHR with gen9 CCS images where we allow zero clear values in all sorts of places we wouldn't normally. 3) We now track compression state as a boolean separate from fast clear type and this is tracked on a per-slice granularity. The previous scheme had some issues when it came to individual slices of a multi-LOD image. In particular, we only tracked "needs resolve" per-LOD but you could do a vkCmdPipelineBarrier that would only resolve a portion of the image and would set "needs resolve" to false anyway. Also, any transition from an undefined layout would reset the clear color for the entire LOD regardless of whether or not there was some clear color on some other slice. As far as full/partial resolves go, the assumptions of the previous scheme held because the one case where we do need a full resolve when CCS_E is enabled is for window-system images. Since we only ever allowed X-tiled window-system images, CCS was entirely disabled on gen9+ and we never got CCS_E. With the advent of Y-tiled window-system buffers, we now need to properly support doing a full resolve of images marked CCS_E. v2 (Jason Ekstrand): - Fix a bug in the compressed flag offset calculation - Treat 3D images as multi-slice for the purposes of resolve tracking v3 (Jason Ekstrand): - Set the compressed flag whenever we fast-clear - Simplify the resolve predicate computation logic Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-08 16:35:31 -08:00
Jason Ekstrand	2cbfcb205e	anv/cmd_buffer: Move the mi_alu helper higher up Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-08 16:35:31 -08:00
Jason Ekstrand	2e69045c4d	anv/image: Simplify some verbose comments Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-08 16:35:31 -08:00
Jason Ekstrand	f0523f70ef	anv: Use blorp_ccs_ambiguate instead of fast-clears Even though the blorp pass looks a bit on the sketchy side, the end result in the Vulkan driver is very nice. Instead of having this weird case where you do a fast clear and then maybe have to resolve, we just do the ambiguate and are done with it. The ambiguate does exactly what we want of setting all the CCS values to 0 which puts it into the pass-through state. This should also improve performance a bit in certain cases. For instance, if we did a transition from UNDEFINED to GENERAL for a surface that doesn't have CCS enabled all the time, we would end up doing a fast-clear and then a full resolve which ends up touching every byte in the main surface as well as the CCS. With the ambiguate pass, that transition only touches the CCS. Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-08 16:35:31 -08:00
Jason Ekstrand	84fd2ebfbc	anv/cmd_buffer: Re-arrange the logic around UNDEFINED fast-clears Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-08 16:35:31 -08:00
Jason Ekstrand	3ef8c4b2f5	anv/cmd_buffer: Pull the undefined layout condition into the if Now that this isn't a multi-case if and it's just the one case, it's a bit clearer if the condition is just part of the if instead of being pulled out into a boolean variable. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-08 16:35:31 -08:00
Jason Ekstrand	857b5b5a7f	intel/blorp: Add a CCS ambiguation pass This pass performs an "ambiguate" operation on a CCS-compressed surface by manually writing zeros into the CCS. On gen8+, ISL gives us a fairly detailed notion of how the CCS is laid out so this is fairly simple to do. On gen7, the CCS tiling is quite crazy but that isn't an issue because we can only do CCS on single-slice images so we can just blast over the entire CCS buffer if we want to. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-08 16:35:31 -08:00
Jason Ekstrand	13b621d6fd	anv: Only fast clear single-slice images The current strategy we use for managing resolves has an issue where we track clear colors and the need for resolves per-LOD but we still allow resolves of only a subset of the slices in any given LOD and doing so sets the "needs resolve" flag for that LOD to false while leaving the remaining layers unresolved. This patch is only the first step and does not, by itself, fix anything. However, it's fairly self-contained and splitting it out means any performance regressions should bisect to this nice obvious commit rather than to the giant "rework aux tracking" commit. Nanley and I did some testing and none of the applications we tested even tried to fast-clear anything other than the first slice of an image. The test was done by adding a printf right before we call blorp_fast_clear if we were ever going to touch any slice other than the first with a fast-clear. Due to the way the original code was structured, this would not have included applications which only cleared a subset of layers. The applications tested were: * All Sascha Willems demos * Aztec Ruins * Dota 2 * The Talos Principle * Mad Max * Warhammer 40,000: Dawn of War III * Serious Sam Fusion 2017: BFE While not the full list of shipping applications, it's a pretty good spread and covers most of the engines we've seen running on our driver. If this is ever shown to be a performance problem in the future, we can reconsider our strategy. Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-08 16:35:31 -08:00
Jason Ekstrand	571ed588ac	anv/cmd_buffer: Add a mark_image_written helper Currently, this helper does nothing but we call it every place where an image is written through the render pipeline. This will allow us to properly mark the aux state so that we can handle resolves correctly. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-08 16:35:31 -08:00
Jason Ekstrand	9876d6f0ef	anv/blorp: Add src/dst_level helper variables in CmdCopyImage Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-08 16:35:31 -08:00
Jason Ekstrand	c180c2c868	anv/cmd_buffer: Add an anv_genX_call macro This is copied and pasted from the similar macro we added to ISL. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-08 16:35:31 -08:00
Jason Ekstrand	ab7543b13d	anv/cmd_buffer: Generalize transition_color_buffer This moves it to being based on layout_to_aux_usage instead of being hard-coded based on bits of a priori knowledge of how transitions interact with layouts. This conceptually simplifies things because we're now using layout_to_aux_usage and layout_supports_fast_clear to make resolve decisions so changes to those functions will do what one expects. There is a potential bug with window system integration on gen9+ where we wouldn't do a resolve when transitioning to the PRESENT_SRC layout because we just assume that everything that handles CCS_E can handle it all the time. When handing a CCS_E image off to the window system, we may need to do a full resolve if the window system does not support the CCS_E modifier. The only reason why this hasn't been a problem yet is because we don't support modifiers in Vulkan WSI and so we always get X tiling which implies no CCS on gen9+. This patch doesn't actually fix that bug yet but it takes us the first step in that direction by making us actually pick the correct resolve op. In order to handle all of the cases, we need more detailed aux tracking. v2 (Jason Ekstrand): - Make a few more things const - Use the anv_fast_clear_support enum v3 (Jason Ekstrand): - Move an assert and add a better comment Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-08 16:35:31 -08:00
Jason Ekstrand	151771b390	anv/cmd_buffer: Recurse in transition_color_buffer instead of falling through Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-08 16:35:31 -08:00
Jason Ekstrand	bea7373c92	anv/image: Support color aspects in layout_to_aux_usage Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-08 16:35:31 -08:00
Jason Ekstrand	b09464db42	anv/image: Add a helper for determining when fast clears are supported v2 (Jason Ekstrand): - Return an enum instead of a boolean v3 (Jason Ekstrand): - Return ANV_FAST_CLEAR_NONE instead of false (Topi) - Rename ANV_FAST_CLEAR_ANY to ANV_FAST_CLEAR_DEFAULT_VALUE - Add documentation for the enum values v4 (Jason Ekstrand): - Remove a dead comment Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-08 16:35:31 -08:00
Jason Ekstrand	1f7eee6bc1	anv/image: Update a comment This got lost in all of the aspect vs. plane rebasing of YCBCR. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-08 16:35:31 -08:00
Jason Ekstrand	5c38ab8f07	anv/blorp: Rework HiZ ops to look like MCS and CCS Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-08 16:35:31 -08:00
Jason Ekstrand	1d473e26f2	anv/blorp: Support ISL_AUX_USAGE_HIZ in surf_for_anv_image If the function gets passed ANV_AUX_USAGE_DEFAULT, it still has the old behavior of setting ISL_AUX_USAGE_NONE for depth/stencil which is what we want for blits/copies. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-08 16:35:31 -08:00
Jason Ekstrand	42f1668a54	anv/blorp: Rework image clear/resolve helpers This replaces image_fast_clear and ccs_resolve with two new helpers that simply perform an isl_aux_op, whatever that may be, on CCS or MCS. This is a bit cleaner as it separates performing the aux operation from which blorp helper we have to call to do it. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-08 16:35:31 -08:00
Jason Ekstrand	482c24783e	intel/isl: Codify AUX operations in an enum Right now, we have different entrypoints and enums in blorp for these different operations. This provides us a central enum which we can begin to transition to. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-02-08 16:35:31 -08:00
Scott D Phillips	1f4d2433e7	meson: Add build option for tools Add a build option to control building some of the misc tools we have. Also set the executables to install, presumably you want that if you're asking for the build. v2: set 'install:' to the with_tools value, not true (Jordan) handle 'all' in the comma list (Dylan) Add freedreno's tools (Dylan) Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-02-08 11:24:42 -08:00
Timothy Arceri	ffeebcfa7e	i965: remove unused brw_nir_lower_cs_shared() This has been unused since `8761a04d0d`. Reviewed-by: Elie Tournier <elie.tournier@collabora.com>	2018-02-07 08:38:01 +11:00
Iago Toral Quiroga	a5053ba27e	anv/device: initialize the list of enabled extensions properly The loop goes through the list of enabled extensions marking them as enabled in the list, but this relies on every other extension being initialized to false by default. This bug would make us, for example, advertise certain device extension entry points as available even when the corresponding extensions had not been enabled. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Fixes: `abc62282b5` "anv: Add a per-device table of enabled extensions" Cc: "18.0" <mesa-stable@lists.freedesktop.org>	2018-02-06 07:51:00 +01:00
Iago Toral Quiroga	1d20001d97	i965/nir: do int64 lowering before optimization Otherwise loop unrolling will fail to see the actual cost of the unrolling operations when the loop body contains 64-bit integer instructions, and especially when the divmod64 lowering applies, since its lowering is quite expensive. Without this change, some in-development CTS tests for int64 get stuck forever trying to register allocate a shader with over 50K SSA values. The large number of SSA values is the result of NIR first unrolling multiple seemingly simple loops that involve int64 instructions, only to then lower these instructions to produce a massive pile of code (due to the divmod64 lowering in the unrolled instructions). With this change, loop unrolling will see the loops with the int64 code already lowered and will realize that it is too expensive to unroll. v2: Run nir_algebraic first so we can hopefully get rid of some of the int64 instructions before we even attempt to lower them. Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-02-06 07:49:27 +01:00
Matt Turner	e2b31e9acf	i965: Move mistakenly placed line Ken called this out in review, but it seems I forgot to make the change. I noticed that the control flow annotations in the fragment shader disassembly of tests/shaders/glsl-fs-loop-continue.shader_test were not correct, and moving this line to the correct place fixes it.	2018-02-05 09:50:56 -08:00
Jason Ekstrand	589e9db23f	aubinator: Multiply count by 4 to compute buffer sizes The count field is in terms of dwords and not bytes. In `7d4007d58a`, I fixed one instance of this but missed another.	2018-02-02 22:30:56 -08:00
Kenneth Graunke	85ec7abc3f	intel/decoder: Fix control / evaluation label mixup. Trivial. DS is TES, HS is TCS.	2018-02-01 09:44:15 -08:00
Jason Ekstrand	97938dac36	anv/cmd_buffer: Re-emit the pipeline at every subpass If we ever hit this edge-case, it can theoretically cause problems for CNL because we could end up changing render targets without re-emitting 3DSTATE_MULTISAMPLE which is part of the pipeline. Just get rid of the edge case. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2018-01-30 17:16:33 -08:00
Iago Toral Quiroga	99b57daf4a	anv/pipeline: lower constant initializers on output variables earlier If a shader only writes to an output via a constant initializer we need to lower it before we call nir_remove_dead_variables so that this pass sees the stores from the initializer and doesn't kill the output. Fixes test failures in new work-in-progress CTS tests: dEQP-VK.spirv_assembly.instruction.graphics.variable_init.output_vert dEQP-VK.spirv_assembly.instruction.graphics.variable_init.output_frag Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-01-30 08:10:29 +01:00
Rafael Antognolli	fa21ddf7b1	anv/cmd_buffer: Emit PIPE_CONTROL with ISP bit on older platforms. Emit it on all platforms since gen7. Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-01-29 14:52:07 -08:00
Timothy Arceri	5b8de4bdff	nir: add vs_inputs_dual_locations compiler option Allows nir drivers to use either single or dual locations for vs double inputs. i965 uses dual locations for both OpenGL and Vulkan drivers, for now gallium OpenGL drivers only use a single location. The following patch will also make use of this option when calling nir_shader_gather_info(). Reviewed-by: Karol Herbst <kherbst@redhat.com>	2018-01-30 09:08:47 +11:00
Timothy Arceri	f63e05ae9e	compiler: tidy up double_inputs_read uses First we move double_inputs_read into a vs struct in the union, double_inputs_read is only used for vs inputs so this will save space and also allows us to add a new double_inputs field. We add the new field because `c2acf97fcc` changed the behaviour of double_inputs_read, and while it's no longer used to track actual reads in i965 we do still want to track this for gallium drivers. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-01-30 09:08:47 +11:00
Rafael Antognolli	20578f81a6	anv/gen10: Emit CS stall and mark push constants dirty. I got reviews and fixed the patches locally, but ended up merging the ones that I sent originally to the list. This patch fixes those mistakes. Fixes: `78c125af39` Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Cc: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-01-26 11:59:17 -08:00
Rafael Antognolli	bcfd78e448	i965/gen10: Re-enable push constants. The GPU hang caused by push constants is apparently fixed, so let's enable them again. Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Cc: "18.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-01-26 10:07:44 -08:00
Rafael Antognolli	78c125af39	anv/gen10: Ignore push constant packets during context restore. Similar to the GL driver, ignore 3DSTATE_CONSTANT_* packets when doing a context restore. Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Cc: Jason Ekstrand <jason@jlekstrand.net> Cc: "18.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-01-26 10:07:40 -08:00
Iago Toral Quiroga	d3ce493b34	anv/pipeline: remove the pipeline layout field from anv_pipeline It no longer has any users. Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-01-26 14:06:47 +01:00
Iago Toral Quiroga	75a4802060	anv/cmd_buffer: add the pipeline layout to the pipeline state We need to access the pipeline layout to compute correct dynamic offsets for dynamic UBO/SSBO descriptors when we emit draw commands. Instead of taking it from the pipeline object, store the layout in the command buffer pipeline state. Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-01-26 14:06:47 +01:00
Iago Toral Quiroga	e1a49f974b	anv/pipeline: don't take the layout from the pipeline to compile shaders The Vulkan spec states that VkPipelineLayout objects must not be destroyed while any command buffer that uses them is in the recording state, but it permits them to be destroyed otherwise. This means that applications are allowed to free pipeline layouts after command recording is finished even if there are pipeline objects that still exist and were created with these layouts. There are two solutions to this: one is to use reference counting on pipeline layout objects; the other is to avoid holding references to pipeline layouts where they are not really needed. This patch takes a step towards the second option by making the pipeline shader compile code take the pipeline layout from the VkGraphicsPipelineCreateInfo provided rather than from the pipeline object. A follow-up patch will remove any remaining uses of the layout field so we can remove it from the pipeline object and avoid the need for reference counting. v2: Use ANV_FROM_HANDLE, remove unnecessary braces (Jason) Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-01-26 14:06:46 +01:00
Iago Toral Quiroga	14f6275c92	anv/descriptor_set: add reference counting for descriptor set layouts The spec states that descriptor set layouts can be destroyed almost at any time: "VkDescriptorSetLayout objects may be accessed by commands that operate on descriptor sets allocated using that layout, and those descriptor sets must not be updated with vkUpdateDescriptorSets after the descriptor set layout has been destroyed. Otherwise, descriptor set layouts can be destroyed any time they are not in use by an API command." v2: allocate off the device allocator with DEVICE scope (Jason) Fixes the following work-in-progress CTS tests: dEQP-VK.api.descriptor_set.descriptor_set_layout_lifetime.graphics dEQP-VK.api.descriptor_set.descriptor_set_layout_lifetime.compute Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-01-26 14:06:46 +01:00
Jason Ekstrand	c8949e2498	anv/pipeline: Don't look at blend state unless we have an attachment Without this, we may end up dereferencing blend before we check for binding->index != UINT32_MAX. However, Vulkan allows the blend state to be NULL so long as you don't have any color attachments. This fixes a segfault when running The Talos Principle. Fixes: `12f4e00b69` Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Alex Smith <asmith@feralinteractive.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-01-26 01:44:45 -08:00
Maxin B. John	8116b9170b	anv_icd.py: improve reproducible builds Sort the output to ensure build reproducibility Signed-off-by: Maxin B. John <maxin.john@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Fixes: `0ab04ba979` ("anv: Use python to generate ICD json files") Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-01-26 01:37:45 -08:00
Jason Ekstrand	db682b8f0e	i965/fs: Reset the register file to VGRF in lower_integer_multiplication `18fde36ced` changed the way temporary registers were allocated in lower_integer_multiplication so that we allocate regs_written(inst) space and keep the stride of the original destination register. This was to ensure that any MUL which originally followed the CHV/BXT integer multiply regioning restrictions would continue to follow those restrictions even after lowering. This works fine except that I forgot to reset the register file to VGRF so, even though they were assigned a number from alloc.allocate(), they had the wrong register file. This caused some GLES 3.0 CTS tests to start failing on Sandy Bridge due to attempted reads from the MRF: ES3-CTS.functional.shaders.precision.int.highp_mul_fragment.snbm64 ES3-CTS.functional.shaders.precision.int.mediump_mul_fragment.snbm64 ES3-CTS.functional.shaders.precision.int.lowp_mul_fragment.snbm64 ES3-CTS.functional.shaders.precision.uint.highp_mul_fragment.snbm64 ES3-CTS.functional.shaders.precision.uint.mediump_mul_fragment.snbm64 ES3-CTS.functional.shaders.precision.uint.lowp_mul_fragment.snbm64 This commit remedies the problem by making a new register and setting its region to match inst->dst, instead of copying inst->dst and overwriting nr. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103626 Fixes: `18fde36ced` Cc: "17.3" <mesa-stable@lists.freedesktop.org> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-01-25 13:58:55 -08:00
Emil Velikov	50265cd9ee	automake: anv: ship anv_extensions_gen.py in the tarball Fixes: `dd088d4bec` ("anv/extensions: Generate a header file with extension tables") Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2018-01-25 17:06:29 +00:00
Jason Ekstrand	7d4007d58a	aubinator: Multiply count by 4 to compute buffer sizes The count field is in terms of dwords and not bytes.	2018-01-24 19:05:36 -08:00
Grazvydas Ignotas	0cc7370733	anv: correct a duplicate check in an assert Looks like checking both sources was intended, instead of the first one twice. Found with Coccinelle, coccinellery/xand/xand.cocci semantic patch. Signed-off-by: Grazvydas Ignotas <notasas@gmail.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-01-25 01:10:45 +02:00
Jason Ekstrand	4064fe59e7	anv/cmd_buffer: Move gen7 index buffer state to graphics state Tested-by: Józef Kucia <joseph.kucia@gmail.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Cc: "18.0" <mesa-stable@lists.freedesktop.org>	2018-01-23 21:10:46 -08:00
Jason Ekstrand	38ec78049f	anv/cmd_buffer: Move num_workgroups to compute state While we're here, make it an anv_address. Tested-by: Józef Kucia <joseph.kucia@gmail.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Cc: "18.0" <mesa-stable@lists.freedesktop.org>	2018-01-23 21:10:44 -08:00
Jason Ekstrand	95ff232294	anv/cmd_buffer: Move dynamic state to graphics state Tested-by: Józef Kucia <joseph.kucia@gmail.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Cc: "18.0" <mesa-stable@lists.freedesktop.org>	2018-01-23 21:10:43 -08:00
Jason Ekstrand	24caee8975	anv/cmd_buffer: Use a temporary variable for dynamic state We were already doing this for some packets to keep the lines shorter. We may as well just do it for all of them. Tested-by: Józef Kucia <joseph.kucia@gmail.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Cc: "18.0" <mesa-stable@lists.freedesktop.org>	2018-01-23 21:10:40 -08:00
Jason Ekstrand	8bd5ec5b86	anv/cmd_buffer: Move vb_dirty bits into anv_cmd_graphics_state Vertex buffers are entirely a graphics pipeline thing. Tested-by: Józef Kucia <joseph.kucia@gmail.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Cc: "18.0" <mesa-stable@lists.freedesktop.org>	2018-01-23 21:10:39 -08:00
Jason Ekstrand	e85aaec148	anv/cmd_buffer: Move dirty bits into anv_cmd_*_state Tested-by: Józef Kucia <joseph.kucia@gmail.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Cc: "18.0" <mesa-stable@lists.freedesktop.org>	2018-01-23 21:10:36 -08:00
Jason Ekstrand	97f96610c8	anv: Separate compute and graphics descriptor sets The Vulkan spec says: "pipelineBindPoint is a VkPipelineBindPoint indicating whether the descriptors will be used by graphics pipelines or compute pipelines. There is a separate set of bind points for each of graphics and compute, so binding one does not disturb the other." Up until now, we've been ignoring the pipeline bind point and had just one bind point for everything. This commit separates things out into separate bind points. Tested-by: Józef Kucia <joseph.kucia@gmail.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102897 Cc: "18.0" <mesa-stable@lists.freedesktop.org>	2018-01-23 21:10:33 -08:00
Jason Ekstrand	31b2144c83	anv/cmd_buffer: Use anv_descriptor_for_binding for samplers Tested-by: Józef Kucia <joseph.kucia@gmail.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Cc: "18.0" <mesa-stable@lists.freedesktop.org>	2018-01-23 21:10:31 -08:00
Jason Ekstrand	b9e1ca16f8	anv/cmd_buffer: Add a helper for binding descriptor sets This lets us unify some code between push descriptors and regular descriptors. It doesn't do much for us yet but it will. Tested-by: Józef Kucia <joseph.kucia@gmail.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Cc: "18.0" <mesa-stable@lists.freedesktop.org>	2018-01-23 21:10:30 -08:00
Jason Ekstrand	90cceaa9dd	anv/cmd_buffer: Refactor ensure_push_descriptor_set It's now a function which returns the push descriptor set. Since we set the error on the command buffer, returning the error is a little redundant. Returning the descriptor set (or NULL on error) is more convenient. Tested-by: Józef Kucia <joseph.kucia@gmail.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Cc: "18.0" <mesa-stable@lists.freedesktop.org>	2018-01-23 21:10:28 -08:00
Jason Ekstrand	d5592e2fda	anv: Remove semicolons from vk_error[f] definitions With the semicolons, they can't be used in a function argument without throwing syntax errors. Tested-by: Józef Kucia <joseph.kucia@gmail.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Cc: "18.0" <mesa-stable@lists.freedesktop.org>	2018-01-23 21:10:27 -08:00
Jason Ekstrand	9af5379228	anv/cmd_buffer: Add substructs to anv_cmd_state for graphics and compute Initially, these just contain the pipeline in a base struct. Tested-by: Józef Kucia <joseph.kucia@gmail.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Cc: "18.0" <mesa-stable@lists.freedesktop.org>	2018-01-23 21:10:25 -08:00
Jason Ekstrand	ddc2d28548	anv/cmd_buffer: Use some pre-existing pipeline temporaries There are several places where we'd already saved the pipeline off to a temporary variable but, due to an artifact of history, weren't actually using that temporary everywhere. No functional change. Tested-by: Józef Kucia <joseph.kucia@gmail.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Cc: "18.0" <mesa-stable@lists.freedesktop.org>	2018-01-23 21:10:24 -08:00
Jason Ekstrand	cd3feea745	anv/cmd_buffer: Rework anv_cmd_state_reset This splits anv_cmd_state_reset into separate init and finish functions. This lets us share init code with cmd_buffer_create. This potentially fixes subtle bugs where we may have missed some bit of state that needs to get initialized on command buffer creation. Tested-by: Józef Kucia <joseph.kucia@gmail.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Cc: "18.0" <mesa-stable@lists.freedesktop.org>	2018-01-23 21:10:22 -08:00
Jason Ekstrand	d6c9a89d13	anv/cmd_buffer: Get rid of the meta query workaround Meta has been gone for a long time. Tested-by: Józef Kucia <joseph.kucia@gmail.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Cc: "18.0" <mesa-stable@lists.freedesktop.org>	2018-01-23 21:10:20 -08:00
Jason Ekstrand	bc0a21e348	anv/cmd_state: Drop the scratch_size field This is a legacy left-over from the mechanism we used to use to handle scratch. The new (and better) mechanism doesn't use this. Tested-by: Józef Kucia <joseph.kucia@gmail.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Cc: "18.0" <mesa-stable@lists.freedesktop.org>	2018-01-23 21:10:19 -08:00
Jason Ekstrand	4b69ba3817	anv/pipeline: Don't assert on more than 32 samplers This prevents an assert when running one unreleased Vulkan game. Tested-by: Józef Kucia <joseph.kucia@gmail.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Cc: "18.0" <mesa-stable@lists.freedesktop.org>	2018-01-23 21:10:08 -08:00
Jason Ekstrand	de00e8227b	anv: Return trampoline entrypoints from GetInstanceProcAddr Technically, the Vulkan spec requires that we return valid entrypoints for all core functionality and any available device extensions. This means that, for gen-specific functions, we need to return a trampoline which looks at the device and calls the right device function. In 99% of cases, the loader will do this for us but, apparently, we're supposed to do it too. It's a tiny increase in binary size for us to carry this around but really not bad. Before: text data bss dec hex filename 3541775 204112 6136 3752023 394057 libvulkan_intel.so After: text data bss dec hex filename 3551463 205632 6136 3763231 396c1f libvulkan_intel.so Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-01-23 00:15:40 -08:00
Jason Ekstrand	eac29f3a6d	anv/entrypoints: Use a named tuple for params This allows us to store a bit more detailed data per-param Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-01-23 00:15:40 -08:00
Jason Ekstrand	1f79d986af	anv: Only advertise enabled entrypoints The Vulkan spec annoyingly requires us to track what core version and what all extensions are enabled and only advertise those entrypoints. Any call to vkGet*ProcAddr for an entrypoint for an extension the client has not explicitly enabled is supposed to return NULL. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-01-23 00:15:40 -08:00
Jason Ekstrand	e3d27542ae	anv: Add a per-device dispatch table We also switch GetDeviceProcAddr over to use it. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-01-23 00:15:40 -08:00
Jason Ekstrand	0c399dca51	anv: Add a per-instance dispatch table We also switch GetInstanceProcAddr over to use it. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-01-23 00:15:40 -08:00
Jason Ekstrand	a372b9247d	anv: Properly NULL for GetInstanceProcAddr with a null instance Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-01-23 00:15:40 -08:00
Jason Ekstrand	cb0d1ba156	anv/extensions: Fix VkVersion::c_vk_version for patch == None Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-01-23 00:15:40 -08:00
Jason Ekstrand	93e789a266	anv/entrypoints: Parse entrypoints before extensions/features Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-01-23 00:15:40 -08:00
Jason Ekstrand	2f493121ae	anv/entrypoints: Expose the different dispatch tables Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-01-23 00:15:40 -08:00
Jason Ekstrand	083e126694	anv/entrypoints: Split entrypoint index lookup into its own function Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-01-23 00:15:40 -08:00
Jason Ekstrand	7039308d7c	anv/entrypoints: Add a LAYERS helper variable Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-01-23 00:15:40 -08:00
Jason Ekstrand	f54227856f	anv/entrypoints: Add an Entrypoint class Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-01-23 00:15:40 -08:00
Jason Ekstrand	abc62282b5	anv: Add a per-device table of enabled extensions Nothing uses this at the moment, but we will need it soon. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-01-23 00:15:40 -08:00
Jason Ekstrand	01b9701a5c	anv: Use tables for device extension wrangling Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-01-23 00:15:40 -08:00
Jason Ekstrand	920bd2c0bc	anv: Add a per-instance table of enabled extensions Nothing needs this yet but we will want it later. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-01-23 00:15:40 -08:00
Jason Ekstrand	ff5f3e2b21	anv: Use tables for instance extension wrangling This lets us move a bunch of stuff out of codegen and back into anv_device.c which is a bit nicer. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-01-23 00:15:40 -08:00
Jason Ekstrand	dd088d4bec	anv/extensions: Generate a header file with extension tables This allows us better introspection into extensions. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-01-23 00:15:40 -08:00
Jason Ekstrand	ffb10bfd8e	anv/meson: Simplify some dependency and flag tracking This removes some redundant code between libanv_common, libvulkan_intel, and libvulkan_intel_test. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-01-23 00:15:40 -08:00
Jason Ekstrand	f939940809	anv: Split anv_extensions.py into two files The new anv_extensions_gen.py is the code generator while the old anv_extensions.py file is purely declarative. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-01-23 00:15:40 -08:00
Jason Ekstrand	10d1b0be8e	anv/meson: Make anv_entrypoints_gen.py depend on anv_extensions.py Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-01-23 00:15:40 -08:00
Kenneth Graunke	60f15477da	i965: Drop render_target_start from binding table struct. We have to start render targets at binding table index 0 in order to use headerless FB write messages, and in fact already assume this in a bunch of places in the code. Let's finish that off, and not bother storing 0 in a struct to pretend to add it in a few places. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2018-01-22 10:03:52 -08:00
Dylan Baker	436ed65d38	autotools: include meson build files in tarball This adds the meson.build, meson_options.txt, and a few scripts that are used exclusively by the meson build. v2: - Remove accidentally included changes needed to test make dist with LLVM > 3.9 Signed-off-by: Dylan Baker <dylan.c.baker@intel.com> Acked-by: Eric Engestrom <eric@engestrom.ch> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-01-19 16:30:51 -08:00
Samuel Iglesias Gonsálvez	7109a1fe13	anv: avoid segmentation fault due to vk_error() vk_error() is a macro that calls __vk_errorf() with instance == NULL. Then, __vk_errorf() passes a pointer to instance->debug_report_callbacks to vk_debug_error(), which segfaults as this pointer is invalid but not NULL. Fixes: `e5b1bd6ab8` "vulkan: move anv VK_EXT_debug_report implementation to common code." Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-01-19 09:39:05 +01:00
Chris Wilson	34499e8ddc	intel: Future-proof ring names for aubinator_error_decode The kernel is moving to a $class$instance naming scheme in preparation for accommodating more rings in the future in a consistent manner. It is already using the naming scheme internally, and now we are looking at updating some soft-ABI such as the error state to use the new naming scheme. This of course means we need to teach aubinator_error_decode how to map both sets of ring names onto its register maps. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Michel Thierry <michel.thierry@intel.com>	2018-01-18 17:35:21 +00:00
Iago Toral Quiroga	7ec6e4e689	anv/query: implement multiview interactions From the Vulkan spec with KHX extensions: "If queries are used while executing a render pass instance that has multiview enabled, the query uses N consecutive query indices in the query pool (starting at query) where N is the number of bits set in the view mask in the subpass the query is used in. How the numerical results of the query are distributed among the queries is implementation-dependent. For example, some implementations may write each view's results to a distinct query, while other implementations may write the total result to the first query and write zero to the other queries. However, the sum of the results in all the queries must accurately reflect the total result of the query summed over all views. Applications can sum the results from all the queries to compute the total result." In our case we only really emit a single query (in the first query index) that stores the aggregated result for all views, but we still need to manage availability for all the other query indices involved, even if we don't actually use them. This is relevant when clients call vkGetQueryPoolResults and pass all N queries to retrieve the results. In that scenario, without this patch, we will never see queries other than the first being available since we never emit them. v2: we need the same treatment for timestamp queries. v3 (Jason): - Better an if instead of an early return. - We can't write to this memory in the CPU, we should use MI_STORE_DATA_IMM and emit_query_availability (Jason). v4 (Jason): - No need to take the value to write as parameter, just hard code it to 0. Fixes test failures in some work-in-progress CTS multiview+query tests. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-01-18 16:37:06 +01:00
Samuel Iglesias Gonsálvez	eac629deb6	anv: return VK_ERROR_OUT_OF_DEVICE_MEMORY when surface size is out of HW limits Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-01-18 06:48:47 +01:00
Francisco Jerez	11674dad8a	intel/fs: Optimize and simplify the copy propagation dataflow logic. Previously the dataflow propagation algorithm would calculate the ACP live-in and -out sets in a two-pass fixed-point algorithm. The first pass would update the live-out sets of all basic blocks of the program based on their live-in sets, while the second pass would update the live-in sets based on the live-out sets. This is incredibly inefficient in the typical case where the CFG of the program is approximately acyclic, because it can take up to 2*n passes for an ACP entry introduced at the top of the program to reach the bottom (where n is the number of basic blocks in the program), until which point the algorithm won't be able to reach a fixed point. The same effect can be achieved in a single pass by computing the live-in and -out sets in lock-step, because that makes sure that processing of any basic block will pick up the updated live-out sets of the lexically preceding blocks. This gives the dataflow propagation algorithm effectively O(n) run-time instead of O(n^2) in the acyclic case. The time spent in dataflow propagation is reduced by 30x in the GLES31.functional.ssbo.layout.random.all_shared_buffer.5 dEQP test-case on my CHV system (the improvement is likely to be of the same order of magnitude on other platforms). This more than reverses an apparent run-time regression in this test-case from my previous copy-propagation undefined-value handling patch, which was ultimately caused by the additional work introduced in that commit to account for undefined values being multiplied by a huge quadratic factor. According to Chad this test was failing on CHV due to a 30s time-out imposed by the Android CTS (this was the case regardless of my undefined-value handling patch, even though my patch substantially exacerbated the issue). On my CHV system this patch reduces the overall run-time of the test by approximately 12x, getting us to around 13s, well below the time-out. v2: Initialize live-out set to the universal set to avoid rather pessimistic dataflow estimation in shaders with cycles (Addresses performance regression reported by Eero in GpuTest Piano). Performance numbers given above still apply. No shader-db changes with respect to master. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104271 Reported-by: Chad Versace <chadversary@chromium.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-01-17 11:56:08 -08:00
Bas Nieuwenhuizen	e5b1bd6ab8	vulkan: move anv VK_EXT_debug_report implementation to common code. This is so it can also be used in radv. I moved the remaining stubs back to anv_device.c as they were just trivial. This does not move the vk_errorf/anv_perf_warn or the object type macros, as those depend on anv types and logging. Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-01-17 11:27:52 +01:00
Samuel Iglesias Gonsálvez	e63adf8b1e	anv: VkDescriptorSetLayoutBinding can have descriptorCount == 0 From Vulkan spec: "descriptorCount is the number of descriptors contained in the binding, accessed in a shader as an array. If descriptorCount is zero this binding entry is reserved and the resource must not be accessed from any stage via this binding within any pipeline using the set layout." Fixes: dEQP-VK.binding_model.descriptor_update.empty_descriptor.uniform_buffer Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable@lists.freedesktop.org	2018-01-12 07:08:51 +01:00
Dylan Baker	2083a14179	meson: Use dependencies for nir This creates two new internal dependencies, idep_nir_headers and idep_nir. The former encapsulates the generation of nir_opcodes.h and nir_builder_opcodes.h and adding src/compiler/nir as an include path. This ensures that any target that needs nir headers will have the includes and that the generated headers will be generated before the target is built. The second, idep_nir, includes the first and additionally links to libnir. This is intended to make it easier to avoid race conditions in the build when using nir, since the number of consumers of libnir and its headers is quite high. Acked-by: Eric Engestrom <eric.engestrom@imgtec.com> Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>	2018-01-11 15:40:02 -08:00
Dylan Baker	60856a7b49	meson: don't use intermediate variables that are immediately discarded For things like: loop x = func() list += x end just do: loop list += func() end Acked-by: Eric Engestrom <eric.engestrom@imgtec.com> Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>	2018-01-11 15:40:02 -08:00
Dylan Baker	4ccb981673	meson: Use consistent style for tests Don't use intermediate variables, use consistent whitespace. Acked-by: Eric Engestrom <eric.engestrom@imgtec.com> Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>	2018-01-11 15:40:02 -08:00
Dylan Baker	fbf192a67e	meson: Use consistent style Currently the meson build has a mix of two styles: arg : [foo, ... bar], and arg : [ foo, ..., bar, ] For consistency let's pick one. I've picked the latter style, which I think is more readable, and is more common in the mesa code base. v2: - fix commit message Acked-by: Eric Engestrom <eric.engestrom@imgtec.com> Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>	2018-01-11 15:40:02 -08:00
Jason Ekstrand	c3d802d68e	i965: Use UD types for gl_SampleID setup We already had to switch all of the W types to UW to prevent issues with vector immediates on gen10. We may as well use unsigned types everywhere. Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-01-11 14:31:47 -08:00
Jason Ekstrand	3d2b157e23	i965/fs: Use UW types when using V immediates Gen 10 has a strange hardware bug involving V immediates with W types. It appears that a mov(8) g2<1>W 0x76543210V will actually result in g2 getting the value {3, 2, 1, 0, 3, 2, 1, 0}. In particular, the bottom four nibbles are repeated instead of the top four being taken. (A mov of 0x00003210V yields the same result.) This bug does not appear in any hardware documentation as far as we can tell and the simulator does not implement the bug either. Commit `6132992cdb` was mostly a no-op except that it changed the type of the subgroup invocation from UW to W and caused us to tickle this bug with basically every compute shader that uses any sort of invocation ID (which is most of them). This is also potentially an issue for geometry shader input pulls and SampleID setup. The easy solution is just to change the few places where we use a vector integer immediate with a W type to use a UW type. Reviewed-by: Matt Turner <mattst88@gmail.com> Cc: mesa-stable@lists.freedesktop.org Fixes: `6132992cdb`	2018-01-11 14:31:38 -08:00
Matt Turner	c0ef14f5b1	Revert "Revert "i965/fs: Use align1 mode on ternary instructions on Gen10+"" This reverts commit `2d04572038`. Acked-by: Scott D Phillips <scott.d.phillips@intel.com>	2018-01-11 10:11:59 -08:00
Matt Turner	01ebfbb67a	i965/fs: Add/use functions to convert to 3src_align1 vstride/hstride Some cases weren't handled, such as stride 4 which is needed for 64-bit operations. Presumably fixes the assertion failure mentioned in commit `2d04572038` (Revert "i965/fs: Use align1 mode on ternary instructions on Gen10+") but who can really say since the commit neglected to list any of them! Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2018-01-11 10:11:59 -08:00
Alex Smith	4fd85617c1	anv: Make sure state on primary is correct after CmdExecuteCommands After executing a secondary command buffer, we need to update certain state on the primary command buffer to reflect changes by the secondary. Otherwise subsequent commands may not have the correct state set. This fixes various issues (rendering errors, GPU hangs) seen after executing secondary command buffers in some cases. v2 (Jason Ekstrand): - Reset to invalid values instead of pulling from the secondary - Change the comment to be more descriptive Signed-off-by: Alex Smith <asmith@feralinteractive.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Cc: mesa-stable@lists.freedesktop.org	2018-01-11 18:11:08 +00:00
Andres Gomez	a1901d092c	anv: Import mako templates only during execution of anv_extensions anv_extensions usage from anv_icd was bringing the unwanted dependency of mako templates for the latter. We don't want that since it will force the dependency even for distributable tarballs which was not needed until now. Jason suggested this approach. v2: Patch simplification (Jason). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104551 Fixes: `0ab04ba979` ("anv: Use python to generate ICD json files") Cc: Jason Ekstrand <jason.ekstrand@intel.com> Cc: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-01-11 14:44:03 +02:00
Samuel Iglesias Gonsálvez	c0816389c2	anv: fix maxDescriptorSet* limits "The maxDescriptorSet* limit is n times the corresponding maxPerStageDescriptor* limit, where n is the number of shader stages supported by the VkPhysicalDevice. If all shader stages are supported, n = 6 (vertex, tessellation control, tessellation evaluation, geometry, fragment, compute)." Fixes: dEQP-VK.api.info.device.properties Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-01-11 07:00:42 +01:00
Iago Toral Quiroga	4317c848b9	i965/nir: add a helper to lower gl_PatchVerticesIn to a uniform v2: do not try to handle it as a system value directly for the SPIR-V path. In GL we rather handle it as a uniform like we do for the GLSL path (Jason). v3: - Remove the uniform variable, it is always -1 now (Jason) - Also do the lowering for the TessEval stage (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-01-10 08:21:02 +01:00
Kenneth Graunke	28c2d0d80b	genxml: Add missing INSTDONE_1 bits on Gen7.5+. This will make aubinator_error_decode decode them properly. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-01-09 10:13:53 -08:00
Kenneth Graunke	8eadc2fb8f	intel: Apply Geminilake "Barrier Mode" workaround. Apparently, Geminilake requires you to whack a chicken bit to select either compute or tessellation mode for barriers. The recommendation is to switch between them at PIPELINE_SELECT time. We may not need to do this all the time, but I don't know that it hurts either. PIPELINE_SELECT is already a pretty giant stall. This appears to fix hangs in tessellation control shaders with barriers on Geminilake. Note that this requires a corresponding kernel change, drm/i915: Whitelist SLICE_COMMON_ECO_CHICKEN1 on Geminilake. in order for the register write to actually happen. Without an updated kernel, this register write will be noop'd and the fix will not work. Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2018-01-09 10:13:33 -08:00
Scott D Phillips	42f421cbbf	aubinator: add support for aubinating memtrace aubs Memtrace aubs are similar to classic aubs, with the major difference being how command submission is serialized (as register writes instead of a high-level submit message). Some internal tools generate or consume only memtrace aubs. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2018-01-08 21:11:11 -08:00
Scott D Phillips	8cdf5bd292	aubinator: extract aubinator_init() out of the header handler function A later patch will use the aubinator_init() function from the memtrace aub header handler. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2018-01-08 21:11:11 -08:00
Scott D Phillips	4f0a2ff4c1	aubinator: honor --color option when printing the header Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2018-01-08 21:11:11 -08:00
Alex Smith	0d8b9c529c	anv: Allow PMA optimization to be enabled in secondary command buffers This was never enabled in secondary buffers because hiz_enabled was never set to true for those. If the app provides a framebuffer in the inheritance info when beginning a secondary buffer, we can determine if HiZ is enabled and therefore allow the PMA optimization to be enabled within the command buffer. This improves performance by ~13% on an internal benchmark on Skylake. v2: Use anv_cmd_buffer_get_depth_stencil_view(). Signed-off-by: Alex Smith <asmith@feralinteractive.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-01-08 09:31:17 +00:00
Alex Smith	12f4e00b69	anv: Take write mask into account in has_color_buffer_write_enabled If we have a color attachment, but its writes are masked, this would have still returned true. This is inconsistent with how HasWriteableRT in 3DSTATE_PS_BLEND is set, which does take the mask into account. This could lead to PixelShaderHasUAV not being set in 3DSTATE_PS_EXTRA if the fragment shader does use UAVs, meaning the fragment shader may not be invoked because HasWriteableRT is false. Specifically, this was seen to occur when the shader also enables early fragment tests: the fragment shader was not invoked despite passing depth/stencil. Fix by taking the color write mask into account in this function. This is consistent with how things are done on i965. Signed-off-by: Alex Smith <asmith@feralinteractive.com> Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-01-05 15:36:22 +00:00
Alex Smith	00a81e9909	anv: Add missing unlock in anv_scratch_pool_alloc Fixes hangs seen due to the lock not being released here. Signed-off-by: Alex Smith <asmith@feralinteractive.com> Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-01-04 14:54:02 +00:00
Kenneth Graunke	74e1d6e20c	i965: Drop support for the legacy SNORM -> Float equation. Older OpenGL defines two equations for converting from signed-normalized to floating point data. These are: f = (2c + 1)/(2^b - 1) (equation 2.2) f = max{c/(2^(b-1) - 1), -1.0} (equation 2.3) Both OpenGL 4.2+ and OpenGL ES 3.0+ mandate that equation 2.3 is to be used in all scenarios, and remove equation 2.2. DirectX uses equation 2.3 as well. Intel hardware only supports equation 2.3, so Gen7.5+ systems that use the vertex fetcher hardware to do the conversions always get formula 2.3. This can make a big difference for 10-10-10-2 formats - the 2-bit value can represent 0 with equation 2.3, and cannot with equation 2.2. Ivybridge and older were using equation 2.2 for OpenGL, and 2.3 for ES. Now that Ivybridge supports OpenGL 4.2, this is wrong - we need to use the new rules, at least in core profile. That would leave Gen4-6 doing something different than all other hardware, which seems...lame. With context version promotion, applications that requested a pre-4.2 context may get promoted to 4.2, and thus get the new rules. Zero cases have been reported of this being a problem. However, we've received a report that following the old rules breaks expectations. SuperTuxKart apparently renders the cars red when following equation 2.2, and works correctly when following equation 2.3: https://github.com/supertuxkart/stk-code/issues/2885#issuecomment-353858405 So, this patch deletes the legacy equation 2.2 support entirely, making all hardware and APIs consistently use the new equation 2.3 rules. If we ever find an application that truly requires the old formula, then we'd likely want that application to work on modern hardware, too. We'd likely restore this support as a driconf option. Until then, drop it.
This commit will regress Piglit's draw-vertices-2101010 test on pre-Haswell without the corresponding Piglit patch to accept either formula (commit 35daaa1695ea01eb85bc02f9be9b6ebd1a7113a1): draw-vertices-2101010: Accept either SNORM conversion formula. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Chris Forbes <chrisforbes@google.com>	2018-01-02 16:51:42 -08:00
Kenneth Graunke	a1afef8de0	i965: Combine {VS,FS}_OPCODE_GET_BUFFER_SIZE opcodes. These are the same, we don't need a separate opcode enum per backend. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-30 20:30:34 -08:00
Jason Ekstrand	967d238c69	anv/device: Mark all state buffers as needing capture Previously, we were flagging the instruction state buffer for capture but not surface state or dynamic state. We want those captured too. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-12-28 10:39:04 -08:00
Jason Ekstrand	69fa3fb77f	intel/aubinator: Gracefully handle dynamic state not being available Some older versions of the Vulkan driver didn't properly tag dynamic state as needing to be captured. Also, this prevents crashes when looking at dumps on older kernels. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-12-28 10:39:04 -08:00
Jason Ekstrand	a92d52c3c1	intel/aubinator: Free section data last We were walking the sections, printing the batches, and then freeing them in one pass. If the batch happens to reference any earlier sections (which it almost certainly will since it's at the end), we will access freed memory. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-12-28 10:39:04 -08:00
Anuj Phogat	2d04572038	Revert "i965/fs: Use align1 mode on ternary instructions on Gen10+" This reverts commit `9cd60fce9c`. The above commit caused 2000+ piglit tests to hit assertion failures. Disabling the align1 mode on Gen10 for now to avoid the failures. Cc: Matt Turner <mattst88@gmail.com> Cc: Rafael Antognolli <rafael.antognolli@intel.com> Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>	2017-12-22 16:40:40 -08:00
Francisco Jerez	b3e3cb9901	intel/fs: Initialize fs_visitor::grf_used on construction. This should shut up some Valgrind errors during pre-regalloc scheduling. The errors were harmless since they could only have led to the estimation of the bank conflict penalty of an instruction pre-regalloc, which is inaccurate at that point of the program compilation, but no less accurate than the intended "return 0" fall-back path. The scheduling pass is normally re-run after regalloc with a well-defined grf_used value and accurate bank conflict information. Fixes: `acf98ff933` "intel/fs: Teach instruction scheduler about GRF bank conflict cycles." Reported-by: Eero Tamminen <eero.t.tamminen@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2017-12-21 15:20:17 -08:00
Francisco Jerez	1aa79d5ed5	intel/fs/bank_conflicts: Use posix_memalign() instead of overaligned new to obtain vector storage. The weight_vector_type constructor was inadvertently assuming C++17 semantics of the new operator applied on a type with alignment requirement greater than the largest fundamental alignment. Unfortunately on earlier C++ dialects the implementation was allowed to raise an allocation failure when the alignment requirement of the allocated type was unsupported, in an implementation-defined fashion. It's expected that a C++ implementation recent enough to implement P0035R4 would have honored allocation requests for such over-aligned types even if the C++17 dialect wasn't active, which is likely the reason why this problem wasn't caught by our CI system. A more elegant fix would involve wrapping the __SSE2__ block in a '__cpp_aligned_new >= 201606' preprocessor conditional and continuing to take advantage of the language feature, but that would yield lower compile-time performance on old compilers not implementing it (e.g. GCC versions older than 7.0). Fixes: `af2c320190` "intel/fs: Implement GRF bank conflict mitigation pass." Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104226 Reported-by: Józef Kucia <joseph.kucia@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2017-12-21 15:19:59 -08:00
Samuel Iglesias Gonsálvez	a31f0c4a36	anv: disallow VK_REMAINING_ARRAY_LAYERS in vkCmdClearAttachments() The Vulkan spec doesn't specify that VK_REMAINING_ARRAY_LAYERS is allowed in the passed VkClearRect struct. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-20 06:55:41 +01:00
Rafael Antognolli	85789831b4	intel/compiler/gen10: Disable push constants. We still have gpu hangs on Cannonlake when using push constants, so disable them for now until we have a proper fix for these hangs. v2: Add warning message when creating context too. Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Cc: Ben Widawsky <ben@bwidawsk.net> Cc: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ben Widawsky <ben@bwidawsk.net>	2017-12-19 12:32:24 -08:00
Bas Nieuwenhuizen	6d9849d63e	anv: Remove unused variable. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-17 14:53:46 +01:00
Kenneth Graunke	02720f8d24	isl: Don't require VALIGN_2 for R32G32B32_FLOAT on Haswell. According to the RENDER_SURFACE_STATE internal documentation, the R32G32B32_FLOAT restriction is marked "IVB" only. We choose to apply it to Ivybridge and Baytrail, but not Haswell. Apparently fixes KHR-GL46.texture_size_promotion.functional on Haswell. Changes these tests from crashing to skipping on Haswell: - KHR-GL46.direct_state_access.textures_storage_multisample_2d_rgb32f - KHR-GL46.direct_state_access.textures_storage_multisample_3d_rgb32f Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-15 14:00:09 -08:00
Jason Ekstrand	4b8c9ea46b	intel/tools: Convert aubinator over to the common framework Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-12-14 13:27:24 -08:00
Jason Ekstrand	35f9c27be3	intel/batch-decoder: Decode registers Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-12-14 13:27:22 -08:00
Jason Ekstrand	81e4ecbc19	intel/batch-decoder: Decode dynamic state Unfortunately, in aubinator and aubinator_error_decode we don't always know how many of a given state we have, so we must guess. One day, we'll come up with a way to annotate the batch to solve this problem. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-12-14 13:27:20 -08:00
Jason Ekstrand	4ac2ee9001	intel/batch-decoder: Decode constants, binding tables, and samplers Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-12-14 13:27:18 -08:00
Jason Ekstrand	d374423eab	intel/tools: Switch aubinator_error_decode over to the gen_print_batch The shared framework can now do everything that aubinator_error_decode ever did and more. It's time to make the switch. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-12-14 13:27:16 -08:00
Jason Ekstrand	c86671c438	intel/batch-decoder: Decode graphics shaders Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-12-14 13:27:15 -08:00
Jason Ekstrand	d4081fb778	intel/batch-decoder: Decode vertex and index buffers Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-12-14 13:27:13 -08:00
Jason Ekstrand	e27ec208ed	intel/batch-decoder: Decode MEDIA_INTERFACE_DESCRIPTOR_LOAD Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-12-14 13:27:12 -08:00
Jason Ekstrand	be20043d00	intel/tools: Add the start of a generic batch decoder Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-12-14 13:27:10 -08:00
Jason Ekstrand	4cb96fbd91	intel/decoder: Expose the raw field value in the iterator Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-12-14 13:27:09 -08:00
Jason Ekstrand	79269e8f4b	intel/disasm: Take a devinfo in gen_disasm_create Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-12-14 13:27:06 -08:00
Jason Ekstrand	a7ae72032f	intel/decoder: Take a bit offset in gen_print_group Previously, if a group was nested in another group such that it didn't start on a dword boundary, we would decode it as if it started at the start of its first dword. This changes things to work even more in terms of bits so that we can properly decode these structs. This affects MOCS, attribute swizzles, and several other things. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-12-14 13:27:04 -08:00
Jason Ekstrand	dca8f466ee	intel/decoder: Stop rounding down to the nearest dword Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-12-14 13:27:03 -08:00
Jason Ekstrand	f264640693	intel/decoder: Convert the iterator to work entirely in bits Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-12-14 13:27:01 -08:00
Jason Ekstrand	ada705b671	intel/decoder: Drop gen_field_decode helper It's unused Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-12-14 13:26:44 -08:00
Francisco Jerez	acab52f520	intel/fs/bank_conflicts: Don't touch Gen7 MRF hack registers. Fixes: `af2c320190` "intel/fs: Implement GRF bank conflict mitigation pass." Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104199 Reported-by: Darius Spitznagel <d.spitznagel@goodbytez.de> Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-12-12 12:05:45 -08:00
Samuel Iglesias Gonsálvez	ba4bb0838b	anv: fix bug when using component qualifier in FS outputs We can write to the same output but in different components, like in this example: layout(location = 0, component = 0) out ivec2 dEQP_FragColor_0; layout(location = 0, component = 2) out ivec2 dEQP_FragColor_1; Therefore, they are not two different outputs but only one. Fixes: dEQP-VK.glsl.440.linkage.varying.component.frag_out.* v3: - Remove FRAG_RESULT_MAX. - Add const and use sizeof (Ian). - Do three passes to properly set the locations of fragment outputs when arrays are present (Jason). Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-12 07:24:55 +01:00
Jason Ekstrand	4c7af87fb9	anv: Enable UBO pushing Push constants on Intel hardware are significantly more performant than pull constants. Since most Vulkan applications don't actively use push constants, or at least don't use them heavily, we're pulling way more than we should be. By enabling pushing chunks of UBOs we can get rid of a lot of those pulls. On my SKL GT4e, this improves the performance of Dota 2 and Talos by around 2.5% and improves Aztec Ruins by around 2%. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2017-12-08 15:43:26 -08:00
Jason Ekstrand	f1ce0b905a	i965/fs: Handle !supports_pull_constants and push UBOs properly In Vulkan, we don't support classic pull constants and everything the client asks us to push, we push. However, for pushed UBOs, we still want to fall back to conventional pulls if we run out of space.	2017-12-08 15:43:25 -08:00
Jason Ekstrand	8d34077182	anv/device: Increase the UBO alignment requirement to 32 Push constants work in terms of 32-byte chunks, so if we want to be able to push UBOs, everything needs to be 32-byte aligned. Currently, we only require 16-byte alignment, which is too small. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2017-12-08 15:43:25 -08:00
Jason Ekstrand	2f9eb045f3	anv/cmd_buffer: Add support for pushing UBO ranges In order to do this we have to modify push constant set up to handle ranges. We also have to tweak the way we handle dirty bits a bit so that we re-push whenever a descriptor set changes. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2017-12-08 15:43:25 -08:00
Jason Ekstrand	0c879b62b0	anv/cmd_buffer: Add some stage asserts There are several places where we look up opcodes in an array of stages. Assert that we don't end up going out-of-bounds. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2017-12-08 15:43:25 -08:00
Jason Ekstrand	1968cd07a2	anv/cmd_buffer: Add some helpers for working with descriptor sets Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2017-12-08 15:43:25 -08:00
Jason Ekstrand	1bce04deb8	anv/pipeline: Translate vulkan_resource_index to a constant when possible We want to call brw_nir_analyze_ubo_ranges immediately after anv_nir_apply_pipeline_layout and it badly wants constants. We could run an optimization step and let constant folding do it but that's way more expensive than needed. It's really easy to just handle constants in apply_pipeline_layout. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2017-12-08 15:43:25 -08:00
Jason Ekstrand	3b34ed79f1	i965/fs: Rewrite assign_constant_locations This rewires the logic for assigning uniform locations to work in terms of "complex alignments". The basic idea is that, as we walk the list of instructions, we keep track of the alignment and continuity requirements of each slot and assert that the alignments all match up. We then use those alignments in the compaction stage to ensure that everything gets placed at a properly aligned register. The old mechanism handled alignments by special-casing each of the bit sizes and placing 64-bit values first followed by 32-bit values. The old scheme had the advantage of never leaving a hole since all the 64-bit values could be tightly packed and so could the 32-bit values. However, the new scheme has no type size special cases so it handles not only 32 and 64-bit types but should gracefully extend to 16 and 8-bit types as the need arises. Tested-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-12-08 15:43:25 -08:00
Jason Ekstrand	597c194487	anv: Disable VK_KHR_16bit_storage The testing for this extension is currently very poor. The CTS tests only test accessing UBOs and SSBOs at dynamic offsets, so none of our constant-offset paths get triggered at all. Also, there's an assertion in our handling of nir_intrinsic_load_uniform that offset % 4 == 0 which is never triggered, indicating that nothing ever gets loaded from an offset that is not a multiple of a dword. Both the push constant and the constant-offset pull paths are complex enough that we really don't want to ship without tests. We'll turn the extension back on once we have decent tests.	2017-12-08 15:42:55 -08:00
Francisco Jerez	4d1959e693	intel/cfg: Represent divergent control flow paths caused by non-uniform loop execution. This addresses a long-standing back-end compiler bug that could lead to cross-channel data corruption in loops executed non-uniformly. In some cases live variables extending through a loop divergence point (e.g. a non-uniform break) into a convergence point (e.g. the end of the loop) wouldn't be considered live along all physical control flow paths the SIMD thread could possibly have taken in between due to some channels remaining in the loop for additional iterations. This patch fixes the problem by extending the CFG with physical edges that don't exist in the idealized non-vectorized program, but represent valid control flow paths the SIMD EU may take due to the divergence of logical threads. This makes sense because the i965 IR is explicitly SIMD, and it's not uncommon for instructions to have an influence on neighboring channels (e.g. a force_writemask_all header setup), so the behavior of the SIMD thread as a whole needs to be considered. No changes in shader-db. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-12-07 18:27:05 -08:00
Francisco Jerez	9355116bda	intel/fs: Don't let undefined values prevent copy propagation. This makes the dataflow propagation logic of the copy propagation pass more intelligent in cases where the destination of a copy is known to be undefined for some incoming CFG edges, building upon the definedness information provided by the last patch. Helps a few programs, and avoids a handful of shader-db regressions from the next patch. shader-db results on ILK: total instructions in shared programs: 6541547 -> 6541523 (-0.00%) instructions in affected programs: 360 -> 336 (-6.67%) helped: 8 HURT: 0 LOST: 0 GAINED: 10 shader-db results on BDW: total instructions in shared programs: 8174323 -> 8173882 (-0.01%) instructions in affected programs: 7730 -> 7289 (-5.71%) helped: 5 HURT: 2 LOST: 0 GAINED: 4 shader-db results on SKL: total instructions in shared programs: 8185669 -> 8184598 (-0.01%) instructions in affected programs: 10364 -> 9293 (-10.33%) helped: 5 HURT: 2 LOST: 0 GAINED: 2 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-07 18:27:04 -08:00
Francisco Jerez	c3c1aa5aeb	intel/fs: Restrict live intervals to the subset possibly reachable from any definition. Currently the liveness analysis pass would extend a live interval up to the top of the program when no unconditional and complete definition of the variable is found that dominates all of its uses. This can lead to a serious performance problem in shaders containing many partial writes, like scalar arithmetic, FP64 and soon FP16 operations. The number of oversize live intervals in such workloads can cause the compilation time of the shader to explode because of the worse than quadratic behavior of the register allocator and scheduler when running out of registers, and it can also cause the running time of the shader to explode due to the amount of spilling it leads to, which is orders of magnitude slower than GRF memory. This patch fixes it by computing the intersection of our current live intervals with the subset of the program that can possibly be reached from any definition of the variable. Extending the storage allocation of the variable beyond that is pretty useless because its value is guaranteed to be undefined at a point that cannot be reached from any definition. According to Jason, this improves performance of the subgroup Vulkan CTS tests significantly (e.g. the runtime of the dvec4 broadcast test improves by nearly 50x). No significant change in the running time of shader-db (with 5% statistical significance). 
shader-db results on IVB: total cycles in shared programs: 61108780 -> 60932856 (-0.29%) cycles in affected programs: 16335482 -> 16159558 (-1.08%) helped: 5121 HURT: 4347 total spills in shared programs: 1309 -> 1288 (-1.60%) spills in affected programs: 249 -> 228 (-8.43%) helped: 3 HURT: 0 total fills in shared programs: 1652 -> 1597 (-3.33%) fills in affected programs: 262 -> 207 (-20.99%) helped: 4 HURT: 0 LOST: 2 GAINED: 209 shader-db results on BDW: total cycles in shared programs: 67617262 -> 67361220 (-0.38%) cycles in affected programs: 23397142 -> 23141100 (-1.09%) helped: 8045 HURT: 6488 total spills in shared programs: 1456 -> 1252 (-14.01%) spills in affected programs: 465 -> 261 (-43.87%) helped: 3 HURT: 0 total fills in shared programs: 1720 -> 1465 (-14.83%) fills in affected programs: 471 -> 216 (-54.14%) helped: 4 HURT: 0 LOST: 2 GAINED: 162 shader-db results on SKL: total cycles in shared programs: 65436248 -> 65245186 (-0.29%) cycles in affected programs: 22560936 -> 22369874 (-0.85%) helped: 8457 HURT: 6247 total spills in shared programs: 437 -> 437 (0.00%) spills in affected programs: 0 -> 0 helped: 0 HURT: 0 total fills in shared programs: 870 -> 854 (-1.84%) fills in affected programs: 16 -> 0 helped: 1 HURT: 0 LOST: 0 GAINED: 107 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-07 18:27:04 -08:00
Francisco Jerez	acf98ff933	intel/fs: Teach instruction scheduler about GRF bank conflict cycles. This should allow the post-RA scheduler to do a slightly better job at hiding latency in the presence of instructions incurring bank conflicts. The main purpose of this patch is not to improve performance though, but to get conflict cycles to show up in shader-db statistics in order to make sure that regressions in the bank conflict mitigation pass don't go unnoticed. Acked-by: Matt Turner <mattst88@gmail.com>	2017-12-07 15:56:49 -08:00
Francisco Jerez	af2c320190	intel/fs: Implement GRF bank conflict mitigation pass. Unnecessary GRF bank conflicts increase the issue time of ternary instructions (the overwhelmingly most common of which is MAD) by roughly 50%, leading to reduced ALU throughput. This pass attempts to minimize the number of bank conflicts by rearranging the layout of the GRF space post-register allocation. It's in general not possible to eliminate all of them without introducing extra copies, which are typically more expensive than the bank conflict itself. In a shader-db run on SKL this helps roughly 46k shaders: total conflicts in shared programs: 1008981 -> 600461 (-40.49%) conflicts in affected programs: 816222 -> 407702 (-50.05%) helped: 46234 HURT: 72 The running time of shader-db itself on SKL seems to be increased by roughly 2.52%±1.13% with n=20 due to the additional work done by the compiler back-end. On earlier generations the pass is somewhat less effective in relative terms because the hardware incurs a bank conflict anytime the last two sources of the instruction are duplicate (e.g. while trying to square a value using MAD), which is impossible to avoid without introducing copies. E.g. for a shader-db run on SNB: total conflicts in shared programs: 944636 -> 623185 (-34.03%) conflicts in affected programs: 853258 -> 531807 (-37.67%) helped: 31052 HURT: 19 And on BDW: total conflicts in shared programs: 1418393 -> 987539 (-30.38%) conflicts in affected programs: 1179787 -> 748933 (-36.52%) helped: 47592 HURT: 70 On SKL GT4e this improves performance of GpuTest Volplosion by 3.64% ±0.33% with n=16. NOTE: This patch intentionally disregards some i965 coding conventions for the sake of reviewability. This is addressed by the next squash patch which introduces an amount of (for the most part boring) boilerplate that might distract reviewers from the non-trivial algorithmic details of the pass. 
The following patch is squashed in: SQUASH: intel/fs/bank_conflicts: Roll back to the nineties. Acked-by: Matt Turner <mattst88@gmail.com>	2017-12-07 15:56:06 -08:00
Eric Engestrom	4cba39331d	meson: add dep_thread to every lib that includes threads.h Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104141 Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2017-12-07 17:29:42 +00:00
Fredrik Höglund	5e1cb16768	anv: fix a case statement in GetMemoryFdPropertiesKHR The handle type in the case statement is supposed to be VK_EXTERNAL_MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT. Fixes: `ab18e8e59b` ("anv: Implement VK_EXT_external_memory_dma_buf") Signed-off-by: Fredrik Höglund <fredrik@kde.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-06 20:04:39 +01:00
Jose Maria Casanova Crespo	a1e257a5bf	i965/fs: Use untyped_surface_read for 16-bit load_ssbo SSBO loads were using byte_scattered read messages, as they allow reading 16-bit components. byte_scattered messages can only operate on one component at a time, so we needed to emit as many messages as there are components. But for 16-bit vec2 and vec4, whose sizes are multiples of 32 bits, we can use the untyped_surface_read message to read pairs of 16-bit components using only one message. Once each pair is read, it is unshuffled to return the proper 16-bit components. The vec3 case is treated as vec4, but the 4th component is ignored. 16-bit scalars are read using one byte_scattered_read message. v2: Removed use of stride = 2 on sources (Jason Ekstrand) Rework optimization using unshuffle 16 reads (Chema Casanova) v3: Use W and D types instead of HF and F in shuffle to avoid rounding errors (Jason Ekstrand) Use untyped_surface_read for 16-bit vec3. (Jason Ekstrand) v4: Use subscript instead of changing type and stride (Jason Ekstrand) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-06 08:57:18 +01:00
Jose Maria Casanova Crespo	ce2e572c4c	i965/fs: Optimize 16-bit SSBO stores by packing two into a 32-bit reg Currently, we use byte-scattered write messages for storing 16-bit values into an SSBO. This is because untyped surface messages have a fixed 32-bit size. This patch optimizes these 16-bit writes by combining 2 values (e.g., two consecutive components aligned to 32 bits) into a 32-bit register, packing the two 16-bit words. Single-component 16-bit values will continue to use byte-scattered write messages. The same will happen when the first of the consecutive components is not 32-bit aligned. This optimization potentially reduces the number of SEND messages used for storing 16-bit values by a factor of 2 or 4, which cuts down execution time significantly, because byte-scattered writes are expensive: they only write one component per message. v2: Removed use of stride = 2 on sources (Jason Ekstrand) Rework optimization using shuffle 16 write and enable writes of 16-bit vec4 with only one message of 32 bits. (Chema Casanova) v3: - Fix coding style (Eduardo Lima) - Reorganize code to avoid duplication. (Jason Ekstrand) - Include new comments to explain the length calculations to fix alignment issues of components. (Jason Ekstrand) - Fix issues with writemask yz with 16-bit writes. (Jason Ekstrand) v4: (Jason Ekstrand) - Reorganize 64-bit ssbo-writes to avoid using slots_per_component. - Comment about why shuffle is needed when using byte_scattered_write. Signed-off-by: Eduardo Lima <elima@igalia.com> Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-06 08:57:18 +01:00
Alejandro Piñeiro	66ce6ce78f	anv: Enable SPV_KHR_16bit_storage and VK_KHR_16bit_storage for SSBO/UBO Enables SPV_KHR_16bit_storage on gen 8+. VK_KHR_16bit_storage is enabled for SSBO/UBO using the VK_KHR_get_physical_device_properties2 functionality to expose if the extension is supported or not. v2: update due to rebase against master (Alejandro) v3: (Jason Ekstrand) - Move this patch up in VK_KHR_16bit_storage series enabling only storageBuffer16BitAccess and uniformAndStorageBuffer16BitAccess. - Only expose VK_KHR_16bit_storage on Gen8+ v4: (Jason Ekstrand) - Squash enable SPV_KHR_16bit_storage into VK_KHR_16bit_storage enablement for SSBO/UBO. Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com> Signed-off-by: Eduardo Lima Mitev <elima@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-06 08:57:18 +01:00
Jason Ekstrand	3282309f74	i965/fs: Enables 16-bit load_ubo with sampler load_ubo is using 32-bit loads as uniform surfaces have a 32-bit surface format defined. So when reading 16-bit components with the sampler we need to unshuffle two 16-bit components from each 32-bit component. Using the sampler avoids the use of the byte_scattered_read message that needs one message for each component and is supposed to be slower. v2: (Jason Ekstrand) - Simplify component selection and unshuffling for different bitsizes - Remove SKL optimization of reading only two 32-bit components when reading 16-bit types. Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>	2017-12-06 08:57:18 +01:00
Jose Maria Casanova Crespo	3db31c0b06	i965/fs: Helpers for un/shuffle 16-bit pairs in 32-bit components These helpers are used to load/store 16-bit types from/to 32-bit components. The functions shuffle_32bit_load_result_to_16bit_data and shuffle_16bit_data_for_32bit_write are implemented similarly to the analogous functions for handling 64-bit types. v1: Explain need of temporary in shuffle operations. (Jason Ekstrand) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-06 08:57:18 +01:00

2842 Commits