This moves it to being based on layout_to_aux_usage instead of being
hard-coded based on bits of a priori knowledge of how transitions
interact with layouts. This conceptually simplifies things because
we're now using layout_to_aux_usage and layout_supports_fast_clear to
make resolve decisions, so changes to those functions will do what one
expects.
There is a potential bug with window system integration on gen9+ where
we wouldn't do a resolve when transitioning to the PRESENT_SRC layout
because we just assume that everything that handles CCS_E can handle it
all the time. When handing a CCS_E image off to the window system, we
may need to do a full resolve if the window system does not support the
CCS_E modifier. The only reason why this hasn't been a problem yet is
because we don't support modifiers in Vulkan WSI and so we always get X
tiling which implies no CCS on gen9+. This patch doesn't actually fix
that bug yet, but it takes a first step in that direction by making
us actually pick the correct resolve op. In order to handle all of the
cases, we need more detailed aux tracking.
v2 (Jason Ekstrand):
- Make a few more things const
- Use the anv_fast_clear_support enum
v3 (Jason Ekstrand):
- Move an assert and add a better comment
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This got lost in all of the aspect vs. plane rebasing of YCBCR.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
If the function gets passed ANV_AUX_USAGE_DEFAULT, it still has the old
behavior of setting ISL_AUX_USAGE_NONE for depth/stencil which is what
we want for blits/copies.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This replaces image_fast_clear and ccs_resolve with two new helpers that
simply perform an isl_aux_op, whatever that may be, on CCS or MCS. This
is a bit cleaner as it separates performing the aux operation from which
blorp helper we have to call to do it.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Right now, we have different entrypoints and enums in blorp for these
different operations. This provides us a central enum which we can
begin to transition to.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Add a build option to control building some of the misc tools we
have. Also set the executables to install; presumably you want that
if you're asking for them to be built.
v2: set 'install:' to the with_tools value, not true (Jordan)
handle 'all' in the comma list (Dylan)
Add freedreno's tools (Dylan)
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
The loop goes through the list of extensions to enable, marking each one
as enabled in the table, but this relies on every other extension being
initialized to false by default.
This bug would make us, for example, advertise certain device extension
entry points as available even when the corresponding extensions had
not been enabled.
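A minimal sketch of the pattern being fixed (names here are hypothetical,
not the driver's actual structures): the table has to start out all-false
before the enable loop runs.

    #include <stdbool.h>
    #include <string.h>

    #define NUM_EXTENSIONS 64

    struct extension_table { bool enabled[NUM_EXTENSIONS]; };

    static void
    enable_requested(struct extension_table *table,
                     const int *requested, int count)
    {
       /* Explicitly initialize every entry to false ... */
       memset(table, 0, sizeof(*table));

       /* ... then flip on only the requested extensions. */
       for (int i = 0; i < count; i++)
          table->enabled[requested[i]] = true;
    }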
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes: abc62282b5 "anv: Add a per-device table of enabled extensions"
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
Otherwise loop unrolling will fail to see the actual cost of
the unrolling operations when the loop body contains 64-bit integer
instructions, and especially when the divmod64 lowering applies,
since that lowering is quite expensive.
Without this change, some in-development CTS tests for int64
get stuck forever trying to register allocate a shader with
over 50K SSA values. The large number of SSA values is the result
of NIR first unrolling multiple seemingly simple loops that involve
int64 instructions, only to then lower these instructions to produce
a massive pile of code (due to the divmod64 lowering in the unrolled
instructions).
With this change, loop unrolling will see the loops with the int64
code already lowered and will realize that it is too expensive to
unroll.
v2: Run nir_algebraic first so we can hopefully get rid of some of
the int64 instructions before we even attempt to lower them.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ken called this out in review, but it seems I forgot to make the change.
I noticed that the control flow annotations in the fragment shader
disassembly of tests/shaders/glsl-fs-loop-continue.shader_test were not
correct, and moving this line to the correct place fixes it.
If we ever hit this edge case, it could theoretically cause problems for
CNL because we could end up changing render targets without re-emitting
3DSTATE_MULTISAMPLE, which is part of the pipeline. Just get rid of the
edge case.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
If a shader only writes to an output via a constant initializer we
need to lower it before we call nir_remove_dead_variables so that
this pass sees the stores from the initializer and doesn't kill the
output.
Fixes test failures in new work-in-progress CTS tests:
dEQP-VK.spirv_assembly.instruction.graphics.variable_init.output_vert
dEQP-VK.spirv_assembly.instruction.graphics.variable_init.output_frag
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Allows nir drivers to use either a single location or dual locations
for vs double inputs.
i965 uses dual locations for both the OpenGL and Vulkan drivers; for
now, gallium OpenGL drivers only use a single location.
The following patch will also make use of this option when
calling nir_shader_gather_info().
Reviewed-by: Karol Herbst <kherbst@redhat.com>
First we move double_inputs_read into a vs struct in the union;
double_inputs_read is only used for vs inputs, so this will
save space and also allows us to add a new double_inputs field.
We add the new field because c2acf97fcc changed the behaviour
of double_inputs_read, and while it's no longer used to track
actual reads in i965, we do still want to track this for gallium
drivers.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
I got reviews and fixed the patches locally, but ended up merging the
ones that I sent originally to the list. This patch fixes those
mistakes.
Fixes: 78c125af39
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The GPU hang caused by push constants is apparently fixed, so let's
enable them again.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Similar to the GL driver, ignore 3DSTATE_CONSTANT_* packets when doing a
context restore.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We need to access the pipeline layout to compute correct dynamic
offsets for dynamic UBO/SSBO descriptors when we emit draw commands.
Instead of taking it from the pipeline object, store the layout
in the command buffer pipeline state.
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The Vulkan spec states that VkPipelineLayout objects must not be
destroyed while any command buffer that uses them is in the recording
state, but it permits them to be destroyed otherwise. This means that
applications are allowed to free pipeline layouts after command recording
is finished even if there are pipeline objects that still exist and were
created with these layouts.
There are two solutions to this: one is to use reference counting on
pipeline layout objects; the other is to avoid holding references to
pipeline layouts where they are not really needed.
This patch takes a step towards the second option by making the
pipeline shader compile code take the pipeline layout from the
VkGraphicsPipelineCreateInfo provided rather than the pipeline
object.
A follow-up patch will remove any remaining uses of the layout field
so we can remove it from the pipeline object and avoid the need
for reference counting.
v2: Use ANV_FROM_HANDLE, remove unnecessary braces (Jason)
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The spec states that descriptor set layouts can be destroyed almost
at any time:
"VkDescriptorSetLayout objects may be accessed by commands that
operate on descriptor sets allocated using that layout, and those
descriptor sets must not be updated with vkUpdateDescriptorSets
after the descriptor set layout has been destroyed. Otherwise,
descriptor set layouts can be destroyed any time they are not in
use by an API command."
v2: allocate off the device allocator with DEVICE scope (Jason)
Fixes the following work-in-progress CTS tests:
dEQP-VK.api.descriptor_set.descriptor_set_layout_lifetime.graphics
dEQP-VK.api.descriptor_set.descriptor_set_layout_lifetime.compute
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Without this, we may end up dereferencing blend before we check for
binding->index != UINT32_MAX. However, Vulkan allows the blend state to
be NULL so long as you don't have any color attachments. This fixes a
segfault when running The Talos Principle.
Fixes: 12f4e00b69
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Sort the output to ensure build reproducibility
Signed-off-by: Maxin B. John <maxin.john@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Fixes: 0ab04ba979 ("anv: Use python to generate ICD json files")
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
18fde36ced changed the way temporary
registers were allocated in lower_integer_multiplication so that we
allocate regs_written(inst) space and keep the stride of the original
destination register. This was to ensure that any MUL which originally
followed the CHV/BXT integer multiply regioning restrictions would
continue to follow those restrictions even after lowering. This works
fine except that I forgot to reset the register file to VGRF so, even
though they were assigned a number from alloc.allocate(), they had the
wrong register file. This caused some GLES 3.0 CTS tests to start
failing on Sandy Bridge due to attempted reads from the MRF:
ES3-CTS.functional.shaders.precision.int.highp_mul_fragment.snbm64
ES3-CTS.functional.shaders.precision.int.mediump_mul_fragment.snbm64
ES3-CTS.functional.shaders.precision.int.lowp_mul_fragment.snbm64
ES3-CTS.functional.shaders.precision.uint.highp_mul_fragment.snbm64
ES3-CTS.functional.shaders.precision.uint.mediump_mul_fragment.snbm64
ES3-CTS.functional.shaders.precision.uint.lowp_mul_fragment.snbm64
This commit remedies the problem by making a new register and setting
its region to match inst->dst, instead of copying inst->dst and
overwriting nr.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103626
Fixes: 18fde36ced
Cc: "17.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Looks like checking both sources was intended, instead of the first one
twice. Found with Coccinelle, coccinellery/xand/xand.cocci semantic patch.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
While we're here, make it an anv_address.
Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
We were already doing this for some packets to keep the lines shorter.
We may as well just do it for all of them.
Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
The Vulkan spec says:
"pipelineBindPoint is a VkPipelineBindPoint indicating whether the
descriptors will be used by graphics pipelines or compute pipelines.
There is a separate set of bind points for each of graphics and
compute, so binding one does not disturb the other."
Up until now, we've been ignoring the pipeline bind point and had just
one bind point for everything. This commit separates things out into
separate bind points.
Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102897
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
This lets us unify some code between push descriptors and regular
descriptors. It doesn't do much for us yet but it will.
Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
It's now a function which returns the push descriptor set. Since we set
the error on the command buffer, returning the error is a little
redundant. Returning the descriptor set (or NULL on error) is more
convenient.
Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
With the semicolons, they can't be used in a function argument without
throwing syntax errors.
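A tiny illustration of the failure mode (the macro here is hypothetical,
not the actual anv one):

    /* Macro defined with a trailing semicolon: */
    #define GET_FOO(x) lookup_foo(x);

    /* Using it as a function argument expands to
     *    consume(lookup_foo(a););
     * which is a syntax error.  Dropping the semicolon from the macro
     * definition makes the expansion a plain expression again. */
    consume(GET_FOO(a));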
Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
Initially, these just contain the pipeline in a base struct.
Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
There are several places where we'd already saved the pipeline off to a
temporary variable but, due to an artifact of history, weren't actually
using that temporary everywhere. No functional change.
Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
This splits anv_cmd_state_reset into separate init and finish functions.
This lets us share init code with cmd_buffer_create. This potentially
fixes subtle bugs where we may have missed some bit of state that needs
to get initialized on command buffer creation.
Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
Meta has been gone for a long time.
Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
This is a legacy left-over from the mechanism we used to use to handle
scratch. The new (and better) mechanism doesn't use this.
Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
This prevents an assert when running one unreleased Vulkan game.
Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
Technically, the Vulkan spec requires that we return valid entrypoints
for all core functionality and any available device extensions. This
means that, for gen-specific functions, we need to return a trampoline
which looks at the device and calls the right device function. In 99%
of cases, the loader will do this for us but, apparently, we're supposed
to do it too. It's a tiny increase in binary size for us to carry this
around but really not bad.
Before:
text data bss dec hex filename
3541775 204112 6136 3752023 394057 libvulkan_intel.so
After:
text data bss dec hex filename
3551463 205632 6136 3763231 396c1f libvulkan_intel.so
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
The Vulkan spec annoyingly requires us to track which core version and
which extensions are enabled and only advertise those entrypoints.
Any call to vkGet*ProcAddr for an entrypoint for an extension the client
has not explicitly enabled is supposed to return NULL.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This lets us move a bunch of stuff out of codegen and back into
anv_device.c which is a bit nicer.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This removes some redundant code between libanv_common, libvulkan_intel,
and libvulkan_intel_test.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
The new anv_extensions_gen.py is the code generator while the old
anv_extensions.py file is purely declarative.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
We have to start render targets at binding table index 0 in order to use
headerless FB write messages, and in fact already assume this in a bunch
of places in the code. Let's finish that off, and not bother storing 0
in a struct to pretend to add it in a few places.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This adds the meson.build, meson_options.txt, and a few scripts that are
used exclusively by the meson build.
v2: - Remove accidentally included changes needed to test make dist with
LLVM > 3.9
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Acked-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
vk_error() is a macro that calls __vk_errorf() with instance == NULL.
Then, __vk_errorf() passes a pointer to instance->debug_report_callbacks
to vk_debug_error(), which segfaults as this pointer is invalid but not
NULL.
Fixes: e5b1bd6ab8 "vulkan: move anv VK_EXT_debug_report implementation to common code."
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The kernel is moving to a $class$instance naming scheme in preparation
for accommodating more rings in the future in a consistent manner. It is
already using the naming scheme internally, and now we are looking at
updating some soft-ABI such as the error state to use the new naming
scheme. This of course means we need to teach aubinator_error_decode how
to map both sets of ring names onto its register maps.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
From the Vulkan spec with KHX extensions:
"If queries are used while executing a render pass instance that has
multiview enabled, the query uses N consecutive query indices
in the query pool (starting at query) where N is the number of bits
set in the view mask in the subpass the query is used in.
How the numerical results of the query are distributed among the
queries is implementation-dependent. For example, some implementations
may write each view's results to a distinct query, while other
implementations may write the total result to the first query and write
zero to the other queries. However, the sum of the results in all the
queries must accurately reflect the total result of the query summed
over all views. Applications can sum the results from all the queries to
compute the total result."
In our case we only really emit a single query (in the first query index)
that stores the aggregated result for all views, but we still need to manage
availability for all the other query indices involved, even if we don't
actually use them.
This is relevant when clients call vkGetQueryPoolResults and pass all N
queries to retrieve the results. In that scenario, without this patch,
we will never see queries other than the first being available since we
never emit them.
v2: we need the same treatment for timestamp queries.
v3 (Jason):
- Better to use an if than an early return.
- We can't write to this memory from the CPU; we should use
MI_STORE_DATA_IMM and emit_query_availability (Jason).
v4 (Jason):
- No need to take the value to write as parameter, just hard code it to 0.
Fixes test failures in some work-in-progress CTS multiview+query tests.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Previously the dataflow propagation algorithm would calculate the ACP
live-in and -out sets in a two-pass fixed-point algorithm. The first
pass would update the live-out sets of all basic blocks of the program
based on their live-in sets, while the second pass would update the
live-in sets based on the live-out sets. This is incredibly
inefficient in the typical case where the CFG of the program is
approximately acyclic, because it can take up to 2*n passes for an ACP
entry introduced at the top of the program to reach the bottom (where
n is the number of basic blocks in the program), until which point the
algorithm won't be able to reach a fixed point.
The same effect can be achieved in a single pass by computing the
live-in and -out sets in lock-step, because that makes sure that
processing of any basic block will pick up the updated live-out sets
of the lexically preceding blocks. This gives the dataflow
propagation algorithm effectively O(n) run-time instead of O(n^2) in
the acyclic case.
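Roughly, the change amounts to something like the following (a generic
sketch, not the pass's actual data structures), where each block's
live-out is recomputed in the same walk that updates its live-in:

    #include <stdbool.h>
    #include <stdint.h>

    struct block {
       int num_preds;
       int preds[4];
       uint64_t gen, kill;      /* ACP entries created/killed in this block */
       uint64_t livein, liveout;
    };

    /* One forward walk updating live-in and live-out in lock-step; for an
     * acyclic CFG this converges in a single pass instead of up to 2*n. */
    static bool
    propagate_once(struct block *blocks, int n)
    {
       bool progress = false;
       for (int b = 0; b < n; b++) {
          uint64_t in = blocks[b].num_preds ? ~0ull : 0;
          for (int p = 0; p < blocks[b].num_preds; p++)
             in &= blocks[blocks[b].preds[p]].liveout;
          const uint64_t out = blocks[b].gen | (in & ~blocks[b].kill);
          if (in != blocks[b].livein || out != blocks[b].liveout) {
             blocks[b].livein = in;
             blocks[b].liveout = out;
             progress = true;
          }
       }
       return progress;
    }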
The time spent in dataflow propagation is reduced by 30x in the
GLES31.functional.ssbo.layout.random.all_shared_buffer.5 dEQP
test-case on my CHV system (the improvement is likely to be of the
same order of magnitude on other platforms). This more than reverses
an apparent run-time regression in this test-case from my previous
copy-propagation undefined-value handling patch, which was ultimately
caused by the additional work introduced in that commit to account for
undefined values being multiplied by a huge quadratic factor.
According to Chad this test was failing on CHV due to a 30s time-out
imposed by the Android CTS (this was the case regardless of my
undefined-value handling patch, even though my patch substantially
exacerbated the issue). On my CHV system this patch reduces the
overall run-time of the test by approximately 12x, getting us to
around 13s, well below the time-out.
v2: Initialize live-out set to the universal set to avoid rather
pessimistic dataflow estimation in shaders with cycles (Addresses
performance regression reported by Eero in GpuTest Piano).
Performance numbers given above still apply. No shader-db changes
with respect to master.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104271
Reported-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is so it can also be used in radv. I moved the remaining stubs back
to anv_device.c as they were just trivial.
This does not move the vk_errorf/anv_perf_warn or the object
type macros, as those depend on anv types and logging.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
From Vulkan spec:
"descriptorCount is the number of descriptors contained in the binding,
accessed in a shader as an array. If descriptorCount is zero this
binding entry is reserved and the resource must not be accessed from
any stage via this binding within any pipeline using the set layout."
Fixes:
dEQP-VK.binding_model.descriptor_update.empty_descriptor.uniform_buffer
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
This creates two new internal dependencies, idep_nir_headers and
idep_nir. The former encapsulates the generation of nir_opcodes.h and
nir_builder_opcodes.h and adding src/compiler/nir as an include path.
This ensures that any target that needs nir headers will have the
includes and that the generated headers will be generated before the
target is built. The second, idep_nir, includes the first and
additionally links to libnir.
This is intended to make it easier to avoid race conditions in the build
when using nir, since the number of consumers for libnir and its
headers is quite high.
Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
For things like:
loop
x = func()
list += x
end
just do:
loop
list += func()
end
Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Don't use intermediate variables, use consistent whitespace.
Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Currently the meson build has a mix of two styles:
arg : [foo, ...
bar],
and
arg : [
foo, ...,
bar,
]
For consistency let's pick one. I've picked the latter style, which I
think is more readable, and is more common in the mesa code base.
v2: - fix commit message
Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
We already had to switch all of the W types to UW to prevent issues
with vector immediates on gen10. We may as well use unsigned types
everywhere.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Gen 10 has a strange hardware bug involving V immediates with W types.
It appears that a mov(8) g2<1>W 0x76543210V will actually result in g2
getting the value {3, 2, 1, 0, 3, 2, 1, 0}. In particular, the bottom
four nibbles are repeated instead of the top four being taken. (A mov
of 0x00003210V yields the same result.) This bug does not appear in any
hardware documentation as far as we can tell and the simulator does not
implement the bug either.
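For reference, this is what a V immediate encodes (a self-contained
sketch, not compiler code): eight signed 4-bit values packed into 32
bits, lowest nibble first.

    #include <stdint.h>

    static void
    unpack_v_imm(uint32_t v, int out[8])
    {
       for (int i = 0; i < 8; i++) {
          int nib = (v >> (i * 4)) & 0xf;
          out[i] = nib >= 8 ? nib - 16 : nib;   /* sign-extend the nibble */
       }
    }

    /* unpack_v_imm(0x76543210, out) should give channels 0..7 the values
     * 0,1,2,3,4,5,6,7; the Gen10 bug instead repeats the low four nibbles
     * across all eight channels. */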
Commit 6132992cdb was mostly a no-op
except that it changed the type of the subgroup invocation from UW to W
and caused us to tickle this bug with basically every compute shader
that uses any sort of invocation ID (which is most of them). This is
also potentially an issue for geometry shader input pulls and SampleID
setup. The easy solution is just to change the few places where we use
a vector integer immediate with a W type to use a UW type.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Fixes: 6132992cdb
Some cases weren't handled, such as stride 4 which is needed for 64-bit
operations. Presumably fixes the assertion failure mentioned in commit
2d04572038 (Revert "i965/fs: Use align1 mode on ternary instructions
on Gen10+") but who can really say since the commit neglected to list
any of them!
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
After executing a secondary command buffer, we need to update certain
state on the primary command buffer to reflect changes by the secondary.
Otherwise subsequent commands may not have the correct state set.
This fixes various issues (rendering errors, GPU hangs) seen after
executing secondary command buffers in some cases.
v2 (Jason Ekstrand):
- Reset to invalid values instead of pulling from the secondary
- Change the comment to be more descriptive
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable@lists.freedesktop.org
The use of anv_extensions from anv_icd brought an unwanted dependency on
mako templates to the latter. We don't want that, since it would force
the dependency even for distributable tarballs, which was not needed
until now.
Jason suggested this approach.
v2: Patch simplification (Jason).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104551
Fixes: 0ab04ba979 ("anv: Use python to generate ICD json files")
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
"The maxDescriptorSet* limit is n times the corresponding
maxPerStageDescriptor* limit, where n is the number of shader stages
supported by the VkPhysicalDevice. If all shader stages are supported,
n = 6 (vertex, tessellation control, tessellation evaluation,
geometry, fragment, compute)."
Fixes:
dEQP-VK.api.info.device.properties
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2: do not try to handle it as a system value directly for the SPIR-V
path. In GL we rather handle it as a uniform like we do for the
GLSL path (Jason).
v3:
- Remove the uniform variable, it is always -1 now (Jason)
- Also do the lowering for the TessEval stage (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Apparently, Geminilake requires you to whack a chicken bit to select
either compute or tessellation mode for barriers. The recommendation
is to switch between them at PIPELINE_SELECT time.
We may not need to do this all the time, but I don't know that it hurts
either. PIPELINE_SELECT is already a pretty giant stall.
This appears to fix hangs in tessellation control shaders with barriers
on Geminilake. Note that this requires a corresponding kernel change,
drm/i915: Whitelist SLICE_COMMON_ECO_CHICKEN1 on Geminilake.
in order for the register write to actually happen. Without an updated
kernel, this register write will be noop'd and the fix will not work.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Memtrace aubs are similar to classic aubs, with the major
difference being how command submission is serialized (as register
writes instead of a high-level submit message). Some internal
tools generate or consume only memtrace aubs.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This was never enabled in secondary buffers because hiz_enabled was
never set to true for those.
If the app provides a framebuffer in the inheritance info when beginning
a secondary buffer, we can determine if HiZ is enabled and therefore
allow the PMA optimization to be enabled within the command buffer.
This improves performance by ~13% on an internal benchmark on Skylake.
v2: Use anv_cmd_buffer_get_depth_stencil_view().
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
If we have a color attachment, but its writes are masked, this would
have still returned true. This is inconsistent with how HasWriteableRT
in 3DSTATE_PS_BLEND is set, which does take the mask into account.
This could lead to PixelShaderHasUAV not being set in 3DSTATE_PS_EXTRA
if the fragment shader does use UAVs, meaning the fragment shader may
not be invoked because HasWriteableRT is false. Specifically, this was
seen to occur when the shader also enables early fragment tests: the
fragment shader was not invoked despite passing depth/stencil.
Fix by taking the color write mask into account in this function. This
is consistent with how things are done on i965.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes hangs seen due to the lock not being released here.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Older OpenGL defines two equations for converting from signed-normalized
to floating point data. These are:
f = (2c + 1)/(2^b - 1) (equation 2.2)
f = max{c/(2^(b-1) - 1), -1.0} (equation 2.3)
Both OpenGL 4.2+ and OpenGL ES 3.0+ mandate that equation 2.3 is to be
used in all scenarios, and remove equation 2.2. DirectX uses equation
2.3 as well. Intel hardware only supports equation 2.3, so Gen7.5+
systems that use the vertex fetcher hardware to do the conversions
always get formula 2.3.
This can make a big difference for 10-10-10-2 formats - the 2-bit value
can represent 0 with equation 2.3, and cannot with equation 2.2.
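The difference is easy to see with a small self-contained calculation
(plain C, not driver code):

    #include <stdio.h>

    /* Equation 2.2: f = (2c + 1) / (2^b - 1) */
    static float snorm_eq22(int c, int b) { return (2.0f * c + 1.0f) / ((1 << b) - 1); }

    /* Equation 2.3: f = max(c / (2^(b-1) - 1), -1.0) */
    static float snorm_eq23(int c, int b)
    {
       float f = (float)c / ((1 << (b - 1)) - 1);
       return f < -1.0f ? -1.0f : f;
    }

    int main(void)
    {
       /* The 2-bit component of a 10-10-10-2 SNORM format, c = 0: */
       printf("eq 2.2: %f\n", snorm_eq22(0, 2));  /* 0.333... - zero unrepresentable */
       printf("eq 2.3: %f\n", snorm_eq23(0, 2));  /* 0.000000 - zero is exact */
       return 0;
    }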
Ivybridge and older were using equation 2.2 for OpenGL, and 2.3 for ES.
Now that Ivybridge supports OpenGL 4.2, this is wrong - we need to use
the new rules, at least in core profile. That would leave Gen4-6 doing
something different than all other hardware, which seems...lame.
With context version promotion, applications that requested a pre-4.2
context may get promoted to 4.2, and thus get the new rules. Zero cases
have been reported of this being a problem. However, we've received a
report that following the old rules breaks expectations. SuperTuxKart
apparently renders the cars red when following equation 2.2, and works
correctly when following equation 2.3:
https://github.com/supertuxkart/stk-code/issues/2885#issuecomment-353858405
So, this patch deletes the legacy equation 2.2 support entirely, making
all hardware and APIs consistently use the new equation 2.3 rules.
If we ever find an application that truly requires the old formula, then
we'd likely want that application to work on modern hardware, too. We'd
likely restore this support as a driconf option. Until then, drop it.
This commit will regress Piglit's draw-vertices-2101010 test on
pre-Haswell without the corresponding Piglit patch to accept either
formula (commit 35daaa1695ea01eb85bc02f9be9b6ebd1a7113a1):
draw-vertices-2101010: Accept either SNORM conversion formula.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Previously, we were flagging the instruction state buffer for capture
but not surface state or dynamic state. We want those captured too.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Some older versions of the Vulkan driver didn't properly tag dynamic
state as needing to be captured. Also, this prevents crashes when
looking at dumps on older kernels.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We were walking the sections, printing the batches, and then freeing
them in one pass. If the batch happens to reference any earlier
sections (which it almost certainly will since it's at the end), we will
access freed memory.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This reverts commit 9cd60fce9c.
The above commit caused 2000+ piglit tests to fail assertions. Disable
the align1 mode on gen10 for now to avoid the failures.
Cc: Matt Turner <mattst88@gmail.com>
Cc: Rafael Antognolli <rafael.antognolli@intel.com>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
This should shut up some Valgrind errors during pre-regalloc
scheduling. The errors were harmless since they could only have led
to the estimation of the bank conflict penalty of an instruction
pre-regalloc, which is inaccurate at that point of the program
compilation, but no less accurate than the intended "return 0"
fall-back path. The scheduling pass is normally re-run after regalloc
with a well-defined grf_used value and accurate bank conflict
information.
Fixes: acf98ff933 "intel/fs: Teach instruction scheduler about GRF bank conflict cycles."
Reported-by: Eero Tamminen <eero.t.tamminen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The weight_vector_type constructor was inadvertently assuming C++17
semantics of the new operator applied on a type with alignment
requirement greater than the largest fundamental alignment.
Unfortunately on earlier C++ dialects the implementation was allowed
to raise an allocation failure when the alignment requirement of the
allocated type was unsupported, in an implementation-defined fashion.
It's expected that a C++ implementation recent enough to implement
P0035R4 would have honored allocation requests for such over-aligned
types even if the C++17 dialect wasn't active, which is likely the
reason why this problem wasn't caught by our CI system.
A more elegant fix would involve wrapping the __SSE2__ block in a
'__cpp_aligned_new >= 201606' preprocessor conditional and continue
taking advantage of the language feature, but that would yield lower
compile-time performance on old compilers not implementing it
(e.g. GCC versions older than 7.0).
Fixes: af2c320190 "intel/fs: Implement GRF bank conflict mitigation pass."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104226
Reported-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The Vulkan spec doesn't specify that VK_REMAINING_ARRAY_LAYERS is allowed
in the passed VkClearRect struct.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We still have gpu hangs on Cannonlake when using push constants, so
disable them for now until we have a proper fix for these hangs.
v2: Add warning message when creating context too.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
According to the RENDER_SURFACE_STATE internal documentation, the
R32G32B32_FLOAT restriction is marked "IVB" only. We choose to apply
it to Ivybridge and Baytrail, but not Haswell.
Apparently fixes KHR-GL46.texture_size_promotion.functional on Haswell.
Changes these tests from crashing to skipping on Haswell:
- KHR-GL46.direct_state_access.textures_storage_multisample_2d_rgb32f
- KHR-GL46.direct_state_access.textures_storage_multisample_3d_rgb32f
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Unfortunately, in aubinator and aubinator_error_decode we don't always
know how many of a given state we have, so we must guess. One day,
we'll come up with a way to annotate the batch to solve this problem.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The shared framework can now do everything that aubinator_error_decode
ever did and more. It's time to make the switch.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Previously, if a group was nested in another group such that it didn't
start on a dword boundary, we would decode it as if it started at the
start of its first dword. This changes things to work even more in
terms of bits so that we can properly decode these structs. This
affects MOCS, attribute swizzles, and several other things.
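Working "in terms of bits" boils down to field extraction like the
following sketch (not the decoder's actual code):

    #include <stdint.h>

    /* Extract a 'width'-bit field starting at an arbitrary bit offset, even
     * when the field does not start on (or straddles) a dword boundary. */
    static uint32_t
    extract_field(const uint32_t *dwords, unsigned start_bit, unsigned width)
    {
       const unsigned dw = start_bit / 32, bit = start_bit % 32;
       uint64_t window = dwords[dw];
       if (bit + width > 32)
          window |= (uint64_t)dwords[dw + 1] << 32;
       return (uint32_t)((window >> bit) & ((1ull << width) - 1));
    }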
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We can write to the same output but in different components, like
in this example:
layout(location = 0, component = 0) out ivec2 dEQP_FragColor_0;
layout(location = 0, component = 2) out ivec2 dEQP_FragColor_1;
Therefore, they are not two different outputs but only one.
Fixes:
dEQP-VK.glsl.440.linkage.varying.component.frag_out.*
v3:
- Remove FRAG_RESULT_MAX.
- Add const and use sizeof (Ian).
- Do three passes to properly set the locations of fragment
outputs when arrays are involved (Jason).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Push constants on Intel hardware are significantly more performant than
pull constants. Since most Vulkan applications don't actively use push
constants, or at least don't use them heavily, we're pulling way
more than we should be. By enabling pushing chunks of UBOs we can get
rid of a lot of those pulls.
On my SKL GT4e, this improves the performance of Dota 2 and Talos by
around 2.5% and improves Aztec Ruins by around 2%.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
In Vulkan, we don't support classic pull constants and everything the
client asks us to push, we push. However, for pushed UBOs, we still
want to fall back to conventional pulls if we run out of space.
Push constants work in terms of 32-byte chunks, so if we want to be able
to push UBOs, everything needs to be 32-byte aligned. Currently, we
only require 16-byte alignment, which is too small.
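As a rough sketch of the arithmetic involved (hypothetical helper, not
the driver's code), a pushable UBO range has to be widened to 32-byte
chunk boundaries:

    #include <stdint.h>

    #define PUSH_CHUNK_SIZE 32u

    /* Returns the number of 32-byte chunks covering [offset, offset+size)
     * and the chunk index at which the range starts. */
    static uint32_t
    push_chunk_range(uint32_t offset, uint32_t size, uint32_t *start_chunk)
    {
       const uint32_t start = offset & ~(PUSH_CHUNK_SIZE - 1);
       const uint32_t end =
          (offset + size + PUSH_CHUNK_SIZE - 1) & ~(PUSH_CHUNK_SIZE - 1);
       *start_chunk = start / PUSH_CHUNK_SIZE;
       return (end - start) / PUSH_CHUNK_SIZE;
    }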
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
In order to do this we have to modify push constant set up to handle
ranges. We also have to tweak the way we handle dirty bits a bit so
that we re-push whenever a descriptor set changes.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
There are several places where we look up opcodes in an array of stages.
Assert that we don't end up going out of bounds.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
We want to call brw_nir_analyze_ubo_ranges immediately after
anv_nir_apply_pipeline_layout and it badly wants constants. We could
run an optimization step and let constant folding do it but that's way
more expensive than needed. It's really easy to just handle constants
in apply_pipeline_layout.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This rewires the logic for assigning uniform locations to work in terms
of "complex alignments". The basic idea is that, as we walk the list of
instructions, we keep track of the alignment and continuity requirements
of each slot and assert that the alignments all match up. We then use
those alignments in the compaction stage to ensure that everything gets
placed at a properly aligned register. The old mechanism handled
alignments by special-casing each of the bit sizes and placing 64-bit
values first followed by 32-bit values.
The old scheme had the advantage of never leaving a hole since all the
64-bit values could be tightly packed and so could the 32-bit values.
However, the new scheme has no type size special cases so it handles not
only 32 and 64-bit types but should gracefully extend to 16 and 8-bit
types as the need arises.
Tested-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The testing for this extension is currently very poor. The CTS tests
only test accessing UBOs and SSBOs at dynamic offsets so none of our
constant-offset paths get triggered at all. Also, there's an assertion
in our handling of nir_intrinsic_load_uniform that offset % 4 == 0 which
never fires, indicating that nothing ever gets loaded from an offset
which is not a whole dword. Both the push constant and constant-offset
pull paths are complex enough that we really don't want to ship
without tests. We'll turn the extension back on once we have decent
tests.
This addresses a long-standing back-end compiler bug that could lead
to cross-channel data corruption in loops executed non-uniformly. In
some cases live variables extending through a loop divergence point
(e.g. a non-uniform break) into a convergence point (e.g. the end of
the loop) wouldn't be considered live along all physical control flow
paths the SIMD thread could possibly have taken in between due to some
channels remaining in the loop for additional iterations.
This patch fixes the problem by extending the CFG with physical edges
that don't exist in the idealized non-vectorized program, but
represent valid control flow paths the SIMD EU may take due to the
divergence of logical threads. This makes sense because the i965 IR
is explicitly SIMD, and it's not uncommon for instructions to have an
influence on neighboring channels (e.g. a force_writemask_all header
setup), so the behavior of the SIMD thread as a whole needs to be
considered.
No changes in shader-db.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This makes the dataflow propagation logic of the copy propagation pass
more intelligent in cases where the destination of a copy is known to
be undefined for some incoming CFG edges, building upon the
definedness information provided by the last patch. Helps a few
programs, and avoids a handful of shader-db regressions from the next
patch.
shader-db results on ILK:
total instructions in shared programs: 6541547 -> 6541523 (-0.00%)
instructions in affected programs: 360 -> 336 (-6.67%)
helped: 8
HURT: 0
LOST: 0
GAINED: 10
shader-db results on BDW:
total instructions in shared programs: 8174323 -> 8173882 (-0.01%)
instructions in affected programs: 7730 -> 7289 (-5.71%)
helped: 5
HURT: 2
LOST: 0
GAINED: 4
shader-db results on SKL:
total instructions in shared programs: 8185669 -> 8184598 (-0.01%)
instructions in affected programs: 10364 -> 9293 (-10.33%)
helped: 5
HURT: 2
LOST: 0
GAINED: 2
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Currently the liveness analysis pass would extend a live interval up
to the top of the program when no unconditional and complete
definition of the variable is found that dominates all of its uses.
This can lead to a serious performance problem in shaders containing
many partial writes, like scalar arithmetic, FP64 and soon FP16
operations. The number of oversize live intervals in such workloads
can cause the compilation time of the shader to explode because of the
worse than quadratic behavior of the register allocator and scheduler
when running out of registers, and it can also cause the running time
of the shader to explode due to the amount of spilling it leads to,
which is orders of magnitude slower than GRF memory.
This patch fixes it by computing the intersection of our current live
intervals with the subset of the program that can possibly be reached
from any definition of the variable. Extending the storage allocation
of the variable beyond that is pretty useless because its value is
guaranteed to be undefined at a point that cannot be reached from any
definition.
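In sketch form (generic bitsets, not the back-end's data structures),
the fix intersects the computed intervals with a "reachable from some
definition" set per block:

    #include <stdint.h>

    /* One bit per variable in each mask; defreach[b] has a bit set if the
     * corresponding variable's definition can reach block b. */
    static void
    clamp_live_intervals(uint64_t *livein, uint64_t *liveout,
                         const uint64_t *defreach, int num_blocks)
    {
       for (int b = 0; b < num_blocks; b++) {
          livein[b]  &= defreach[b];
          liveout[b] &= defreach[b];
       }
    }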
According to Jason, this improves performance of the subgroup Vulkan
CTS tests significantly (e.g. the runtime of the dvec4 broadcast test
improves by nearly 50x).
No significant change in the running time of shader-db (with 5%
statistical significance).
shader-db results on IVB:
total cycles in shared programs: 61108780 -> 60932856 (-0.29%)
cycles in affected programs: 16335482 -> 16159558 (-1.08%)
helped: 5121
HURT: 4347
total spills in shared programs: 1309 -> 1288 (-1.60%)
spills in affected programs: 249 -> 228 (-8.43%)
helped: 3
HURT: 0
total fills in shared programs: 1652 -> 1597 (-3.33%)
fills in affected programs: 262 -> 207 (-20.99%)
helped: 4
HURT: 0
LOST: 2
GAINED: 209
shader-db results on BDW:
total cycles in shared programs: 67617262 -> 67361220 (-0.38%)
cycles in affected programs: 23397142 -> 23141100 (-1.09%)
helped: 8045
HURT: 6488
total spills in shared programs: 1456 -> 1252 (-14.01%)
spills in affected programs: 465 -> 261 (-43.87%)
helped: 3
HURT: 0
total fills in shared programs: 1720 -> 1465 (-14.83%)
fills in affected programs: 471 -> 216 (-54.14%)
helped: 4
HURT: 0
LOST: 2
GAINED: 162
shader-db results on SKL:
total cycles in shared programs: 65436248 -> 65245186 (-0.29%)
cycles in affected programs: 22560936 -> 22369874 (-0.85%)
helped: 8457
HURT: 6247
total spills in shared programs: 437 -> 437 (0.00%)
spills in affected programs: 0 -> 0
helped: 0
HURT: 0
total fills in shared programs: 870 -> 854 (-1.84%)
fills in affected programs: 16 -> 0
helped: 1
HURT: 0
LOST: 0
GAINED: 107
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This should allow the post-RA scheduler to do a slightly better job at
hiding latency in presence of instructions incurring bank conflicts.
The main purpose of this patch is not to improve performance though,
but to get conflict cycles to show up in shader-db statistics in order
to make sure that regressions in the bank conflict mitigation pass
don't go unnoticed.
Acked-by: Matt Turner <mattst88@gmail.com>
Unnecessary GRF bank conflicts increase the issue time of ternary
instructions (the overwhelmingly most common of which is MAD) by
roughly 50%, leading to reduced ALU throughput. This pass attempts to
minimize the number of bank conflicts by rearranging the layout of the
GRF space post-register allocation. It's in general not possible to
eliminate all of them without introducing extra copies, which are
typically more expensive than the bank conflict itself.
In a shader-db run on SKL this helps roughly 46k shaders:
total conflicts in shared programs: 1008981 -> 600461 (-40.49%)
conflicts in affected programs: 816222 -> 407702 (-50.05%)
helped: 46234
HURT: 72
The running time of shader-db itself on SKL seems to be increased by
roughly 2.52%±1.13% with n=20 due to the additional work done by the
compiler back-end.
On earlier generations the pass is somewhat less effective in relative
terms because the hardware incurs a bank conflict anytime the last two
sources of the instruction are duplicate (e.g. while trying to square
a value using MAD), which is impossible to avoid without introducing
copies. E.g. for a shader-db run on SNB:
total conflicts in shared programs: 944636 -> 623185 (-34.03%)
conflicts in affected programs: 853258 -> 531807 (-37.67%)
helped: 31052
HURT: 19
And on BDW:
total conflicts in shared programs: 1418393 -> 987539 (-30.38%)
conflicts in affected programs: 1179787 -> 748933 (-36.52%)
helped: 47592
HURT: 70
On SKL GT4e this improves performance of GpuTest Volplosion by 3.64%
±0.33% with n=16.
NOTE: This patch intentionally disregards some i965 coding conventions
for the sake of reviewability. This is addressed by the next
squash patch which introduces an amount of (for the most part
boring) boilerplate that might distract reviewers from the
non-trivial algorithmic details of the pass.
The following patch is squashed in:
SQUASH: intel/fs/bank_conflicts: Roll back to the nineties.
Acked-by: Matt Turner <mattst88@gmail.com>
The handle type in the case statement is supposed to be
VK_EXTERNAL_MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT.
Fixes: ab18e8e59b ("anv: Implement VK_EXT_external_memory_dma_buf")
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
SSBO loads were using byte_scattered read messages as they allow
reading 16-bit size components. byte_scattered messages can only
operate one component at a time so we needed to emit as many messages
as components.
But 16-bit vec2 and vec4 values, being multiples of 32 bits, can use the
untyped_surface_read message to read pairs of 16-bit components using only
one message. Once each pair is read, it is unshuffled to return the proper
16-bit components. The vec3 case is treated like vec4, but the 4th
component is ignored.
16-bit scalars are read using one byte_scattered_read message.
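The unshuffle step amounts to splitting each dword into its two halves,
roughly (sketch only, not the fs backend code):

    #include <stdint.h>

    /* Each 32-bit component read by untyped_surface_read carries two
     * packed 16-bit components. */
    static void
    unshuffle_16bit_pair(uint32_t dword, uint16_t out[2])
    {
       out[0] = (uint16_t)(dword & 0xffff);   /* lower 16-bit component */
       out[1] = (uint16_t)(dword >> 16);      /* upper 16-bit component */
    }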
v2: Removed use of stride = 2 on sources (Jason Ekstrand)
Rework optimization using unshuffle 16 reads (Chema Casanova)
v3: Use W and D types instead of HF and F in shuffle to avoid rounding
errors (Jason Ekstrand)
Use untyped_surface_read for 16-bit vec3. (Jason Ekstrand)
v4: Use subscript instead of changing type and stride (Jason Ekstrand)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Currently, we use byte-scattered write messages for storing 16-bit
values into an SSBO. This is because untyped surface messages have a
fixed 32-bit size.
This patch optimizes these 16-bit writes by combining 2 values (e.g.,
two consecutive components aligned to 32 bits) into a 32-bit register,
packing the two 16-bit words.
16-bit single-component values will continue to use byte-scattered
write messages. The same happens when the first consecutive
component is not 32-bit aligned.
This optimization reduces the number of SEND messages used for storing
16-bit values potentially by 2 or 4, which cuts down execution time
significantly because byte-scattered writes are an expensive
operation as they only write one component per message.
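The packing described above is, in sketch form (not the actual shuffle
helper):

    #include <stdint.h>

    /* Two consecutive, 32-bit-aligned 16-bit components are combined into
     * one dword so a single untyped (32-bit) write can store both. */
    static uint32_t
    pack_16bit_pair(uint16_t first, uint16_t second)
    {
       return (uint32_t)first | ((uint32_t)second << 16);
    }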
v2: Removed use of stride = 2 on sources (Jason Ekstrand)
Rework optimization using shuffle 16 write and enable writes
of 16bit vec4 with only one message of 32-bits. (Chema Casanova)
v3: - Fix coding style (Eduardo Lima)
- Reorganize code to avoid duplication. (Jason Ekstrand)
- Include new comments to explain the length calculations to
fix alignment issues of components. (Jason Ekstrand)
- Fix issues with writemask yz with 16-bit writes. (Jason Ekstrand)
v4: (Jason Ekstrand)
- Reorganize 64-bit ssbo-writes to avoid using slots_per_component.
- Comment about why shuffle is needed when using byte_scattered_write.
Signed-off-by: Eduardo Lima <elima@igalia.com>
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Enables SPV_KHR_16bit_storage on gen 8+.
VK_KHR_16bit_storage is enabled for SSBO/UBO using the
VK_KHR_get_physical_device_properties2 functionality to expose
if the extension is supported or not.
v2: update due to rebase against master (Alejandro)
v3: (Jason Ekstrand)
- Move this patch up in VK_KHR_16bit_storage series enabling only
storageBuffer16BitAccess and uniformAndStorageBuffer16BitAccess.
- Only expose VK_KHR_16bit_storage on Gen8+
v4: (Jason Ekstrand)
- Squash enable SPV_KHR_16bit_storage into VK_KHR_16bit_storage
enablement for SSBO/UBO.
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
load_ubo is using 32-bit loads as uniform surfaces have a 32-bit
surface format defined. So when reading 16-bit components with the
sampler we need to unshuffle two 16-bit components from each 32-bit
component.
Using the sampler avoids the use of the byte_scattered_read message
that needs one message for each component and is supposed to be
slower.
v2: (Jason Ekstrand)
- Simplify component selection and unshuffling for different bitsizes
- Remove SKL optimization of reading only two 32-bit components when
reading 16-bits types.
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
These helpers are used to load/store 16-bit types from/to 32-bit
components.
The functions shuffle_32bit_load_result_to_16bit_data and
shuffle_16bit_data_for_32bit_write are implemented in a similar
way to the analogous functions for handling 64-bit types.
v1: Explain need of temporary in shuffle operations. (Jason Ekstrand)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Used to enable 16-bit reads at do_untyped_vector_read, which is used by
the following intrinsics:
* nir_intrinsic_load_shared
* nir_intrinsic_load_ssbo
v2: Removed use of stride = 2 on 16-bit sources (Jason Ekstrand)
v3: - Add bitsize to scattered read operation (Jason Ekstrand)
- Remove implementation of 16-bit UBO read from this patch.
- Avoid assertion at opt_algebraic caused by ADD of two IMM with
offset with BRW_REGISTER_TYPE_UD type found on matrix tests.
(Jose Maria Casanova)
v4: (Jason Ekstrand)
- Put if case for 16-bits at the beginning of the if ladder.
- Use type_sz(dest.type) * 8 as bit_size parameter for scattered read.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2: Fix alignment style (Topi Pohjolainen)
(Jason Ekstrand)
- Enable bit_size parameter to scattered messages to enable different
bitsizes byte/word/dword.
- Remove use of brw_send_indirect_scattered_message in favor of
brw_send_indirect_surface_message.
- Move scattered messages to surface messages namespace.
- Assert align1 for scattered messages and assume Gen8+.
- Inline brw_set_dp_byte_scattered_read.
v3: (Jason Ekstrand)
- Use renamed brw_byte_scattered_data_element_from_bit_size method
- Assert scattered read for Gen8+ and Haswell.
- Use conditional expression at components_read.
- Include comment about params for scattered opcodes.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
While for untyped surface messages the bits of the execution mask are
ANDed with the corresponding bits of the Pixel/Sample Mask, that is
not the case for byte scattered writes. That masking is needed to avoid
ssbo stores writing on helper invocations. So, in the cases where that
can matter, we load the sample mask and predicate the send message.
Note: the need for this patch was tested with a custom test. Right now
the 16-bit storage CTS tests don't need this path in order to get a
full pass.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We need to rely on byte scattered writes as untyped writes are 32-bit
sized. We could try to keep using 32-bit messages when we have two or
four 16-bit elements, but for simplicity's sake, we use the same message
for any component number. We revisit this approach in the following
patches.
v2: Removed use of stride = 2 on 16-bit sources (Jason Ekstrand)
v3: (Jason Ekstrand)
- Include bit_size to scattered write message and remove namespace
- specific for scattered messages.
- Move comment to proper place.
- Squashed with i965/fs: Adjust type_size/type_slots on store_ssbo.
(Jose Maria Casanova)
- Take into account that get_nir_src returns now WORD types for
16-bit sources instead of DWORD.
v4: (Jason Ekstrand)
- Rename lenght variable to num_components.
- Include assertions before emit_untyped_write.
- Remove type_slot in favor of num_slot and first_slot.
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2: (Jason Ekstrand)
- Enable bit_size parameter to scattered messages to enable different
bitsizes byte/word/dword.
- Remove use of brw_send_indirect_scattered_message in favor of
brw_send_indirect_surface_message.
- Move scattered messages to surface messages namespace.
- Assert align1 for scattered messages and assume Gen8+.
- Inline brw_set_dp_byte_scattered_write.
v3: - Remove leftover newline (Topi Pohjolainen)
- Rename brw_data_size to brw_scattered_data_element and use
defines instead of an enum (Jason Ekstrand)
- Assert scattered write for Gen8+ and Haswell (Jason Ekstrand)
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Although from the SPIR-V point of view rounding modes are attached to
the operation/destination, on i965 they are a piece of state, so we
don't need to explicitly set the rounding mode if the one we want is
already set.
Taking into account that the default mode is RTE, one possible
optimization would be to optimize out the first RTE set for each
block. For that to work, we would need to take into account block
interrelationships. At this point, it is not worth complicating the
optimization for such a small gain.
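A minimal sketch of what the resulting optimization does (hypothetical
IR types and helper names; only SHADER_OPCODE_RND_MODE comes from this
series):

    #include <stdbool.h>

    enum rnd_mode { RND_MODE_UNKNOWN, RND_MODE_RTE, RND_MODE_RTZ };

    struct inst {
       bool sets_rnd_mode;        /* stands in for SHADER_OPCODE_RND_MODE */
       enum rnd_mode rnd_mode;
       bool removed;
       struct inst *next;
    };

    struct block {
       struct inst *first;
    };

    /* Sketch: drop rounding-mode sets that are already in effect.  The
     * tracked mode is reset at every block boundary (v3) because we don't
     * analyze block interrelationships. */
    static void
    remove_redundant_rnd_mode_sets(struct block *blocks, unsigned num_blocks)
    {
       for (unsigned b = 0; b < num_blocks; b++) {
          enum rnd_mode current = RND_MODE_UNKNOWN;
          for (struct inst *inst = blocks[b].first; inst; inst = inst->next) {
             if (!inst->sets_rnd_mode)
                continue;
             if (inst->rnd_mode == current)
                inst->removed = true;    /* redundant: mode already set */
             else
                current = inst->rnd_mode;
          }
       }
    }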
v2: Use a single SHADER_OPCODE_RND_MODE opcode taking an immediate
with the rounding mode (Curro)
v3: Reset optimization for every block. (Jason Ekstrand)
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
By default we don't set the rounding mode. We only set
round-to-near-even or round-to-zero mode if explicitly set from nir.
v2: Use a single SHADER_OPCODE_RND_MODE opcode taking an immediate
with the rounding mode (Curro)
v3: Use new helper brw_rnd_mode_from_nir_op (Jason Ekstrand)
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Although it is possible to emit them directly as AND/OR on brw_fs_nir,
having a specific opcode makes it easier to remove duplicate settings
later.
v2: (Curro)
- Set thread control to 'switch' when using the control register
- Use a single SHADER_OPCODE_RND_MODE opcode taking an immediate
with the rounding mode.
- Avoid magic numbers setting rounding mode field at control register.
v3: (Curro)
- Remove redundant and add missing whitespace lines.
- Match printing instruction to IR opcode "rnd_mode"
v4: (Topi Pohjolainen)
- Fix code style.
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Control register cr0 in i965 can be used to change the rounding modes
in 32-bit to 16-bit floating-point conversions.
From the Intel Skylake PRM, vol 07, section "Register and Register Regions",
subsection "Control Register" (page 754):
"Subregister cr0.0:ud contains normal operation control fields such as the
floating-point mode ... "
Floating-point Rounding mode is changed at bits 5:4 of cr0.0:
"Rounding Mode. This field specifies the FPU rounding mode. It is
initialized by Thread Dispatch."
00b = Round to Nearest or Even (RTNE)
01b = Round Up, toward +inf (RU)
10b = Round Down, toward -inf (RD)
11b = Round Toward Zero (RTZ)"
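Put as code (a sketch derived only from the bit layout quoted above,
not copied from the driver):

    #include <stdint.h>

    /* Rounding mode lives in bits 5:4 of cr0.0. */
    #define CR0_RND_MODE_SHIFT 4
    #define CR0_RND_MODE_MASK  (0x3u << CR0_RND_MODE_SHIFT)

    enum cr0_rnd_mode {
       CR0_RND_RTNE = 0,  /* 00b: Round to Nearest or Even */
       CR0_RND_RU   = 1,  /* 01b: Round Up, toward +inf    */
       CR0_RND_RD   = 2,  /* 10b: Round Down, toward -inf  */
       CR0_RND_RTZ  = 3,  /* 11b: Round Toward Zero        */
    };

    static inline uint32_t
    cr0_with_rnd_mode(uint32_t cr0_0, enum cr0_rnd_mode mode)
    {
       return (cr0_0 & ~CR0_RND_MODE_MASK) |
              ((uint32_t)mode << CR0_RND_MODE_SHIFT);
    }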
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Conversions to 16-bit need alignment between the 16-bit and 32-bit
types. So the conversion operations unpack 16-bit types with a stride
of 2 and then apply a MOV with the conversion.
v2 (Jason Ekstrand):
- Avoid the general use of stride=2 for 16-bit register types.
v3 (Topi Pohjolainen)
- Code style fix
(Jason Ekstrand)
- Now nir_op_f2f16 was renamed to nir_op_f2f16_undef
because conversion to f16 with undefined rounding is explicit
Signed-off-by: Eduardo Lima <elima@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Note that we don't remove the assert at i965/vec4. At this point half
float support is only for the scalar backend.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
These types have similar vec4 sizes to their 32-bit counterparts.
The vec4 backend doesn't support 16-bit types and probably never will,
but this method is called by the scalar backend at
fs_visitor::nir_setup_outputs(), so we still need to provide valid vec4
sizes for 16-bit types. In the future, something different should be
implemented to avoid this dependency.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Not to be confused with variablePointersStorageBuffer which is the
subset of VK_KHR_variable_pointers required to enable the extension.
This means we now have "full" support for variable pointers.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
The reference value in gen_device_info isn't going to be accurate on
Gen10+. We should query it from the kernel, which reads a couple of
registers to compute the actual value.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Now that we have anv_device_init/finish functions, there's no reason to
have the individual driver do any more work than that.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
This drops the unneeded callbacks struct as well as the queue_get_family
callback we were using before we'd pulled QueuePresent inside.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
This lets us move wsi_interface to wsi_common_private.h
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Unfortunately, due to the fact that AcquireNextImage does not take a
queue, the ANV trick for triggering the fence won't work in general. We
leave dealing with the fence up to the caller for now.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
v2 (Jason Ekstrand):
- Rebase
- Alter the names of the helpers to better match the vulkan entrypoints
- Use the helpers in anv
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
This moves bits out of all four corners (anv, radv, x11, wayland) and
into the wsi common code. We also switch to using an outarray to ensure
we get our return code right.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Now that we're using the same common code as radv, we get prime support
for free. Just enable it.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
This uses the mock extension created in a previous commit to tell the
driver that the image it's just been asked to create is, in fact, a
window system image with whatever assumptions that implies. There was a
lot of redundant code between the two drivers to do basically exactly
the same thing.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
This lets us set the BO tiling when we allocate the memory. This is
required for GL to work properly.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
At the moment, this is always initialized to DRM_FORMAT_MOD_INVALID.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
This is a modified version of the patch originally sent by Chad Versace.
The primary difference is that this version claims that OPAQUE_FD and
DMA_BUF are compatible handle types.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
This gives us the opportunity to collect some function pointers if
we'd like, which will be very useful in the future.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
This is used to hold information about the allocated image, rather than
an ever-growing function argument list.
v2 (Jason Ekstrand):
- Rename wsi_image_base to wsi_image
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
This just seems cleaner, and we may expand this in future.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This is a bit more general and lets us pass additional options into the
spirv_to_nir pass beyond what capabilities we support.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Fix incomplete check of input params in blorp_surf_convert_to_uncompressed()
which can lead to NULL pointer dereferencing.
Fixes: 5ae8043fed ("intel/blorp: Add an entrypoint for doing
bit-for-bit copies")
Fixes: f395d0abc8 ("intel/blorp: Internally expose
surf_convert_to_uncompressed")
Reviewed-by: Emil Velikov <emli.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
64-bit pull loads are implemented by emitting 2 separate
32-bit pull load messages, where the second message loads from
an offset at +16B.
That addition of 16B to the original offset should not alter the
original offset register used as source for the pull load instruction
though, since the compiler might use that same offset register in other
instructions (for example, for other pull loads in the shader code
that take that same offset as reference).
If the pull load is 32-bit then we only need to emit one message and
we don't need to do offset calculations, but in that case the optimizer
should be able to drop the redundant MOV.
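In scalar terms (a sketch of the intent only; the real code operates on
registers and messages), the point is that the +16B lives in a
temporary rather than being added into the offset in place:

    #include <stdint.h>
    #include <string.h>

    /* Sketch: a 64-bit pull load done as two 32-bit messages, the second
     * at +16B.  'offset' is left untouched so other pull loads that use
     * the same offset keep reading from the right place. */
    static void
    pull_load_64bit_sketch(uint8_t *dst, const uint8_t *constant_buffer,
                           uint32_t offset)
    {
       memcpy(dst, constant_buffer + offset, 16);       /* first message  */
       uint32_t tmp = offset + 16;                      /* temp, not offset += 16 */
       memcpy(dst + 16, constant_buffer + tmp, 16);     /* second message */
    }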
Fixes the following test on Haswell:
KHR-GL45.gpu_shader_fp64.fp64.max_uniform_components
Reviewed-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103007
These were moved to src/intel/common/gen_debug.h, but they are not
common code. They assume that brw_context or gl_context variables
exist, named brw or ctx. That isn't remotely true outside of i965.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This provides a good way to verify we haven't broken using the perf
driver on older kernels (which don't have the oa config loading
mechanism).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The only reason why we needed that version was because the Vulkan driver
needed to be able to create the surface states so it could handle
indirect clear colors. Now that blorp handles them natively, there's no
need for the extra entrypoint.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
While we're at it, we break it into two nicely named functions.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This doesn't go all the way to avoiding the txf_ms if it's fast-cleared;
however, it does at least make us only do it once. This should improve
performance of MSAA resolves in the presence of lots of clear color.
Without the patch, enabling fast-clears in the multisampling Sascha demo
drops the framerate by about 10%. With this patch, enabling fast-clears
increases the demo's framerate by 25%.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
That name is already taken by one of the helpers in blorp_nir_builder.h
and, while we haven't moved the guts of blorp_blit.c there yet, we'd
like to start using some things from that header.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
When we split an instruction that reads a uniform value
(vstride 0) we need to respect the vstride on the second
half of the instruction (that is, the second half should
read the same region as the first).
We were doing this already, but we didn't account for
stages that have interleaved input attributes which also
have a vstride of 0 and need the same treatment.
Fixes the following on Haswell:
KHR-GL45.enhanced_layouts.varying_locations
KHR-GL45.enhanced_layouts.varying_array_locations
KHR-GL45.enhanced_layouts.varying_structure_locations
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Andres Gomez <agomez@igalia.com>
This removes a few hundred warnings on debug builds with asserts off.
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When the kernel supports flagging our BOs, let's mark batch &
instruction BOs for capture so they can be included in the error
state.
v2: Only add EXEC_CAPTURE if supported (Kristian)
v3: Fix operator precedence issue (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will allow setting the flags on any anv_bo created/filled from a
state pool or block pool later.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The gen had to be changed from 4 to 6 so that we could test MAD, which
is new on Gen6.
mad_imm_float_neg_mov_sat tests the case fixed by the previous commit.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
MADs don't take immediate sources, but we allow them in the IR since it
simplifies a lot of things. I neglected to consider that case.
Fixes: 4009a9ead4 ("i965/fs: Allow saturate propagation to propagate
negations into MADs.")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103616
Reported-and-Tested-by: Ruslan Kabatsayev <b7.10110111@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The brw_disasm_info header is included by certain tools in order to get
shader assembly from binaries so it's a semi-external header. Including
brw_cfg.h also pulls in brw_shader.h so you end up getting quite a bit
of our back-end compiler internals. Instead, make the couple of forward
declarations we need and make the header more stand-alone. This fixes
the meson build.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 4f82b17287
It was the only file named intel_* in the compiler.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The old code used an array to store each "instruction group" (the new,
better name than the old overloaded "annotation"), and required a
memmove() to shift elements over in the array when we needed to split a
group so that we could add an error message. This was confusing and
difficult to get right, not the least of which was because the array
has a tail sentinel not included in .ann_count.
Instead use a linked list, a data structure made for efficient
insertion.
Acked-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I'm going to change the call in a later patch and with the difference in
indentation level it wasn't immediately obvious that the calls were
identical.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Otherwise, if the image is not bound to the start of the buffer, we're
going to be reading and writing its fast clear state in the wrong spot.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Original 965 sets bits 28:27 to 0, while G45 and later set them to 1.
Note that the G45 docs are incorrect in this regard - see the DevCTG+
note in the Ironlake PRMs.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We use the same hardware mechanism for both atomic counters and SSBO
atomics, so there's really no benefit to maintaining separate code to
handle each case. Instead, we can just use Rob's shiny new NIR pass to
convert atomic_uints to SSBOs, and delete piles of code.
The ssbo_start section of the binding table becomes a combined ABO and
SSBO section, with ABOs first, then SSBOs.
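For illustration (a sketch based only on the layout described above,
not the actual surface setup code), binding-table indices for the
combined section look like:

    /* Sketch: ABOs occupy the first slots of the combined section and
     * SSBOs follow them. */
    static unsigned
    abo_binding_table_index(unsigned ssbo_start, unsigned abo_index)
    {
       return ssbo_start + abo_index;
    }

    static unsigned
    ssbo_binding_table_index(unsigned ssbo_start, unsigned num_abos,
                             unsigned ssbo_index)
    {
       return ssbo_start + num_abos + ssbo_index;
    }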
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We already have this workaround in the OpenGL driver.
See Mesa commit 3cf4fe2219.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: Nanley Chery <nanley.g.chery@intel.com>
Cc: Rafael Antognolli <rafael.antognolli@intel.com>
The MOV instruction can extract bytes to words/double words, and
words/double words to quadwords, but not bytes to quadwords.
For unsigned bytes to quadwords, we can read them as words, AND off the
high byte, and extract to quadword in one instruction. For signed bytes,
we need to first sign extend to word and then sign extend that word to a
quadword.
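The scalar equivalent of the two strategies (just for illustration;
the backend does this with MOVs and register regions):

    #include <stdint.h>

    /* Unsigned: read the byte as part of a word, AND off the high byte,
     * then extend that word to a quadword. */
    static uint64_t
    zero_extend_byte_to_quad(uint16_t word_containing_byte)
    {
       return (uint64_t)(word_containing_byte & 0x00ff);
    }

    /* Signed: sign extend byte -> word, then word -> quadword. */
    static int64_t
    sign_extend_byte_to_quad(int8_t byte)
    {
       int16_t word = byte;       /* byte -> word      */
       return (int64_t)word;      /* word -> quadword  */
    }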
Fixes the following test on CHV, BXT, and GLK:
KHR-GL46.shader_ballot_tests.ShaderBallotBitmasks
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103628
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes the following tests on CHV, BXT, and GLK:
KHR-GL46.shader_ballot_tests.ShaderBallotFunctionBallot
dEQP-VK.spirv_assembly.instruction.compute.uconvert.uint32_to_int64
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103115
This makes our MOCS settings significantly more flexible.
Cc: "17.3" <mesa-stable@lists.freedesktop.org>
Tested-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is a bit more annoying than your average shader - we need to look
at MEDIA_INTERFACE_DESCRIPTOR_LOAD in the batch buffer, then hop over
to the dynamic state buffer to read the INTERFACE_DESCRIPTOR_DATA, then
hop over to the instruction buffer to decode the program.
Now that we store all the buffers before decoding, we can actually do
this fairly easily.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
while loops skip the first field of the instruction/structure, which
is not what the code intended. It works out because the field we're
looking for doesn't happen to be first, but we ought to do it right
regardless.
Found while writing the next patch, where Kernel Start Pointer is
the first field of INTERFACE_DESCRIPTOR_DATA.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This makes aubinator_error_decode's shader dumping work like aubinator.
Instead of printing them after the fact, it prints them right inside the
3DSTATE_VS/HS/DS/GS/PS packet that references them. This saves you the
effort of cross-referencing things and jumping back and forth.
It also reduces a bunch of book-keeping, and eliminates the limitation
that we could only handle 4096 programs. That code was also broken and
failed to print any shaders if there were under 4096 programs.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This lets us complete parsing and storing of each buffer's data before
we begin decoding the batchbuffer. This makes it possible to inspect
the state buffer and program buffer, so we can properly decode any
indirect state or shader programs.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Based on a similar patch to intel_error_decode by Chris Wilson.
While we're de-duplicating the gtt_offset calculation, we can simplify
it to assume two hex digits are there - the kernel has done this since
v4.6, and we already require error states from v4.10.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Also change count from a pointer into a value. We were supposed to
be resetting it to 0 (and failed to), but that's gone since we dropped
the pre-ascii85 handling.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Error state files used to look like:
render ring --- gtt_offset = 0x0e8f6000
00000000 : 69040000
00000004 : 79090000
...
00007ffc : 00000000
--- ringbuffer = 0x00001000
There were thousands of lines between sections. The file format changed
with Kernel 4.10, and now has a single ascii85-encoded line following
each section heading. This is much easier to parse.
There are a bunch of bugs in our handling of the old style format,
where we'd decode the wrong data, at the wrong time. Fixing all of
these is going to be a giant pain. It's also a lot of extra code
complexity. In order to properly decode indirect state, or compute
shaders, we'll also need to parse data in advance of decoding, which
is going to be a giant pain with this ad-hoc "decode everywhere!"
mentality. So, let's just drop support for the older file format.
This unfortunately requires an error state generated by Kernel 4.10 or
later. That's probably not the end of the world, as we encourage users
to upgrade to the latest kernel when encountering GPU hangs anyway. It
might be a giant pain for people with LTS kernels, though...
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
The dashes "---" may occur within an ascii85 block, but only an ascii85
block starts with ':' or '~'.
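A sketch of the resulting check (hypothetical parsing code, just to
show the condition):

    #include <stdbool.h>
    #include <string.h>

    /* "---" only marks a section header when the line is not ascii85
     * data, and ascii85 blocks start with ':' or '~'. */
    static bool
    line_is_section_header(const char *line)
    {
       if (line[0] == ':' || line[0] == '~')
          return false;                   /* inside an ascii85 block */
       return strstr(line, "---") != NULL;
    }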
Ported from Chris Wilson's intel-gpu-tools commit:
bceec7e1d8a160226b783c6344eae8cbf4ece144
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
It's a neat idea, and still useful in some cases, but the intel common
code is used by i965 and anvil only, so this is a little clearer.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The previous iteration algorithm would advance the field pointer right
after we advance the group. This meant that you would end up with
skipping the first field of the group. In the common case, where the
only field is a struct (e.g. 3DSTATE_VERTEX_BUFFERS), it would get
skipped entirely.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
They serve no purpose other than to just fill empty space in the packet
so each dword has something. Just disallowing empty groups is a bit
easier on some of the tools. This does not change the generated packing
headers in any way.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We renamed "Function Enable" to "Enable", which broke our detection
of whether shaders are enabled or not. So, we'd see a bunch of HS/DS
packets with program offsets of 0, and think that was a valid TCS/TES.
Fixes: c032cae9ff (genxml: Rename "Function Enable" to "Enable".)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
These flags are set for C sources, but not C++. This causes symbol
visibility leaks from the C++ parts of the Intel compiler.
Fixes: 700bebb958 ("i965: Move the back-end compiler to src/intel/compiler")
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
I tested this in a setup where the builddir was outside of the srcdir.
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Its helper function, anv_surface_get_subresource_layout(), was not very
helpful. So fold it into the main function.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Instead of choosing the tiling flags inside make_surface(), which is
called once per aspect in a loop, and which chooses the same tiling for
each aspect, choose the tiling flags exactly once before entering the
aspect loop.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The same local variable, 'plane_format', was returned on success *and*
failure. Be more explicit in distinguishing the two cases: return
'plane_format' on success and return 'unsupported' on failure.
This simplifies the diff in upcoming patches for
VK_EXT_image_drm_format_modifier.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fold its body into its sole caller,
anv_GetPhysicalDeviceFormatProperties().
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>