KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Jordan Justen	24b415270f	intel/vulkan: Hard code CS scratch_ids_per_subslice for Cherryview Ken suggested that we might be underallocating scratch space on HD 400. Allocating scratch space as though there was actually 8 EUs seems to help with a GPU hang seen on synmark CSDof. Cc: <mesa-stable@lists.freedesktop.org> Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-03-09 16:15:58 -08:00
Jordan Justen	06e3bd02c0	i965: Hard code CS scratch_ids_per_subslice for Cherryview Ken suggested that we might be underallocating scratch space on HD 400. Allocating scratch space as though there was actually 8 EUs seems to help with a GPU hang seen on synmark CSDof. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104636 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105290 Cc: <mesa-stable@lists.freedesktop.org> Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Tested-by: Eero Tamminen <eero.t.tamminen@intel.com>	2018-03-09 16:15:34 -08:00
Marek Olšák	db495b8962	st/dri: fix OpenGL-OpenCL interop for GL_TEXTURE_BUFFER Tested by our OpenCL team. Fixes: `9c499e6759` "st/mesa: don't invoke st_finalize_texture & st_convert_sampler for TBOs" Acked-by: Alex Deucher <alexander.deucher@amd.com>	2018-03-09 16:33:31 -05:00
Marek Olšák	2bdb54bce7	radeonsi: add a workaround for GFX9 hang with init_config alignment Fixes: `75c5d25f0f` "radeonsi: align command buffer starting address to fix some Raven hangs" Cc: 17.3 18.0 <mesa-stable@lists.freedesktop.org>	2018-03-09 16:28:29 -05:00
Marek Olšák	e99212e970	ac/gpu_info: print ib_start_alignment, add assertion	2018-03-09 16:28:29 -05:00
Greg V	e30a165be2	meson: Use system_has_kms_drm in default driver selection Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-03-09 10:02:44 -08:00
Eric Anholt	c57d5ea3bb	broadcom/vc4: Add an accelerated path to turn raster R8/RG88 into tiled. Drawing a 1080p YV12 video stream generated by MMAL goes from 10.5 FPS to 36.	2018-03-09 09:59:54 -08:00
Eric Anholt	cf170616da	gallium: Add a util_blitter path for using a custom VS and FS. Like the r600 paths to use other custom states, we pass in a couple of parameters to customize the innards of the blitter. It's up to the caller to wrap other state necessary for its shaders (for example, constant buffers for the uniforms the shader uses). Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-03-09 09:59:54 -08:00
Eric Anholt	46a32e3d2e	broadcom/vc4: Allow binding non-zero constant buffers. We're going to use UBO loads for implementing YUV linear-to-T-format blits.	2018-03-09 09:59:54 -08:00
Eric Anholt	2725ab2b12	broadcom: Remove our defines of DRM_FORMAT_MOD_INVALID. The imported drm_fourcc.h handles it now.	2018-03-09 09:59:54 -08:00
Eric Anholt	a3a4c23dec	broadcom: Suppress compiler warnings about enum pipe_tex_filter.	2018-03-09 09:59:54 -08:00
Louis-Francis Ratté-Boulianne	3160cb86aa	egl/x11: Re-allocate buffers if format is suboptimal If PresentCompleteNotify event says the pixmap was presented with mode PresentCompleteModeSuboptimalCopy, it means the pixmap could possibly have been flipped instead if allocated with a different format/modifier. Signed-off-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com> Reviewed-by: Daniel Stone <daniels@collabora.com>	2018-03-09 17:47:14 +00:00
Louis-Francis Ratté-Boulianne	069fdd5f9f	egl/x11: Support DRI3 v1.1 Add support for DRI3 v1.1, which allows pixmaps to be backed by multi-planar buffers, or those with format modifiers. This is both for allocating render buffers, as well as EGLImage imports from a native pixmap (EGL_NATIVE_PIXMAP_KHR). Signed-off-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Daniel Stone <daniels@collabora.com>	2018-03-09 17:47:14 +00:00
Louis-Francis Ratté-Boulianne	61309c2a72	vulkan/wsi/x11: Return VK_SUBOPTIMAL_KHR for X11 When it is detected that a window could have been flipped but has been copied because of suboptimal format/modifier. The Vulkan client should then re-create the swapchain. Signed-off-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com> Reviewed-by: Daniel Stone <daniels@collabora.com>	2018-03-09 17:47:13 +00:00
Daniel Stone	c80c08e226	vulkan/wsi/x11: Add support for DRI3 v1.2 Adds support for multiple planes and buffer modifiers. v4: Rename "has_dri3_v1_1" to "has_dri3_modifiers" v12: Multi-planar/modifier support is now DRI3 v1.2; also update release versions	2018-03-09 17:47:13 +00:00
Dylan Baker	7258be91c5	autotools: include all meson.build files Otherwise SWR cannot be built with meson from an autotools generated tarball, such as the 18.0.0-rc4 tarball. Fixes: `16bf813830` ("meson/swr: re-shuffle generated files") Signed-off-by: Dylan Baker <dylan.c.baker@intel.com> Reviewed-by: George Kyriazis <george.kyriazis@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-03-09 08:15:04 -08:00
Michel Dänzer	2a4596a2f0	st/mesa: gl_program::info.system_values_read is a 64-bit-field We were dropping the upper 32 bits, which caused assertion failures in some compute shader piglit tests with radeonsi since the commit below. Fixes: `752e969703` ("compiler: Add two new system values for subgroups") Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-03-09 16:52:11 +01:00
George Kyriazis	379e00dc27	swr/rast: Refactor memory gather operations Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2018-03-09 09:36:42 -06:00
George Kyriazis	3f7ce10b3e	swr/rast: Add KNOB_DISABLE_SPLIT_DRAW This is useful for archrast data collection. This greatly speeds up the post processing script since there is significantly less events generated. Finally, this is a simpler option to communicate to users than having them directly adjust MAX_PRIMS_PER_DRAW and MAX_TESS_PRIMS_PER_DRAW. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2018-03-09 09:36:30 -06:00
George Kyriazis	e0a4a25829	swr/rast: Add VPOPCNT Supports popcnt on vector masks (e.g. <8 x i1>) Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2018-03-09 09:36:23 -06:00
George Kyriazis	b56afe1a4f	swr/rast: Add tracking for stream out topology Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2018-03-09 09:36:14 -06:00
George Kyriazis	2f6ae8cfcd	swr/rast: Add split draw and other state information to DrawInfoEvent. Removed specific split draw events. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2018-03-09 09:36:07 -06:00
George Kyriazis	714093203e	swr/rast: Refactor api and worker event handlers. In the API event handler we want to share information between the core layer and the API. Specifically, around associating various ids with different kinds of events. For example, associate render pass id with draw ids, or command buffer ids with draw ids. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2018-03-09 09:35:59 -06:00
George Kyriazis	cfdd35beaf	swr/rast: Add support for generalized late and early z/stencil stats Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2018-03-09 09:35:52 -06:00
George Kyriazis	9e25f298eb	swr/rast: Rasterized Subspans stats support Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2018-03-09 09:35:47 -06:00
George Kyriazis	d78b28fc33	swr/rast: Added comment Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2018-03-09 09:34:55 -06:00
Eric Engestrom	e903a7b0bb	vulkan/wsi: clean up cleanup path Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com> Reviewed-by: Keith Packard <keithp@keithp.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-03-09 13:25:44 +00:00
Bas Nieuwenhuizen	a793e7899f	radv: Fix the autotools build take 2. Forgot to remove a word.... Fixes: `04ffabf17a` "radv: Fix autotools build."	2018-03-09 14:10:24 +01:00
Lucas Stach	1f55d06783	etnaviv: allow mixing different bit depths for color and depth surfaces Vivante hardware supports this just fine. There is no reason why this shouldn't be advertised as a valid combination. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2018-03-09 12:06:07 +01:00
Thierry Reding	6d4d46bca9	autotools: Add tegra to AM_DISTCHECK_CONFIGURE_FLAGS This allows the driver to be built on a make distcheck and makes sure that it properly builds when a distribution tarball is made. Suggested-by: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Thierry Reding <treding@nvidia.com>	2018-03-09 11:48:22 +01:00
Thierry Reding	1755f608f5	tegra: Initial support Tegra K1 and later use a GPU that can be driven by the Nouveau driver. But the GPU is a pure render node and has no display engine, hence the scanout needs to happen on the Tegra display hardware. The GPU and the display engine each have a separate DRM device node exposed by the kernel. To make the setup appear as a single device, this driver instantiates a Nouveau screen with each instance of a Tegra screen and forwards GPU requests to the Nouveau screen. For purposes of scanout it will import buffers created on the GPU into the display driver. Handles that userspace requests are those of the display driver so that they can be used to create framebuffers. This has been tested with some GBM test programs, as well as kmscube and weston. All of those run without modifications, but I'm sure there is a lot that can be improved. Some fixes contributed by Hector Martin <marcan@marcan.st>. Changes in v2: - duplicate file descriptor in winsys to avoid potential issues - require nouveau when building the tegra driver - check for nouveau driver name on render node - remove unneeded dependency on libdrm_tegra - remove zombie references to libudev - add missing headers to C_SOURCES variable - drop unneeded tegra/ prefix for includes - open device files with O_CLOEXEC - update copyrights Changes in v3: - properly unwrap resources in ->resource_copy_region() - support vertex buffers passed by user pointer - allocate custom stream and const uploader - silence error message on pre-Tegra124 - support X without explicit PRIME Changes in v4: - ship Meson build files in distribution tarball - drop duplicate driver_tegra dependency Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Emil Velikov <emil.velikov@collabora.com> Tested-by: Andre Heider <a.heider@gmail.com> Reviewed-by: Dmitry Osipenko <digetx@gmail.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Signed-off-by: Thierry Reding <treding@nvidia.com>	2018-03-09 11:48:22 +01:00
Thierry Reding	2052dbdae3	nouveau: Add framebuffer modifier support This adds support for framebuffer modifiers to Nouveau. This will be used by the Tegra driver to share metadata about the format of buffers (such as the tiling mode or compression). Changes in v2: - remove unused parameters to nouveau_buffer_create() - move format modifier query code to nvc0 backend - restrict format modifiers to 2D textures - implement ->query_dmabuf_modifiers() Changes in v4: - add UAPI include path on meson builds Changes in v5: - remove unnecessary includes Acked-by: Emil Velikov <emil.velikov@collabora.com> Tested-by: Andre Heider <a.heider@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Thierry Reding <treding@nvidia.com>	2018-03-09 11:48:08 +01:00
Thierry Reding	b964cab80a	nouveau/nvc0: Extract common tile mode macro Add a new macro that can be used to extract the tiling mode from a tile_mode value. This is will be used to determine the number of GOBs used in block linear mode. Acked-by: Emil Velikov <emil.velikov@collabora.com> Tested-by: Andre Heider <a.heider@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Thierry Reding <treding@nvidia.com>	2018-03-09 11:47:54 +01:00
Thierry Reding	75bf489628	drm/tegra: Sanitize format modifiers The existing format modifier definitions were merged prematurely, and recent work has unveiled that the definitions are suboptimal in several ways: - The format specifiers, except for one, are not Tegra specific, but the names don't reflect that. - The number space is split into two, reserving 32 bits for some "parameter" which most of the modifiers are not going to have. - Symbolic names for the modifiers are not using the standard DRM_FORMAT_MOD_* prefix, which makes them awkward to use. - The vendor prefix NV is somewhat ambiguous. Fortunately, nobody's started using these modifiers, so we can still fix the above issues. Do so by using the standard prefix. Also, remove TEGRA from the name of those modifiers that exist on NVIDIA GPUs as well. In case of the block linear modifiers, make the "parameter" smaller (4 bits, though only 6 values are valid) and don't let that leak into any of the other modifiers. Finally, also use the more canonical NVIDIA instead of the ambiguous NV prefix. This is based on commit 268892cb63a822315921a8dab48ac3e4abf7dd03 from Linux v4.16-rc1. Acked-by: Emil Velikov <emil.velikov@collabora.com> Tested-by: Andre Heider <a.heider@gmail.com> Signed-off-by: Thierry Reding <treding@nvidia.com>	2018-03-09 11:44:35 +01:00
Thierry Reding	ffc85cfac0	drm/fourcc: Fix fourcc_mod_code() definition Avoid a compiler warnings when the val parameter is an expression. This is based on commit 5843f4e02fbe86a59981e35adc6cabebee46fdc0 from Linux v4.16-rc1. Acked-by: Emil Velikov <emil.velikov@collabora.com> Tested-by: Andre Heider <a.heider@gmail.com> Signed-off-by: Thierry Reding <treding@nvidia.com>	2018-03-09 11:44:35 +01:00
Bas Nieuwenhuizen	04ffabf17a	radv: Fix autotools build. Forgot it again .... Fixes: `b6347807a9` "radv: Generate icd files." Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-03-09 09:36:19 +01:00
Samuel Pitoiset	365850fd68	ac/nir: set number of channels for packed mrt exports Bit 0 enables VSRC0 (R in low bits, G high) and bit 2 enables VSRC1 (B in low bits, A high). Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-03-09 09:28:20 +01:00
Bas Nieuwenhuizen	68201ab2da	radv: Update version to 1.1.70. Turns out they did not reset the patch number on release. Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-03-09 07:53:39 +01:00
Bas Nieuwenhuizen	b6347807a9	radv: Generate icd files. If the api version is too low, the loader clamps the application requested version to the advertized version, which messes with which extensions are enabled. Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-03-09 07:53:39 +01:00
Ian Romanick	6878c9aabc	nir: Don't i2b a value that is already Boolean A bunch of shaders have sequences like: i2b(u2i(floatBitsToUint(intBitsToFloat(x == y ? -1 : 0)))) Other optimizations (and NIR's typeless nature) reduce this to i2b(x == y) which is silly. Skylake total instructions in shared programs: 14498698 -> 14497948 (<.01%) instructions in affected programs: 74480 -> 73730 (-1.01%) helped: 277 HURT: 0 helped stats (abs) min: 1 max: 32 x̄: 2.71 x̃: 2 helped stats (rel) min: 0.04% max: 13.79% x̄: 1.45% x̃: 0.68% 95% mean confidence interval for instructions value: -3.35 -2.06 95% mean confidence interval for instructions %-change: -1.74% -1.16% Instructions are helped. total cycles in shared programs: 532015500 -> 531999238 (<.01%) cycles in affected programs: 5943878 -> 5927616 (-0.27%) helped: 251 HURT: 74 helped stats (abs) min: 1 max: 13149 x̄: 127.89 x̃: 14 helped stats (rel) min: 0.01% max: 17.31% x̄: 1.55% x̃: 0.53% HURT stats (abs) min: 1 max: 4550 x̄: 214.04 x̃: 15 HURT stats (rel) min: <.01% max: 44.43% x̄: 2.81% x̃: 0.33% 95% mean confidence interval for cycles value: -158.51 58.43 95% mean confidence interval for cycles %-change: -1.07% -0.04% Inconclusive result (value mean confidence interval includes 0). total loops in shared programs: 4753 -> 4735 (-0.38%) loops in affected programs: 18 -> 0 helped: 18 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00% 95% mean confidence interval for loops value: -1.00 -1.00 95% mean confidence interval for loops %-change: -100.00% -100.00% Loops are helped. Haswell and Broadwell had simliar results. (Broadwell shown) total instructions in shared programs: 14791877 -> 14791127 (<.01%) instructions in affected programs: 77326 -> 76576 (-0.97%) helped: 278 HURT: 1 helped stats (abs) min: 1 max: 32 x̄: 2.70 x̃: 2 helped stats (rel) min: 0.04% max: 13.79% x̄: 1.42% x̃: 0.68% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.49% max: 0.49% x̄: 0.49% x̃: 0.49% 95% mean confidence interval for instructions value: -3.33 -2.05 95% mean confidence interval for instructions %-change: -1.70% -1.13% Instructions are helped. total cycles in shared programs: 558250067 -> 558252872 (<.01%) cycles in affected programs: 5806328 -> 5809133 (0.05%) helped: 235 HURT: 83 helped stats (abs) min: 1 max: 10630 x̄: 81.73 x̃: 16 helped stats (rel) min: 0.03% max: 18.58% x̄: 1.60% x̃: 0.51% HURT stats (abs) min: 1 max: 10590 x̄: 265.19 x̃: 20 HURT stats (rel) min: <.01% max: 15.28% x̄: 1.89% x̃: 0.54% 95% mean confidence interval for cycles value: -89.87 107.51 95% mean confidence interval for cycles %-change: -1.06% -0.32% Inconclusive result (value mean confidence interval includes 0). total loops in shared programs: 4735 -> 4717 (-0.38%) loops in affected programs: 18 -> 0 helped: 18 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00% 95% mean confidence interval for loops value: -1.00 -1.00 95% mean confidence interval for loops %-change: -100.00% -100.00% Loops are helped. total fills in shared programs: 83111 -> 83110 (<.01%) fills in affected programs: 28 -> 27 (-3.57%) helped: 1 HURT: 0 Ivy Bridge total instructions in shared programs: 11774173 -> 11773436 (<.01%) instructions in affected programs: 70819 -> 70082 (-1.04%) helped: 267 HURT: 0 helped stats (abs) min: 1 max: 48 x̄: 2.76 x̃: 2 helped stats (rel) min: 0.21% max: 19.51% x̄: 1.57% x̃: 0.63% 95% mean confidence interval for instructions value: -3.51 -2.01 95% mean confidence interval for instructions %-change: -1.94% -1.21% Instructions are helped. total cycles in shared programs: 257153833 -> 257148932 (<.01%) cycles in affected programs: 585341 -> 580440 (-0.84%) helped: 167 HURT: 100 helped stats (abs) min: 1 max: 1327 x̄: 44.89 x̃: 16 helped stats (rel) min: 0.04% max: 26.54% x̄: 2.41% x̃: 0.88% HURT stats (abs) min: 1 max: 200 x̄: 25.95 x̃: 16 HURT stats (rel) min: 0.04% max: 9.81% x̄: 1.34% x̃: 0.65% 95% mean confidence interval for cycles value: -33.25 -3.46 95% mean confidence interval for cycles %-change: -1.47% -0.54% Cycles are helped. total loops in shared programs: 3416 -> 3398 (-0.53%) loops in affected programs: 18 -> 0 helped: 18 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00% 95% mean confidence interval for loops value: -1.00 -1.00 95% mean confidence interval for loops %-change: -100.00% -100.00% Loops are helped. LOST: 2 GAINED: 0 Sandy Bridge total instructions in shared programs: 10499306 -> 10499094 (<.01%) instructions in affected programs: 6051 -> 5839 (-3.50%) helped: 43 HURT: 0 helped stats (abs) min: 1 max: 32 x̄: 4.93 x̃: 2 helped stats (rel) min: 0.39% max: 12.90% x̄: 4.29% x̃: 2.45% 95% mean confidence interval for instructions value: -7.66 -2.20 95% mean confidence interval for instructions %-change: -5.47% -3.12% Instructions are helped. total cycles in shared programs: 145862568 -> 145861370 (<.01%) cycles in affected programs: 61733 -> 60535 (-1.94%) helped: 36 HURT: 2 helped stats (abs) min: 16 max: 66 x̄: 36.61 x̃: 35 helped stats (rel) min: 0.45% max: 17.31% x̄: 4.92% x̃: 2.81% HURT stats (abs) min: 18 max: 102 x̄: 60.00 x̃: 60 HURT stats (rel) min: 1.10% max: 1.85% x̄: 1.48% x̃: 1.48% 95% mean confidence interval for cycles value: -41.28 -21.77 95% mean confidence interval for cycles %-change: -6.16% -3.00% Cycles are helped. total loops in shared programs: 1803 -> 1785 (-1.00%) loops in affected programs: 18 -> 0 helped: 18 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00% 95% mean confidence interval for loops value: -1.00 -1.00 95% mean confidence interval for loops %-change: -100.00% -100.00% Loops are helped. LOST: 4 GAINED: 0 No changes on Iron Lake of GM45. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-03-08 15:26:26 -08:00
Ian Romanick	1583f49eaa	i965/vec4: Allow CSE on subset VF constant loads v2: Rewrite the code that generates the VF mask. Suggested by Ken. No changes on other platforms. Haswell, Ivy Bridge, and Sandy Bridge had similar results. (Haswell shown) total instructions in shared programs: 13059891 -> 13059884 (<.01%) instructions in affected programs: 431 -> 424 (-1.62%) helped: 7 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 1.19% max: 5.26% x̄: 2.05% x̃: 1.49% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -3.39% -0.71% Instructions are helped. total cycles in shared programs: 409260032 -> 409260018 (<.01%) cycles in affected programs: 4228 -> 4214 (-0.33%) helped: 7 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.28% max: 2.04% x̄: 0.54% x̃: 0.28% 95% mean confidence interval for cycles value: -2.00 -2.00 95% mean confidence interval for cycles %-change: -1.15% 0.07% Inconclusive result (%-change mean confidence interval includes 0). Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-03-08 15:26:26 -08:00
Ian Romanick	360899d457	i965/vec4: Relax writemask condition in CSE If the previously seen instruction generates more fields than the new instruction, still allow CSE to happen. This doesn't do much, but it also enables a couple more shaders in the next patch. It helped quite a bit in another change series that I have (at least for now) abandoned. v2: Add some extra comentary about the parameters to instructions_match. Suggested by Ken. No changes on Skylake, Broadwell, Iron Lake or GM45. Ivy Bridge and Haswell had similar results. (Ivy Bridge shown) total instructions in shared programs: 11780295 -> 11780294 (<.01%) instructions in affected programs: 302 -> 301 (-0.33%) helped: 1 HURT: 0 total cycles in shared programs: 257308315 -> 257308313 (<.01%) cycles in affected programs: 2074 -> 2072 (-0.10%) helped: 1 HURT: 0 Sandy Bridge total instructions in shared programs: 10506687 -> 10506686 (<.01%) instructions in affected programs: 335 -> 334 (-0.30%) helped: 1 HURT: 0 Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-03-08 15:26:26 -08:00
Ian Romanick	52c7df1643	i965/fs: Merge CMP and SEL into CSEL on Gen8+ v2: Fix several problems handling inverted predicates. Add a much bigger comment around the BRW_CONDITIONAL_NZ case. v3: Allow uniforms and shader inputs as sources for the original SEL and CMP instructions. This enables a LOT more shaders to receive CSEL merging (5816 vs 8564 on SKL). v4: Report progress. Broadwell and Skylake had similar results. (Broadwell shown) helped: 8527 HURT: 0 helped stats (abs) min: 1 max: 27 x̄: 2.44 x̃: 1 helped stats (rel) min: 0.03% max: 17.80% x̄: 1.12% x̃: 0.70% 95% mean confidence interval for instructions value: -2.51 -2.36 95% mean confidence interval for instructions %-change: -1.15% -1.10% Instructions are helped. total cycles in shared programs: 559442317 -> 558288357 (-0.21%) cycles in affected programs: 372699860 -> 371545900 (-0.31%) helped: 6748 HURT: 1450 helped stats (abs) min: 1 max: 32000 x̄: 182.41 x̃: 12 helped stats (rel) min: <.01% max: 66.08% x̄: 3.42% x̃: 0.70% HURT stats (abs) min: 1 max: 2538 x̄: 53.08 x̃: 14 HURT stats (rel) min: <.01% max: 96.72% x̄: 3.32% x̃: 0.90% 95% mean confidence interval for cycles value: -179.01 -102.51 95% mean confidence interval for cycles %-change: -2.37% -2.08% Cycles are helped. LOST: 0 GAINED: 6 No changes on earlier platforms. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [v1] Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v3] Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-03-08 15:26:26 -08:00
Kenneth Graunke	70de61594d	i965/fs: Add infrastructure for generating CSEL instructions. v2 (idr): Don't allow CSEL with a non-float src2. v3 (idr): Add CSEL to fs_inst::flags_written. Suggested by Matt. v4 (idr): Only set BRW_ALIGN_16 on Gen < 10 (suggested by Matt). Don't reset the access mode afterwards (suggested by Samuel and Matt). Add support for CSEL not modifying the flags to more places (requested by Matt). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [v3] Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-03-08 15:26:26 -08:00
Ian Romanick	54e8d2268d	nir: Narrow some dot product operations On vector platforms, this helps elide some constant loads. v2: Reorder the transformations. No changes on Broadwell or Skylake. Haswell total instructions in shared programs: 13093793 -> 13060163 (-0.26%) instructions in affected programs: 1277532 -> 1243902 (-2.63%) helped: 13216 HURT: 95 helped stats (abs) min: 1 max: 18 x̄: 2.56 x̃: 2 helped stats (rel) min: 0.21% max: 20.00% x̄: 3.63% x̃: 2.78% HURT stats (abs) min: 1 max: 6 x̄: 1.77 x̃: 1 HURT stats (rel) min: 0.09% max: 5.56% x̄: 1.25% x̃: 1.19% 95% mean confidence interval for instructions value: -2.57 -2.49 95% mean confidence interval for instructions %-change: -3.65% -3.54% Instructions are helped. total cycles in shared programs: 409580819 -> 409268463 (-0.08%) cycles in affected programs: 71730652 -> 71418296 (-0.44%) helped: 9898 HURT: 2352 helped stats (abs) min: 2 max: 16014 x̄: 37.08 x̃: 16 helped stats (rel) min: <.01% max: 35.55% x̄: 6.26% x̃: 4.50% HURT stats (abs) min: 2 max: 276 x̄: 23.25 x̃: 6 HURT stats (rel) min: <.01% max: 40.00% x̄: 3.54% x̃: 1.97% 95% mean confidence interval for cycles value: -33.19 -17.80 95% mean confidence interval for cycles %-change: -4.50% -4.26% Cycles are helped. total fills in shared programs: 82059 -> 82052 (<.01%) fills in affected programs: 21 -> 14 (-33.33%) helped: 7 HURT: 0 Sandy Bridge and Ivy Bridge had similar results (Ivy Bridge shown) total instructions in shared programs: 11811851 -> 11780605 (-0.26%) instructions in affected programs: 1155007 -> 1123761 (-2.71%) helped: 12304 HURT: 95 helped stats (abs) min: 1 max: 18 x̄: 2.55 x̃: 2 helped stats (rel) min: 0.21% max: 20.00% x̄: 3.69% x̃: 2.86% HURT stats (abs) min: 1 max: 6 x̄: 1.77 x̃: 1 HURT stats (rel) min: 0.09% max: 5.56% x̄: 1.25% x̃: 1.19% 95% mean confidence interval for instructions value: -2.56 -2.48 95% mean confidence interval for instructions %-change: -3.71% -3.59% Instructions are helped. total cycles in shared programs: 257618409 -> 257316805 (-0.12%) cycles in affected programs: 71999580 -> 71697976 (-0.42%) helped: 9155 HURT: 2380 helped stats (abs) min: 2 max: 16014 x̄: 38.44 x̃: 16 helped stats (rel) min: <.01% max: 35.75% x̄: 6.39% x̃: 4.62% HURT stats (abs) min: 2 max: 290 x̄: 21.14 x̃: 4 HURT stats (rel) min: <.01% max: 41.55% x̄: 3.14% x̃: 1.33% 95% mean confidence interval for cycles value: -34.32 -17.97 95% mean confidence interval for cycles %-change: -4.55% -4.29% Cycles are helped. GM45 and Iron Lake had nearly identical results (Iron Lake shown) total instructions in shared programs: 7886750 -> 7879944 (-0.09%) instructions in affected programs: 373781 -> 366975 (-1.82%) helped: 3715 HURT: 47 helped stats (abs) min: 1 max: 8 x̄: 1.86 x̃: 1 helped stats (rel) min: 0.22% max: 16.67% x̄: 2.88% x̃: 2.06% HURT stats (abs) min: 1 max: 6 x̄: 2.55 x̃: 2 HURT stats (rel) min: 1.09% max: 5.00% x̄: 1.93% x̃: 2.35% 95% mean confidence interval for instructions value: -1.85 -1.77 95% mean confidence interval for instructions %-change: -2.91% -2.73% Instructions are helped. total cycles in shared programs: 178114636 -> 178095452 (-0.01%) cycles in affected programs: 7227666 -> 7208482 (-0.27%) helped: 3349 HURT: 301 helped stats (abs) min: 2 max: 90 x̄: 6.55 x̃: 4 helped stats (rel) min: <.01% max: 14.18% x̄: 0.95% x̃: 0.63% HURT stats (abs) min: 2 max: 42 x̄: 9.13 x̃: 10 HURT stats (rel) min: 0.01% max: 11.19% x̄: 1.22% x̃: 1.50% 95% mean confidence interval for cycles value: -5.52 -4.99 95% mean confidence interval for cycles %-change: -0.81% -0.73% Cycles are helped. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [v1]	2018-03-08 15:26:26 -08:00
Lionel Landwerlin	d10a39ebe0	i965: perf: consolidate unmapping oa perf bo outside accumulation Do this in one place outside the only caller of the accumulation function. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-03-08 23:05:29 +00:00
Lionel Landwerlin	fb921a2870	i965: perf: count number of accumlated reports This will be reused later. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-03-08 23:05:26 +00:00
Lionel Landwerlin	e4387faafb	i965: perf: reuse timescale base function from query We already have the same function in brw_queryobj.c Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-03-08 23:05:23 +00:00
Lionel Landwerlin	b71da26496	i965: perf: store sysfs device entry into context We want to reuse it later on. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-03-08 23:05:21 +00:00
Lionel Landwerlin	5742b17da1	i965: perf: store the hw_id of the context in the query Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-03-08 23:05:18 +00:00

... 5 6 7 8 9 ...

101135 Commits All Branches Search

101135 Commits

All Branches