KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Eric Engestrom	22c1657d05	util/os_file: always use the 'grow' mechanism Use fstat() only to pre-allocate a big enough buffer. This fixes a race where if the file grows between fstat() and read() we would be missing the end of the file, and if the file slims down read() would just fail. Fixes: `316964709e` "util: add os_read_file() helper" Reported-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-16 12:56:25 +01:00
Lionel Landwerlin	e04cf0b612	nir: lower_non_uniform_access: iterate over instructions safely This pass moves instructions around and adds control-flow in the middle of blocks. We need to use nir_foreach_instr_safe to ensure that we iterate over instructions correctly anyway. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `3bd5457641` ("nir: Add a lowering pass for non-uniform resource access") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-16 10:22:01 +01:00
Kenneth Graunke	752367b766	iris: Dodge more GLSL IR lowering This avoids some lower_instructions bits in st.	2019-05-15 19:44:21 -07:00
Jason Ekstrand	fce0214e94	intel/fs/live_variables: Do compute_start_end in BITSET_WORD chunks For a block with a contiguous chunk of 32 vars that don't need updating, this lets us skip 32 vars at a time. Also, by using bitscan, we only iterate for each set bit rather than testing them all one at a time. Looking at perf (with -O0 which is unfortunately necessary to get reasonable back-traces), this seems to cut about 50-60% of the time spent in compute_start_end() which is, itself, about 4-6% of the run-time. In the real world, with a release driver build, this cuts 1.34% off a full shader-db run. (I ran shader-db 5 times in each configuration). Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-16 02:14:40 +00:00
Jason Ekstrand	b2d274c677	intel/fs/ra: Choose a spill reg before throwing away the graph Otherwise, we get an effectively random spill reg because we no longer have the information from RA to guide us. Also, a completely clean graph has undefined data in in_stack which is used for choosing the spill reg so it really is non-deterministic. Fixes: `e99081e76d` "intel/fs/ra: Spill without destroying the..." Tested-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-16 02:13:09 +00:00
Jason Ekstrand	c19acf321c	intel/fs/ra: Add spill costs to the graph on-demand Tested-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-16 02:13:09 +00:00
Jason Ekstrand	2c14e2b5bf	intel/fs/ra: Add a helper for discarding the interference graph Tested-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-16 02:13:09 +00:00
Alyssa Rosenzweig	46494c3dc1	nir/algebraic: Remove problematic "optimization" This line is no longer relevant now that booleans are 1-bit, and in fact causes issues (infinite progress loop between algebraic optimizations and copy prop) with constant vector masks. No shader-db changes on Intel platforms (Jason). Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2019-05-16 02:08:37 +00:00
Alyssa Rosenzweig	74ab80b92d	panfrost/midgard: Add load/store opcodes This commit adds a bunch of new load/store opcodes, largely related to OpenCL, as well as adjusting the name of existing opcodes to be more uniform. The immediate effect is compute shaders are substantially easier to interpret now. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-16 01:25:25 +00:00
Alyssa Rosenzweig	f73c0b73ec	panfrost/midgard: Enable integer constant inlining Midgard ALU features two types of constants: embedded constants (128-bit chunk, zero/one per schedule bundle) and inline constants (16-bit splattered into the op, second source if present). Inline constants are much more efficient from a space and scheduling freedom standpoint, so it's desirable to inline when possible. Now that integer ops are well understood and in use, we enable inlining of integer constants in addition to floats (which have been inlined since forever). Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-16 01:20:41 +00:00
Alyssa Rosenzweig	8214aaa3c8	panfrost/midgard: Remove imov workaround The previous commit fixes the issue this patched around. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-16 01:20:41 +00:00
Alyssa Rosenzweig	0a13babdd8	panfrost/midgard: Set int outmod for ops writing integers By default, the "normal" output modifier is set on ALU ops. This is the correct default for float outputs -- for floats, it preserves the semantic value. Unfortunately, when used with integers, it does not preserve the bitstream encoding, causing misbehaviour. (It's an open question what happens when `normal` is used with integers -- does it apply some other transformation? or does it do floating point normalization/etc on the ints as if they were floats?). Instead, we default to the "clamp to integer" output modifier for ops writing integers. Semantically, this makes sense (clamping an integer to the nearest integer is the identity function). In the hardware with an integer opcode, this is the actual "normal". This fixes numerous sporadic and sometimes bizarre bugs relating to integers, especially integer moves. With this in place, we no longer care about the types involved; it's just bits on the wire again. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-16 01:20:30 +00:00
Alyssa Rosenzweig	81b1053d9b	panfrost: Set custom stride for textures when necessary From Gallium (and our) perspective, the stride of a BO is arbitrary. For internal buffers, we can make it something nice, but for imported linear buffers (e.g. EGL clients), we don't always have that luxury. To cope, we calculate the expected stride of a texture, compare it to the BO's actual reported stride, and if they differ, set the latter as a custom stride. Fixes rendering of windows not on tile boundaries (noticeable in Weston with es2gears_wayland, for instance). Also, this should fix stride issues with buffer reloading. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-16 01:16:36 +00:00
Alyssa Rosenzweig	cea9352059	panfrost/decode: Stride decoding With a special flag, texture descriptors can include custom stride(s). We haven't seen a case of this used for mipmaps/cubemaps, so it's not clear how that will be encoded, but this dumps correctly for single one-level 2D textures. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-16 01:15:37 +00:00
Alyssa Rosenzweig	d699ffbf0e	panfrost/decode: Futureproof texture dumping One field was not dumped for some reason. It's observed to be 0, but it's still good to have it available. Also, extra fields might be snuck in the bitmaps array (it's variable-lengthed at the end), and we want to guard against that possibility, so we dump a little more. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-16 01:15:37 +00:00
Marek Olšák	ccfcb9d818	ac: rename SI-CIK-VI to GFX6-GFX7-GFX8 Acked-by: Dave Airlie <airlied@redhat.com> We already use GFX9 and I don't want us to have confusing naming in the driver. GFXn naming is better from the driver perspective, because it's the real version of the gfx portion of the hw. Also, CIK means Bonaire-Kaveri-Kabini, it doesn't mean CI. It shouldn't confuse our SDMA, UVD, VCE etc. code much. Those have nothing to do with GFXn and they have their own version numbers.	2019-05-15 20:54:10 -04:00
Marek Olšák	e5cc363f43	ac: add comments to chip enums Reviewed-by: Alex Deucher <alexander.deucher@amd.com> (except GFX2 changes) Reviewed-by: Dave Airlie <airlied@redhat.com> (except <= GFX5 changes)	2019-05-15 20:54:10 -04:00
Anuj Phogat	a42163cbbc	compiler: Add lowering support for 64-bit saturate operations to software Fixes 7 Khronos GL CTS tests: KHR-GL45.gpu_shader_fp64.builtin.smoothstep_dvec{double, 2, 3, 4} KHR-GL45.gpu_shader_fp64.builtin.smoothstep_against_scalar_dvec{2, 3, 4} Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-15 23:30:30 +00:00
Kenneth Graunke	d305409db5	st/dri: Minor style fixes Trivial.	2019-05-15 14:49:14 -07:00
Chia-I Wu	659c5800e5	virgl: handle DONT_BLOCK and MAP_DIRECTLY Handle PIPE_TRANSFER_DONT_BLOCK and PIPE_TRANSFER_MAP_DIRECTLY. Make virgl_resource_transfer_prepare return an enum instead of a bool for extensibility (e.g., instruct the callers to map differently). Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-05-15 20:51:28 +00:00
Chia-I Wu	e87186fc67	virgl: add virgl_resource_transfer_prepare virgl_resource_transfer_prepare should be called before mapping to prepare the resource. It does flush, readback, and wait as needed. virgl_res_needs_flush and virgl_res_needs_readback become internal helpers to the new function. There should be no externally visible change. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-05-15 20:51:28 +00:00
Chia-I Wu	cdcf38b98a	virgl: honor DISCARD_WHOLE_RESOURCE in virgl_res_needs_readback Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-05-15 20:51:28 +00:00
Chia-I Wu	a62ab178ce	virgl: clean up virgl_res_needs_readback Add comments and follow the coding style of virgl_res_needs_flush. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-05-15 20:51:28 +00:00
Lionel Landwerlin	391a836e8f	nir: fix lower_non_uniform_access pass Obviously missing the instruction insertion into the SSA list. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `3bd5457641` ("nir: Add a lowering pass for non-uniform resource access") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-15 18:15:20 +00:00
Alex Villacís Lasso	b2200514af	gbm: gbm_bo_get_handle_for_plane fallback to nonplanar handle Commit `f9567ab435` (gbm: Export a getter for per plane handles) contains an API version check that fails on i915 (API version 7 vs. check for minimum API version 13). Any client that migrates to the planar API will start failing on i915 (see https://gitlab.gnome.org/GNOME/mutter/issues/127 for mutter, and https://bugs.freedesktop.org/show_bug.cgi?id=108487 for weston). This commit adds a fallback for plane 0 when the API check fails and returns the non-planar handle in this scenario, making the call equivalent to gbm_bo_get_handle(). This is enough for weston 6.0.0 to start working again on an i915 system. Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=108487 Signed-off-by: Alex Villacís Lasso <a_villacis@palosanto.com> Reviewed-by: Daniel Stone <daniels@collabora.com>	2019-05-15 18:27:30 +01:00
Alyssa Rosenzweig	a9cef4f0e5	gallium: Add default check for PIPE_CAP_FRAGMENT_SHADER_INTERLOCK Fixes: `c704c0226` ("gallium: Add a PIPE_CAP_FRAGMENT_SHADER_INTERLOCK") Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-14 21:34:49 -07:00
Andrii Kryvytskyi	eca53f00aa	iris: Check if resource has stencil before returning it Signed-off-by: Andrii Kryvytskyi <andrii.o.kryvytskyi@globallogic.com> Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-14 21:16:11 -07:00
Jordan Justen	49958c4b5d	i965/blorp: Set MOCS for gen11 in blorp_alloc_vertex_buffer v2: * Add build error for gen > 6 if MOCS is not set. (Lionel) Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2019-05-14 19:57:01 -07:00
Kenneth Graunke	bb5db02bab	iris: Enable fragment shader interlock on Gen9+. There's some debate about whether we should support this on older hardware as well. Currently i965 turns it off on Gen8- though, so we follow suit. If this changes, we can update this as well. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-05-14 19:34:33 -07:00
Kenneth Graunke	c704c0226c	gallium: Add a PIPE_CAP_FRAGMENT_SHADER_INTERLOCK. Corresponding to GL_ARB_fragment_shader_interlock and GL_NV_fragment_shader_interlock. Currently, only the NIR paths support this functionality, but someone could conceivably add it to TGSI too. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-05-14 19:34:29 -07:00
Dave Airlie	4efd04ab18	intel/compiler: use bitset instead of opencoding a 32-bit bitset. (v2) In the future I want to expand this to 128-bits, for vec16 support, so lets just put the code in place to use bitset ranges now. v2: just declare the bitset to be the max of what we should ever see and change assert to reflect it. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-15 07:10:34 +10:00
Dave Airlie	3b2c433167	intel/compiler: remove repeated bit_size / 8 in brw mem lowering pass. Just use a variable already. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-15 07:10:30 +10:00
Kenneth Graunke	646924cfa1	intel/compiler: Implement TCS 8_PATCH mode and INTEL_DEBUG=tcs8 Our tessellation control shaders can be dispatched in several modes. - SINGLE_PATCH (Gen7+) processes a single patch per thread, with each channel corresponding to a different patch vertex. PATCHLIST_N will launch (N / 8) threads. If N is less than 8, some channels will be disabled, leaving some untapped hardware capabilities. Conditionals based on gl_InvocationID are non-uniform, which means that they'll often have to execute both paths. However, if there are fewer than 8 vertices, all invocations will happen within a single thread, so barriers can become no-ops, which is nice. We also burn a maximum of 4 registers for ICP handles, so we can compile without regard for the value of N. It also works in all cases. - DUAL_PATCH mode processes up to two patches at a time, where the first four channels come from patch 1, and the second group of four come from patch 2. This tries to provide better EU utilization for small patches (N <= 4). It cannot be used in all cases. - 8_PATCH mode processes 8 patches at a time, with a thread launched per vertex in the patch. Each channel corresponds to the same vertex, but in each of the 8 patches. This utilizes all channels even for small patches. It also makes conditions on gl_InvocationID uniform, leading to proper jumps. Barriers, unfortunately, become real. Worse, for PATCHLIST_N, the thread payload burns N registers for ICP handles. This can burn up to 32 registers, or 1/4 of our register file, for URB handles. For Vulkan (and DX), we know the number of vertices at compile time, so we can limit the amount of waste. In GL, the patch dimension is dynamic state, so we either would have to waste all 32 (not reasonable) or guess (badly) and recompile. This is unfortunate. Because we can only spawn 16 thread instances, we can only use this mode for PATCHLIST_16 and smaller. The rest must use SINGLE_PATCH. 
This patch implements the new 8_PATCH TCS mode, but leaves us using SINGLE_PATCH by default. A new INTEL_DEBUG=tcs8 flag will switch to using 8_PATCH mode for testing and benchmarking purposes. We may want to consider using 8_PATCH mode in Vulkan in some cases. The data I've seen shows that 8_PATCH mode can be more efficient in some cases, but SINGLE_PATCH mode (the one we use today) is faster in other cases. Ultimately, the TES matters much more than the TCS for performance, so the decision may not matter much. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-14 13:16:30 -07:00
Kenneth Graunke	076159b40b	intel/compiler: Move ICP handle fetching into a helper function. This will be significantly different in 8_PATCH mode. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-14 13:16:28 -07:00
Kenneth Graunke	3d84fd29e8	intel/compiler: Don't repeat dispatch max fixing condition Having a single flag will keep both places in sync if the condition gets more complicated. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-14 13:16:27 -07:00
Kenneth Graunke	f0d52cf2b0	intel/compiler: Rename invocation_id_mask to instance_id_mask The payload field is actually "instance" (thread number), which is used to calculate the invocation ID. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-14 13:16:25 -07:00
Kenneth Graunke	d86260719e	intel/compiler: Refactor TCS invocation ID setup into a helper When we add 8_PATCH mode, this will get a bit more complex, so we may as well start by putting it in a helper function. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-14 13:16:24 -07:00
Kenneth Graunke	381c2aded2	i965: Pass compiler to default key populators This lets us get devinfo and other misc. compiler settings. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-14 13:16:21 -07:00
Marek Olšák	6b0b8f132a	ac: use 1D GEPs for descriptors and constants just a cleanup Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-14 15:15:11 -04:00
Marek Olšák	67b4785958	mesa: fix _mesa_max_texture_levels for GL_TEXTURE_EXTERNAL_OES This helps fix: piglit/bin/ext_image_dma_buf_import-sample_yuv -fmt=NV12 -auto Fixes: `d88f3392ff` Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-14 15:15:11 -04:00
Eric Anholt	e5db87b00b	freedreno: Restore msm_drm.h to a pristine "make headers_install" copy. This diverged back in `f1374805a8` ("drm-uapi: use local files, not system libdrm") to point at drm-uapi's copy, which we don't need now that we're actually in drm-uapi. Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-05-14 11:51:57 -07:00
Eric Anholt	18d11cb4dc	freedreno: Move msm_drm.h to the same spot as other DRM uapi. The new location matches other drivers, and has a README about the rules for updating it. Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-05-14 11:51:55 -07:00
Ian Romanick	32d259713b	nir/algebraic: Commute 1-fsat(a) to fsat(1-a) for all non-fmul instructions The goal is to avoid having an extra MOV instruction to perform the saturate. Doing the subtraction first allows the saturate to be applied to the ADD instruction making the MOV unnecessary. Values generated in different block and values from non-ALU instructions (e.g., texture instructions) almost always need the extra MOV. Multiply instructions are restricted because doing this rearrangement can interfere with the generation of flrp and ffma instructions. v2: Now that the final method has been selected, squash three commits into one. All Intel platforms has similar results. (Ice Lake shown) total instructions in shared programs: 17223214 -> 17219386 (-0.02%) instructions in affected programs: 1524376 -> 1520548 (-0.25%) helped: 2686 HURT: 26 helped stats (abs) min: 1 max: 32 x̄: 1.44 x̃: 1 helped stats (rel) min: 0.03% max: 16.67% x̄: 0.54% x̃: 0.37% HURT stats (abs) min: 1 max: 2 x̄: 1.69 x̃: 2 HURT stats (rel) min: 0.33% max: 1.67% x̄: 0.54% x̃: 0.35% 95% mean confidence interval for instructions value: -1.46 -1.36 95% mean confidence interval for instructions %-change: -0.56% -0.50% Instructions are helped. total cycles in shared programs: 360811571 -> 360791896 (<.01%) cycles in affected programs: 103650214 -> 103630539 (-0.02%) helped: 1557 HURT: 675 helped stats (abs) min: 1 max: 1773 x̄: 41.44 x̃: 16 helped stats (rel) min: <.01% max: 26.77% x̄: 1.37% x̃: 0.64% HURT stats (abs) min: 1 max: 1513 x̄: 66.44 x̃: 14 HURT stats (rel) min: <.01% max: 46.16% x̄: 2.00% x̃: 0.49% 95% mean confidence interval for cycles value: -14.82 -2.81 95% mean confidence interval for cycles %-change: -0.50% -0.20% Cycles are helped. LOST: 2 GAINED: 0 Reviewed-by: Matt Turner <mattst88@gmail.com> [v1] Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2019-05-14 11:38:23 -07:00
Ian Romanick	a7f0c57673	nir/algebraic: Eliminate useless fsat() on operand of comparison w/value in (0, 1) v2: Fix copy-and-paste bug in a cmp b vs b cmp a cases. All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17224337 -> 17224269 (<.01%) instructions in affected programs: 13578 -> 13510 (-0.50%) helped: 68 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.31% max: 3.12% x̄: 0.84% x̃: 0.42% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -1.05% -0.63% Instructions are helped. total cycles in shared programs: 360826090 -> 360825137 (<.01%) cycles in affected programs: 94867 -> 93914 (-1.00%) helped: 58 HURT: 1 helped stats (abs) min: 2 max: 28 x̄: 17.74 x̃: 18 helped stats (rel) min: 0.08% max: 3.17% x̄: 1.39% x̃: 1.22% HURT stats (abs) min: 76 max: 76 x̄: 76.00 x̃: 76 HURT stats (rel) min: 2.86% max: 2.86% x̄: 2.86% x̃: 2.86% 95% mean confidence interval for cycles value: -19.53 -12.78 95% mean confidence interval for cycles %-change: -1.56% -1.08% Cycles are helped. No changes on any other Intel platform. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2019-05-14 11:38:23 -07:00
Ian Romanick	281f20e26d	nir/algebraic: Strip double negatives from comparison sources All Intel platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17224623 -> 17224337 (<.01%) instructions in affected programs: 32648 -> 32362 (-0.88%) helped: 148 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.93 x̃: 2 helped stats (rel) min: 0.16% max: 2.74% x̄: 1.07% x̃: 1.08% 95% mean confidence interval for instructions value: -1.97 -1.89 95% mean confidence interval for instructions %-change: -1.15% -1.00% Instructions are helped. total cycles in shared programs: 360828714 -> 360826090 (<.01%) cycles in affected programs: 347416 -> 344792 (-0.76%) helped: 148 HURT: 26 helped stats (abs) min: 1 max: 426 x̄: 26.33 x̃: 18 helped stats (rel) min: 0.03% max: 15.10% x̄: 1.78% x̃: 1.41% HURT stats (abs) min: 2 max: 337 x̄: 48.96 x̃: 6 HURT stats (rel) min: 0.04% max: 18.82% x̄: 2.15% x̃: 0.27% 95% mean confidence interval for cycles value: -23.78 -6.38 95% mean confidence interval for cycles %-change: -1.59% -0.79% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2019-05-14 11:38:22 -07:00
Ian Romanick	45c7ff95fc	intel/compiler: Repeat nir_opt_algebraic_late A tiny bit of help seems to come from nir_copy_prop. Future patches will benefit from this change. Doing more copy propagation on the vec4 backend led to a disaster in hurt cycles. v2: Fix typo in comment. Noticed by Matt. All Gen8+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17224634 -> 17224623 (<.01%) instructions in affected programs: 4586 -> 4575 (-0.24%) helped: 11 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.19% max: 0.53% x̄: 0.27% x̃: 0.23% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.36% -0.19% Instructions are helped. total cycles in shared programs: 360828542 -> 360828714 (<.01%) cycles in affected programs: 151159 -> 151331 (0.11%) helped: 49 HURT: 28 helped stats (abs) min: 1 max: 254 x̄: 26.41 x̃: 6 helped stats (rel) min: 0.06% max: 12.02% x̄: 1.34% x̃: 0.42% HURT stats (abs) min: 1 max: 196 x̄: 52.36 x̃: 15 HURT stats (rel) min: 0.05% max: 10.74% x̄: 2.55% x̃: 0.88% 95% mean confidence interval for cycles value: -13.48 17.95 95% mean confidence interval for cycles %-change: -0.69% 0.84% Inconclusive result (value mean confidence interval includes 0). Haswell, Ivy Bridge, and Sandy Bridge had similar results. 
(Haswell shown) total instructions in shared programs: 13529544 -> 13529542 (<.01%) instructions in affected programs: 358 -> 356 (-0.56%) helped: 2 HURT: 0 total cycles in shared programs: 357290311 -> 357289678 (<.01%) cycles in affected programs: 178324 -> 177691 (-0.35%) helped: 48 HURT: 40 helped stats (abs) min: 1 max: 201 x̄: 31.52 x̃: 13 helped stats (rel) min: 0.06% max: 10.92% x̄: 1.71% x̃: 0.66% HURT stats (abs) min: 1 max: 224 x̄: 22.00 x̃: 6 HURT stats (rel) min: 0.05% max: 15.84% x̄: 1.29% x̃: 0.31% 95% mean confidence interval for cycles value: -18.28 3.89 95% mean confidence interval for cycles %-change: -1.01% 0.32% Inconclusive result (value mean confidence interval includes 0). Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8159110 -> 8158980 (<.01%) instructions in affected programs: 22719 -> 22589 (-0.57%) helped: 65 HURT: 0 helped stats (abs) min: 1 max: 3 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.07% max: 1.05% x̄: 0.73% x̃: 0.74% 95% mean confidence interval for instructions value: -2.06 -1.94 95% mean confidence interval for instructions %-change: -0.78% -0.68% Instructions are helped. total cycles in shared programs: 188609448 -> 188609214 (<.01%) cycles in affected programs: 1875852 -> 1875618 (-0.01%) helped: 109 HURT: 104 helped stats (abs) min: 2 max: 46 x̄: 5.30 x̃: 4 helped stats (rel) min: 0.02% max: 0.90% x̄: 0.09% x̃: 0.07% HURT stats (abs) min: 2 max: 20 x̄: 3.31 x̃: 2 HURT stats (rel) min: 0.01% max: 0.26% x̄: 0.04% x̃: 0.02% 95% mean confidence interval for cycles value: -1.95 -0.25 95% mean confidence interval for cycles %-change: -0.04% -0.01% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-14 11:38:22 -07:00
Ian Romanick	d2a9ba03e3	Revert "nir: add late opt to turn inot/b2f combos back to bcsel" This reverts commit `7acc865226`. With these optimizations in place, the extra constant folding added in the next commit extends some live ranges of 0.0 and ±1.0 constants, and that causes several hundred shaders to have more spills and fills. I believe this optimization we made basically irrelevant by `7725d60938` "intel/fs: Emit better code for b2f(inot(a)) and b2i(inot(a))". All Gen7.5+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17225303 -> 17224634 (<.01%) instructions in affected programs: 879402 -> 878733 (-0.08%) helped: 679 HURT: 1 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.03% max: 0.93% x̄: 0.24% x̃: 0.05% HURT stats (abs) min: 10 max: 10 x̄: 10.00 x̃: 10 HURT stats (rel) min: 0.45% max: 0.45% x̄: 0.45% x̃: 0.45% 95% mean confidence interval for instructions value: -1.02 -0.95 95% mean confidence interval for instructions %-change: -0.26% -0.22% Instructions are helped. total cycles in shared programs: 360842595 -> 360828542 (<.01%) cycles in affected programs: 110443594 -> 110429541 (-0.01%) helped: 389 HURT: 265 helped stats (abs) min: 1 max: 7525 x̄: 162.81 x̃: 28 helped stats (rel) min: <.01% max: 18.66% x̄: 1.11% x̃: 0.11% HURT stats (abs) min: 1 max: 7614 x̄: 185.96 x̃: 48 HURT stats (rel) min: <.01% max: 25.08% x̄: 0.95% x̃: 0.10% 95% mean confidence interval for cycles value: -75.65 32.67 95% mean confidence interval for cycles %-change: -0.49% -0.06% Inconclusive result (value mean confidence interval includes 0). 
total spills in shared programs: 12159 -> 12161 (0.02%) spills in affected programs: 13 -> 15 (15.38%) helped: 0 HURT: 1 total fills in shared programs: 25207 -> 25208 (<.01%) fills in affected programs: 25 -> 26 (4.00%) helped: 0 HURT: 1 Ivy Bridge total instructions in shared programs: 12082019 -> 12082013 (<.01%) instructions in affected programs: 1033 -> 1027 (-0.58%) helped: 6 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.41% max: 0.83% x̄: 0.61% x̃: 0.59% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.78% -0.45% Instructions are helped. total cycles in shared programs: 179849270 -> 179849157 (<.01%) cycles in affected programs: 4735 -> 4622 (-2.39%) helped: 4 HURT: 0 helped stats (abs) min: 2 max: 74 x̄: 28.25 x̃: 18 helped stats (rel) min: 0.13% max: 6.53% x̄: 2.85% x̃: 2.36% 95% mean confidence interval for cycles value: -82.73 26.23 95% mean confidence interval for cycles %-change: -7.98% 2.28% Inconclusive result (value mean confidence interval includes 0). Sandy Bridge total instructions in shared programs: 10882750 -> 10882748 (<.01%) instructions in affected programs: 266 -> 264 (-0.75%) helped: 2 HURT: 0 Iron Lake total cycles in shared programs: 188609440 -> 188609448 (<.01%) cycles in affected programs: 4320 -> 4328 (0.19%) helped: 0 HURT: 2 GM45 total cycles in shared programs: 129016868 -> 129016872 (<.01%) cycles in affected programs: 2302 -> 2306 (0.17%) helped: 0 HURT: 1 Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-14 11:38:22 -07:00
Ian Romanick	3cb091f8b4	nir/algebraic: Eliminate a tautological compare The value-range tracking pass that is coming is not clever enough to know that the result of the ffma must be non-negative. Making it that smart will require quite a bit of work. It might be possible to add a special case that detects that a whole tree of fadd(fmul(fsat(a), fneg(fsat(a))), 1.0) cannot be negative. For cases when the comparison is used in the domain guard for a square-root (see nir/algebraic: Simplify fsqrt domain guard), the compare may be converted to a fmax. This patch also handles that case. All of the affected cases are in DiRT: Showdown. All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17225365 -> 17225303 (<.01%) instructions in affected programs: 40051 -> 39989 (-0.15%) helped: 62 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.07% max: 0.66% x̄: 0.27% x̃: 0.26% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.31% -0.22% Instructions are helped. total cycles in shared programs: 360842788 -> 360842595 (<.01%) cycles in affected programs: 1818081 -> 1817888 (-0.01%) helped: 29 HURT: 22 helped stats (abs) min: 1 max: 206 x̄: 20.66 x̃: 14 helped stats (rel) min: <.01% max: 9.55% x̄: 0.87% x̃: 0.42% HURT stats (abs) min: 1 max: 108 x̄: 18.45 x̃: 7 HURT stats (rel) min: <.01% max: 4.48% x̄: 0.56% x̃: 0.19% 95% mean confidence interval for cycles value: -14.48 6.91 95% mean confidence interval for cycles %-change: -0.71% 0.21% Inconclusive result (value mean confidence interval includes 0). No changes on any other Intel platform. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2019-05-14 11:38:22 -07:00
Ian Romanick	9725e45b3d	nir/algebraic: Simplify fsqrt domain guard All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17228376 -> 17225365 (-0.02%) instructions in affected programs: 280732 -> 277721 (-1.07%) helped: 1072 HURT: 0 helped stats (abs) min: 1 max: 12 x̄: 2.81 x̃: 2 helped stats (rel) min: 0.16% max: 5.10% x̄: 1.43% x̃: 1.07% 95% mean confidence interval for instructions value: -2.92 -2.70 95% mean confidence interval for instructions %-change: -1.48% -1.37% Instructions are helped. total cycles in shared programs: 360935690 -> 360842788 (-0.03%) cycles in affected programs: 7838017 -> 7745115 (-1.19%) helped: 1569 HURT: 69 helped stats (abs) min: 1 max: 1198 x̄: 63.53 x̃: 20 helped stats (rel) min: 0.06% max: 26.17% x̄: 3.44% x̃: 2.12% HURT stats (abs) min: 1 max: 2820 x̄: 98.22 x̃: 47 HURT stats (rel) min: 0.05% max: 16.67% x̄: 3.50% x̃: 2.31% 95% mean confidence interval for cycles value: -63.55 -49.89 95% mean confidence interval for cycles %-change: -3.33% -2.96% Cycles are helped. No changes on any other platform. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2019-05-14 11:38:22 -07:00
Ian Romanick	e2ad047779	nir/search: Don't compare 8-bit or 1-bit constants with floats Without this, adding an algebraic rule like (('bcsel', ('flt', a, 0.0), 0.0, ...), ...), will cause assertion failures inside nir_src_comp_as_float in GTF-GL46.gtf21.GL.lessThan.lessThan_vec3_frag (and related tests) from the OpenGL CTS and shaders/closed/steam/witcher-2/511.shader_test from shader-db. All of these cases have some code that ends up like ('bcsel', ('flt', a, 0.0), 'b@1', ...) When the 'b@1' is tested, nir_src_comp_as_float fails because there's no such thing as a 1-bit float. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2019-05-14 11:38:22 -07:00
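
The 'grow' read strategy described in commit 22c1657d05 (util/os_file) can be sketched roughly as follows. This is a minimal illustration, not Mesa's actual `os_read_file()`: the function name `read_file_grow`, the 4096-byte fallback size, and the doubling policy are all assumptions made for the example. The point it shows is that `fstat()` only supplies an initial size hint, while the `read()` loop decides when the file really ends, so a file that grows or shrinks between the two calls is still read correctly.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not Mesa's os_read_file()): reads a whole file into a
 * NUL-terminated heap buffer. Returns NULL on failure; caller frees. */
char *
read_file_grow(const char *path, size_t *out_len)
{
   int fd = open(path, O_RDONLY);
   if (fd < 0)
      return NULL;

   /* fstat() is only a hint for the first allocation (assumed fallback: 4096). */
   size_t cap = 4096;
   struct stat st;
   if (fstat(fd, &st) == 0 && st.st_size > 0)
      cap = (size_t)st.st_size + 1; /* +1 leaves room for the terminator */

   char *buf = malloc(cap);
   if (!buf) {
      close(fd);
      return NULL;
   }

   size_t len = 0;
   for (;;) {
      /* Always keep one byte free for the NUL; double the buffer when full. */
      if (len + 1 >= cap) {
         size_t new_cap = cap * 2;
         char *tmp = realloc(buf, new_cap);
         if (!tmp) {
            free(buf);
            close(fd);
            return NULL;
         }
         buf = tmp;
         cap = new_cap;
      }

      ssize_t n = read(fd, buf + len, cap - len - 1);
      if (n < 0) {
         if (errno == EINTR)
            continue; /* interrupted by a signal, retry */
         free(buf);
         close(fd);
         return NULL;
      }
      if (n == 0)
         break; /* end of file: the only condition that stops the loop */
      len += (size_t)n;
   }

   close(fd);
   buf[len] = '\0';
   if (out_len)
      *out_len = len;
   return buf;
}
```

Because the loop stops only when `read()` returns 0, a stale size from `fstat()` costs at most an extra reallocation rather than truncated data or a failed read.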

111195 Commits (All Branches)