KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Jason Ekstrand	7ceec21b76	intel/fs: Use a strided MOV instead of a conversion for load_* destinations In many cases, the compiler can just copy-prop the strided MOV whereas the conversion is a bit trickier. This cuts 5% of the instructions off of one particular Vulkan CTS test which does lots of load_ssbo. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-17 18:44:35 +00:00
Jason Ekstrand	68a4c796d5	intel/fs: Properly stride NULL replacement regs in DCE This fixes some validation errors generated by certain D->W conversions but is likely not a full solution. Calculating an actual register stride is a far more complex problem in general and should probably be handled by the brw_fs_generator. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-17 18:44:35 +00:00
Jason Ekstrand	110669c85c	st,i965: Stop looping on 64-bit lowering Now that the 64-bit lowering passes do a complete lowering in one go, we don't need to loop anymore. We do, however, have to ensure that int64 lowering happens after double lowering because double lowering can produce int64 ops. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-16 16:05:16 +00:00
Jason Ekstrand	0ba508d7a3	nir,intel: Add support for lowering 64-bit nir_opt_extract_* We need this when doing full software 64-bit emulation. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110309 Fixes: `cbad201c2b` "nir/algebraic: Add missing 64-bit extract_[iu]8..." Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-07-15 16:08:37 -05:00
Jason Ekstrand	974fabe810	intel: Run the optimization loop before and after lowering int64 For bindless SSBO access, we have to do 64-bit address calculations. On ICL and above, we don't have 64-bit integer support so we have to lower the address calculations to 32-bit arithmetic. If we don't run the optimization loop before lowering, we won't fold any of the address chain calculations before lowering 64-bit arithmetic and they aren't really foldable afterwards. This cuts the size of the generated code in the compute shader in dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13 by around 30%. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-13 02:59:28 +00:00
Andres Gomez	f4d2be03b1	intel/compiler: remove abandoned comments c8665005: ("intel/compiler: Don't always require precise lowering of flrp") forgot to remove some comments that didn't apply any more after the change. Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrnd.net>	2019-07-12 16:15:20 +00:00
Ian Romanick	1259f6d802	nir: intel/vec4: Add flag to disable some algebraic optimizations A couple patches later in this series use the flag to avoid a few thousand shader-db regressions on all vec4 platforms. I'm not particularly enamored with the name of this flag. However, I suspect the Intel vec4 backend is the only backend that will benefit from it. Specifically, the cases where this helps are all cases where we want to prevent nir_opt_algebraic from rearranging instructions to create 3-source instructions, such as ffma and flrp, with additional immediate value or uniform sources. The earlier commit "intel/vec4: Try to emit a single load for multiple 3-src instruction operands" solves most of the problems caused by additional immediate values, but the restrictions on register strides that cause problems for uniforms and shader inputs persist. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-11 10:20:03 -07:00
Ian Romanick	3a1fdca5ad	intel/vec4: Try to emit immediate sources for MOV Per the comment in vec4_visitor::nir_emit_load_const, further improvement is possible in this area. That case would be more complicated as I think we'd want to check that all users of the nir_load_const_instr result intended to use the value as float. No shader-db changes on any Gen8+ platform as these platforms do not use the vec4 backend. v2: Massive rebase on `eeebeb211f` ("intel/vec4: Try emitting non-scalar immediates"). This commit is about twice as helpful since `b04beaf41d` ("intel/vec4: Try both sources as candidates for being immediates"). Haswell and Ivy Bridge had similar results. (Haswell shown) total instructions in shared programs: 13478598 -> 13474068 (-0.03%) instructions in affected programs: 589452 -> 584922 (-0.77%) helped: 2773 HURT: 0 helped stats (abs) min: 1 max: 7 x̄: 1.63 x̃: 1 helped stats (rel) min: 0.16% max: 5.66% x̄: 0.96% x̃: 0.83% 95% mean confidence interval for instructions value: -1.67 -1.60 95% mean confidence interval for instructions %-change: -0.98% -0.94% Instructions are helped. total cycles in shared programs: 376386916 -> 376369392 (<.01%) cycles in affected programs: 16871628 -> 16854104 (-0.10%) helped: 2293 HURT: 523 helped stats (abs) min: 2 max: 812 x̄: 13.80 x̃: 2 helped stats (rel) min: <.01% max: 10.18% x̄: 1.02% x̃: 0.36% HURT stats (abs) min: 2 max: 316 x̄: 26.99 x̃: 14 HURT stats (rel) min: <.01% max: 19.34% x̄: 2.15% x̃: 1.43% 95% mean confidence interval for cycles value: -7.87 -4.58 95% mean confidence interval for cycles %-change: -0.52% -0.34% Cycles are helped. 
Sandy Bridge total instructions in shared programs: 10860328 -> 10857675 (-0.02%) instructions in affected programs: 335907 -> 333254 (-0.79%) helped: 1639 HURT: 0 helped stats (abs) min: 1 max: 5 x̄: 1.62 x̃: 1 helped stats (rel) min: 0.10% max: 5.26% x̄: 0.86% x̃: 0.70% 95% mean confidence interval for instructions value: -1.67 -1.57 95% mean confidence interval for instructions %-change: -0.89% -0.84% Instructions are helped. total cycles in shared programs: 153942720 -> 153934120 (<.01%) cycles in affected programs: 5604818 -> 5596218 (-0.15%) helped: 1494 HURT: 97 helped stats (abs) min: 2 max: 256 x̄: 7.84 x̃: 2 helped stats (rel) min: 0.01% max: 6.62% x̄: 0.35% x̃: 0.18% HURT stats (abs) min: 2 max: 160 x̄: 32.02 x̃: 20 HURT stats (rel) min: 0.02% max: 3.37% x̄: 0.88% x̃: 0.56% 95% mean confidence interval for cycles value: -6.45 -4.36 95% mean confidence interval for cycles %-change: -0.32% -0.23% Cycles are helped. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8139378 -> 8137267 (-0.03%) instructions in affected programs: 265616 -> 263505 (-0.79%) helped: 1148 HURT: 0 helped stats (abs) min: 1 max: 5 x̄: 1.84 x̃: 1 helped stats (rel) min: 0.22% max: 4.76% x̄: 0.87% x̃: 0.62% 95% mean confidence interval for instructions value: -1.90 -1.78 95% mean confidence interval for instructions %-change: -0.90% -0.83% Instructions are helped. total cycles in shared programs: 188541756 -> 188537540 (<.01%) cycles in affected programs: 9807004 -> 9802788 (-0.04%) helped: 1143 HURT: 4 helped stats (abs) min: 2 max: 10 x̄: 3.70 x̃: 2 helped stats (rel) min: <.01% max: 3.01% x̄: 0.13% x̃: 0.06% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.18% max: 0.18% x̄: 0.18% x̃: 0.18% 95% mean confidence interval for cycles value: -3.80 -3.55 95% mean confidence interval for cycles %-change: -0.14% -0.12% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-11 10:20:03 -07:00
Ian Romanick	acd7796a07	intel/vec4: Try to emit a VF source in try_immediate_source This commit is also a pre-requisite for the next commit. No shader-db changes on any Gen8+ platform as these platforms do not use the vec4 backend. v2: Massive rebase on `eeebeb211f` ("intel/vec4: Try emitting non-scalar immediates"). This change is a lot less helpful since that commit landed (previously helped 1934 shaders on HSW) because, apparently, a lot of the cases helped by that commit were things like vector loads of { 1.0, 1.0, 1.0 } that were also helped by this commit. Haswell total instructions in shared programs: 13480095 -> 13478598 (-0.01%) instructions in affected programs: 229534 -> 228037 (-0.65%) helped: 1006 HURT: 0 helped stats (abs) min: 1 max: 7 x̄: 1.49 x̃: 1 helped stats (rel) min: 0.04% max: 3.45% x̄: 1.11% x̃: 1.09% 95% mean confidence interval for instructions value: -1.54 -1.43 95% mean confidence interval for instructions %-change: -1.15% -1.07% Instructions are helped. total cycles in shared programs: 376385734 -> 376386916 (<.01%) cycles in affected programs: 14101380 -> 14102562 (<.01%) helped: 941 HURT: 56 helped stats (abs) min: 2 max: 322 x̄: 5.62 x̃: 2 helped stats (rel) min: <.01% max: 7.74% x̄: 0.51% x̃: 0.42% HURT stats (abs) min: 2 max: 618 x̄: 115.50 x̃: 32 HURT stats (rel) min: 0.03% max: 4.62% x̄: 0.83% x̃: 0.44% 95% mean confidence interval for cycles value: -2.06 4.43 95% mean confidence interval for cycles %-change: -0.47% -0.39% Inconclusive result (value mean confidence interval includes 0). Ivy Bridge total instructions in shared programs: 12048004 -> 12046589 (-0.01%) instructions in affected programs: 217072 -> 215657 (-0.65%) helped: 934 HURT: 0 helped stats (abs) min: 1 max: 7 x̄: 1.51 x̃: 1 helped stats (rel) min: 0.04% max: 3.45% x̄: 1.14% x̃: 1.11% 95% mean confidence interval for instructions value: -1.57 -1.46 95% mean confidence interval for instructions %-change: -1.18% -1.10% Instructions are helped. 
total cycles in shared programs: 180285854 -> 180287608 (<.01%) cycles in affected programs: 14103824 -> 14105578 (0.01%) helped: 871 HURT: 53 helped stats (abs) min: 2 max: 322 x̄: 5.51 x̃: 2 helped stats (rel) min: <.01% max: 7.67% x̄: 0.50% x̃: 0.42% HURT stats (abs) min: 2 max: 618 x̄: 123.66 x̃: 32 HURT stats (rel) min: 0.03% max: 4.47% x̄: 0.92% x̃: 0.46% 95% mean confidence interval for cycles value: -1.60 5.39 95% mean confidence interval for cycles %-change: -0.46% -0.37% Inconclusive result (value mean confidence interval includes 0). Sandy Bridge total instructions in shared programs: 10861227 -> 10860328 (<.01%) instructions in affected programs: 92969 -> 92070 (-0.97%) helped: 624 HURT: 0 helped stats (abs) min: 1 max: 7 x̄: 1.44 x̃: 1 helped stats (rel) min: 0.11% max: 3.45% x̄: 1.05% x̃: 0.95% 95% mean confidence interval for instructions value: -1.52 -1.36 95% mean confidence interval for instructions %-change: -1.09% -1.01% Instructions are helped. total cycles in shared programs: 153944316 -> 153942720 (<.01%) cycles in affected programs: 1640956 -> 1639360 (-0.10%) helped: 601 HURT: 15 helped stats (abs) min: 2 max: 120 x̄: 3.56 x̃: 2 helped stats (rel) min: 0.02% max: 6.33% x̄: 0.18% x̃: 0.08% HURT stats (abs) min: 2 max: 72 x̄: 36.13 x̃: 36 HURT stats (rel) min: 0.05% max: 3.84% x̄: 1.95% x̃: 2.00% 95% mean confidence interval for cycles value: -3.44 -1.74 95% mean confidence interval for cycles %-change: -0.18% -0.09% Cycles are helped. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8139924 -> 8139378 (<.01%) instructions in affected programs: 69776 -> 69230 (-0.78%) helped: 322 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 1.70 x̃: 1 helped stats (rel) min: 0.27% max: 3.23% x̄: 0.79% x̃: 0.54% 95% mean confidence interval for instructions value: -1.88 -1.51 95% mean confidence interval for instructions %-change: -0.85% -0.72% Instructions are helped. 
total cycles in shared programs: 188542864 -> 188541756 (<.01%) cycles in affected programs: 3031532 -> 3030424 (-0.04%) helped: 320 HURT: 0 helped stats (abs) min: 2 max: 20 x̄: 3.46 x̃: 2 helped stats (rel) min: <.01% max: 0.69% x̄: 0.06% x̃: 0.06% 95% mean confidence interval for cycles value: -3.85 -3.07 95% mean confidence interval for cycles %-change: -0.06% -0.05% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-11 10:20:03 -07:00
Ian Romanick	365b45d571	intel/vec4: Try to emit a single load for multiple 3-src instruction operands If a 3-source instruction uses immediate values 1.0 and -1.0, just load 1.0 into a register. Use the negation source modifier to get -1.0. This has trivial impact now, but it prevents a few thousand regressions on vec4 platforms with "nir/algebraic: Recognize open-coded flrp(-1, 1, a) and flrp(1, -1, a)" All Gen6 and Gen7 platforms had similar results. (Haswell shown) total instructions in shared programs: 13487412 -> 13487406 (<.01%) instructions in affected programs: 541 -> 535 (-1.11%) helped: 6 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.36% max: 2.08% x̄: 1.65% x̃: 1.80% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -2.33% -0.97% Instructions are helped. total cycles in shared programs: 376402564 -> 376402500 (<.01%) cycles in affected programs: 10348 -> 10284 (-0.62%) helped: 10 HURT: 1 helped stats (abs) min: 2 max: 26 x̄: 7.00 x̃: 2 helped stats (rel) min: 0.13% max: 2.05% x̄: 0.89% x̃: 0.79% HURT stats (abs) min: 6 max: 6 x̄: 6.00 x̃: 6 HURT stats (rel) min: 0.29% max: 0.29% x̄: 0.29% x̃: 0.29% 95% mean confidence interval for cycles value: -11.72 0.08 95% mean confidence interval for cycles %-change: -1.20% -0.36% Inconclusive result (value mean confidence interval includes 0). No shader-db changes on any other Intel platform. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-11 10:20:03 -07:00
Ian Romanick	6f6bc842f6	intel/vec4: Refactor operand fixing for ffma and flrp Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-11 10:20:03 -07:00
Caio Marcelo de Oliveira Filho	b390ff3517	intel/fs: Add support for SLM fence in Gen11 Gen11 SLM is not on L3 anymore, so now the hardware has two separate fences. Add a way to control which fence types to use. At this time, we don't have enough information in NIR to control the visibility of the memory being fenced, so for now be conservative and assume that fences will need a stall. With more information later we'll be able to reduce those. Fixes Vulkan CTS tests in ICL: dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_nonlocal.workgroup.guard_local.buffer.comp dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_local.buffer.guard_nonlocal.workgroup.comp dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_local.image.guard_nonlocal.workgroup.comp dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.workgroup.payload_local.buffer.guard_nonlocal.workgroup.comp dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.workgroup.payload_local.image.guard_nonlocal.workgroup.comp The whole set of supported tests in dEQP-VK.memory_model.* group should be passing in ICL now. v2: Pass BTI around instead of having an enum. (Jason) Emit two SHADER_OPCODE_MEMORY_FENCE instead of one that gets transformed into two. (Jason) List tests fixed. (Lionel) v3: For clarity, split the decision of which fences to emit from the emission code. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-07-11 08:29:32 -07:00
Jason Ekstrand	14781e2122	intel/compiler: Add a "base class" for program keys Right now, all keys have two things in common: a program string ID and a sampler_prog_key_data. I'd like to add another thing or two and need a place to put it. This commit adds a new brw_base_prog_key struct which contains those two common bits. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-10 19:35:55 +00:00
Ian Romanick	dd2dc7e707	intel/vec4: Delete vec4_visitor::emit_lrp Effectively unused since `dd7135d55d` ("intel/compiler: Use the flrp lowering pass for all stages on Gen4 and Gen5"). I had intended to remove this code as part of that series, but I forgot. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-08 11:30:11 -07:00
Ian Romanick	47c2aa5b48	intel/vec4: Reswizzle VF immediates too Previously, an instruction like mul(8) vgrf29.xy:F, vgrf25.yxxx:F, [-1F, 1F, 0F, 0F] would get rewritten as mul(8) vgrf0.yz:F, vgrf25.yyxx:F, [-1F, 1F, 0F, 0F] The latter does not produce the correct result. The VF immediate in the second should be either [-1F, -1F, 1F, 1F] or [0F, -1F, 1F, 0F]. This commit produces the former. Fixes: `1ee1d8ab46` ("i965/vec4: Reswizzle sources when necessary.") Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-08 11:30:10 -07:00
Caio Marcelo de Oliveira Filho	45f5db5a84	intel/fs: Implement "demote to helper invocation" The "demote" intrinsic works like "discard" but doesn't change the control flow, allowing derivative operations to work. This is the semantics of D3D discard. The "is_helper_invocation" intrinsic will return true for helper invocations -- both the ones that started as helpers and the ones that were demoted. This is needed to avoid changing the behavior of gl_HelperInvocation which is an input (so not expected to change during shader execution). v2: Emit the discard jump and comment why it is safe. (Jason) Rework the is_helper_invocation() that was stomping f0.1. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-08 08:57:25 -07:00
Connor Abbott	6b28808b22	intel/nir: Extract add_const_offset_to_base Pretty much every driver using nir_lower_io_to_temporaries followed by nir_lower_io is going to want this. In particular, radv and radeonsi in the next commits. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-08 14:14:53 +02:00
Jason Ekstrand	fa869f45c8	intel/fs: Use nir_lower_interpolation on gen11+ On gen11, they removed the PLN instruction so we have to emit a pile of MAD to emulate it. We may as well do that in NIR so we can optimize and later schedule it. Shader-db results on Ice Lake: total instructions in shared programs: 17145644 -> 16556440 (-3.44%) instructions in affected programs: 11507454 -> 10918250 (-5.12%) helped: 35763 HURT: 42085 helped stats (abs) min: 1 max: 140 x̄: 19.09 x̃: 18 helped stats (rel) min: 0.04% max: 37.93% x̄: 15.40% x̃: 14.49% HURT stats (abs) min: 1 max: 248 x̄: 2.22 x̃: 2 HURT stats (rel) min: 0.05% max: 50.00% x̄: 5.00% x̃: 2.47% 95% mean confidence interval for instructions value: -7.67 -7.47 95% mean confidence interval for instructions %-change: -4.46% -4.29% Instructions are helped. total loops in shared programs: 4370 -> 4370 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total cycles in shared programs: 360624645 -> 368220857 (2.11%) cycles in affected programs: 269631244 -> 277227456 (2.82%) helped: 15583 HURT: 65874 helped stats (abs) min: 1 max: 28561 x̄: 78.45 x̃: 32 helped stats (rel) min: <.01% max: 67.81% x̄: 5.38% x̃: 2.44% HURT stats (abs) min: 1 max: 238638 x̄: 133.87 x̃: 20 HURT stats (rel) min: <.01% max: 306.25% x̄: 5.81% x̃: 3.97% 95% mean confidence interval for cycles value: 67.42 119.09 95% mean confidence interval for cycles %-change: 3.61% 3.73% Cycles are HURT. total spills in shared programs: 8943 -> 8981 (0.42%) spills in affected programs: 1925 -> 1963 (1.97%) helped: 44 HURT: 14 total fills in shared programs: 21815 -> 21925 (0.50%) fills in affected programs: 3511 -> 3621 (3.13%) helped: 41 HURT: 18 LOST: 70 GAINED: 14 Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-02 16:15:25 +00:00
Jason Ekstrand	2b79a9e5a5	intel/fs: Implement nir_intrinsic_load_fs_input_interp_deltas Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-02 16:15:25 +00:00
Jason Ekstrand	8e7d066682	intel/fs: Actually implement the load_barycentric intrinsics If they never get used, dead code should clean them up. Also, we rework the at_offset and at_sample intrinsics so they return a proper vec2 instead of returning things in PLN layout. Fortunately, copy-prop is pretty good at cleaning this up and it doesn't result in any actual extra MOVs. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-02 16:15:25 +00:00
Sagar Ghuge	1e92e83856	intel/compiler: Emit ROR and ROL instruction v2: Reorder patch (Matt Turner) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-01 10:14:22 -07:00
Sagar Ghuge	83fdec0f0d	intel/compiler: Enable the emission of ROR/ROL instructions v2: 1) Drop changes for vec4 backend as on Gen11+ we don't support align16 mode (Matt Turner) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-01 10:14:22 -07:00
Lionel Landwerlin	5847de6e9a	intel/compiler: don't use byte operands for src1 on ICL The simulator complains about using byte operands, and we also have documentation telling us so. Note that add operations on bytes seem to work fine on HW (like ADD). Using dword operands with CMP & SEL fixes the following tests: dEQP-VK.spirv_assembly.type.vec.i8. v2: Drop the GLK changes (Matt) Add validator tests (Matt) v3: Drop GLK ref (Matt) Don't mix float/integer in MAD (Matt) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com> (v1) Reviewed-by: Matt Turner <mattst88@gmail.com> BSpec: 3017 Cc: <mesa-stable@lists.freedesktop.org>	2019-06-29 12:56:09 +00:00
Ian Romanick	b04beaf41d	intel/vec4: Try both sources as candidates for being immediates For some reason, when I first wrote try_immediate_source, I thought the sources had already been ordered so that the immediate value was the second source. That's rubbish. The generator assumes neither source is immediate, and it relies on later copy/constant propagation passes to do the reordering. For this reason, the changes to try_immediate_source have to go to some efforts to reorder the operands and tell the caller when it reordered them. The generator for comparison instructions uses this to determine when the comparison needs to change (e.g., from GT to LT). No changes on any Gen8 or later platform because those platforms do not use the vec4 backend. Haswell total instructions in shared programs: 13484431 -> 13480500 (-0.03%) instructions in affected programs: 441138 -> 437207 (-0.89%) helped: 1883 HURT: 0 helped stats (abs) min: 1 max: 49 x̄: 2.09 x̃: 1 helped stats (rel) min: 0.07% max: 8.91% x̄: 1.10% x̃: 0.90% 95% mean confidence interval for instructions value: -2.19 -1.98 95% mean confidence interval for instructions %-change: -1.14% -1.06% Instructions are helped. total cycles in shared programs: 376420286 -> 376406400 (<.01%) cycles in affected programs: 15995668 -> 15981782 (-0.09%) helped: 1692 HURT: 219 helped stats (abs) min: 2 max: 764 x̄: 13.78 x̃: 4 helped stats (rel) min: <.01% max: 9.69% x̄: 0.69% x̃: 0.35% HURT stats (abs) min: 2 max: 516 x̄: 43.09 x̃: 22 HURT stats (rel) min: 0.02% max: 12.09% x̄: 2.30% x̃: 1.13% 95% mean confidence interval for cycles value: -9.70 -4.83 95% mean confidence interval for cycles %-change: -0.42% -0.28% Cycles are helped. 
total spills in shared programs: 23166 -> 23158 (-0.03%) spills in affected programs: 66 -> 58 (-12.12%) helped: 2 HURT: 0 total fills in shared programs: 34592 -> 34580 (-0.03%) fills in affected programs: 75 -> 63 (-16.00%) helped: 2 HURT: 0 Ivy Bridge total instructions in shared programs: 12051590 -> 12048513 (-0.03%) instructions in affected programs: 355911 -> 352834 (-0.86%) helped: 1481 HURT: 0 helped stats (abs) min: 1 max: 12 x̄: 2.08 x̃: 1 helped stats (rel) min: 0.07% max: 4.92% x̄: 1.08% x̃: 0.90% 95% mean confidence interval for instructions value: -2.17 -1.98 95% mean confidence interval for instructions %-change: -1.12% -1.04% Instructions are helped. total cycles in shared programs: 180319624 -> 180307642 (<.01%) cycles in affected programs: 15591028 -> 15579046 (-0.08%) helped: 1340 HURT: 174 helped stats (abs) min: 2 max: 764 x̄: 14.19 x̃: 2 helped stats (rel) min: <.01% max: 8.68% x̄: 0.64% x̃: 0.32% HURT stats (abs) min: 2 max: 518 x̄: 40.41 x̃: 14 HURT stats (rel) min: 0.02% max: 8.37% x̄: 1.59% x̃: 0.67% 95% mean confidence interval for cycles value: -10.85 -4.97 95% mean confidence interval for cycles %-change: -0.45% -0.31% Cycles are helped. All Gen6 and earlier platforms had similar results. (Sandy Bridge shown) total instructions in shared programs: 10863159 -> 10861462 (-0.02%) instructions in affected programs: 157839 -> 156142 (-1.08%) helped: 715 HURT: 0 helped stats (abs) min: 1 max: 12 x̄: 2.37 x̃: 2 helped stats (rel) min: 0.23% max: 4.33% x̄: 1.07% x̃: 0.85% 95% mean confidence interval for instructions value: -2.53 -2.21 95% mean confidence interval for instructions %-change: -1.13% -1.02% Instructions are helped. 
total cycles in shared programs: 153957782 -> 153948778 (<.01%) cycles in affected programs: 3171648 -> 3162644 (-0.28%) helped: 696 HURT: 62 helped stats (abs) min: 2 max: 390 x̄: 15.72 x̃: 4 helped stats (rel) min: 0.02% max: 10.57% x̄: 0.57% x̃: 0.12% HURT stats (abs) min: 2 max: 300 x̄: 31.29 x̃: 2 HURT stats (rel) min: 0.11% max: 7.23% x̄: 0.83% x̃: 0.34% 95% mean confidence interval for cycles value: -15.65 -8.11 95% mean confidence interval for cycles %-change: -0.56% -0.36% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-06-28 18:13:18 -07:00
Ian Romanick	379cf3bb87	intel/vec4: Try immediate sources for dot products too No changes on any Gen8 or later platform because those platforms do not use the vec4 backend. All Haswell and earlier platforms had similar results. (Haswell shown) total instructions in shared programs: 13484467 -> 13484431 (<.01%) instructions in affected programs: 8540 -> 8504 (-0.42%) helped: 33 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.09 x̃: 1 helped stats (rel) min: 0.31% max: 1.53% x̄: 0.49% x̃: 0.35% 95% mean confidence interval for instructions value: -1.19 -0.99 95% mean confidence interval for instructions %-change: -0.60% -0.38% Instructions are helped. total cycles in shared programs: 376420572 -> 376420286 (<.01%) cycles in affected programs: 56260 -> 55974 (-0.51%) helped: 26 HURT: 5 helped stats (abs) min: 2 max: 204 x̄: 11.85 x̃: 2 helped stats (rel) min: 0.11% max: 3.08% x̄: 0.39% x̃: 0.13% HURT stats (abs) min: 2 max: 6 x̄: 4.40 x̃: 6 HURT stats (rel) min: 0.03% max: 0.35% x̄: 0.24% x̃: 0.35% 95% mean confidence interval for cycles value: -22.91 4.45 95% mean confidence interval for cycles %-change: -0.56% -0.02% Inconclusive result (value mean confidence interval includes 0). Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-06-28 17:16:16 -07:00
Ian Romanick	eeebeb211f	intel/vec4: Try emitting non-scalar immediates Sometimes an instruction has a vector as a source, but all of the components have the same value. For example, vec3 32 ssa_16 = load_const (1.0, 1.0, 1.0) ... vec3 32 ssa_82 = fadd ssa_16, -ssa_81.xyz No changes on any Gen8 or later platform because those platforms do not use the vec4 backend. Haswell total instructions in shared programs: 13487811 -> 13484467 (-0.02%) instructions in affected programs: 421981 -> 418637 (-0.79%) helped: 1859 HURT: 0 helped stats (abs) min: 1 max: 15 x̄: 1.80 x̃: 1 helped stats (rel) min: 0.04% max: 9.80% x̄: 1.04% x̃: 0.84% 95% mean confidence interval for instructions value: -1.85 -1.74 95% mean confidence interval for instructions %-change: -1.07% -1.00% Instructions are helped. total cycles in shared programs: 376423252 -> 376420572 (<.01%) cycles in affected programs: 14800970 -> 14798290 (-0.02%) helped: 1519 HURT: 329 helped stats (abs) min: 2 max: 462 x̄: 10.59 x̃: 4 helped stats (rel) min: 0.03% max: 16.73% x̄: 0.79% x̃: 0.36% HURT stats (abs) min: 2 max: 598 x̄: 40.74 x̃: 16 HURT stats (rel) min: <.01% max: 10.32% x̄: 2.56% x̃: 0.98% 95% mean confidence interval for cycles value: -3.53 0.63 95% mean confidence interval for cycles %-change: -0.30% -0.09% Inconclusive result (value mean confidence interval includes 0). total fills in shared programs: 34601 -> 34592 (-0.03%) fills in affected programs: 91 -> 82 (-9.89%) helped: 9 HURT: 0 Ivy Bridge total instructions in shared programs: 12053565 -> 12051626 (-0.02%) instructions in affected programs: 298103 -> 296164 (-0.65%) helped: 1228 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 1.58 x̃: 1 helped stats (rel) min: 0.04% max: 3.57% x̄: 0.91% x̃: 0.81% 95% mean confidence interval for instructions value: -1.63 -1.53 95% mean confidence interval for instructions %-change: -0.95% -0.88% Instructions are helped. 
total cycles in shared programs: 180322270 -> 180319922 (<.01%) cycles in affected programs: 14123840 -> 14121492 (-0.02%) helped: 1036 HURT: 195 helped stats (abs) min: 2 max: 462 x̄: 11.93 x̃: 2 helped stats (rel) min: 0.03% max: 14.05% x̄: 0.82% x̃: 0.35% HURT stats (abs) min: 2 max: 598 x̄: 51.33 x̃: 16 HURT stats (rel) min: <.01% max: 9.68% x̄: 3.02% x̃: 0.72% 95% mean confidence interval for cycles value: -4.92 1.10 95% mean confidence interval for cycles %-change: -0.35% -0.07% Inconclusive result (value mean confidence interval includes 0). Sandy Bridge total instructions in shared programs: 10864286 -> 10863189 (-0.01%) instructions in affected programs: 159722 -> 158625 (-0.69%) helped: 724 HURT: 0 helped stats (abs) min: 1 max: 4 x̄: 1.52 x̃: 1 helped stats (rel) min: 0.10% max: 2.91% x̄: 0.79% x̃: 0.62% 95% mean confidence interval for instructions value: -1.58 -1.46 95% mean confidence interval for instructions %-change: -0.82% -0.75% Instructions are helped. total cycles in shared programs: 153967938 -> 153957926 (<.01%) cycles in affected programs: 1923186 -> 1913174 (-0.52%) helped: 654 HURT: 56 helped stats (abs) min: 2 max: 170 x̄: 20.00 x̃: 4 helped stats (rel) min: 0.03% max: 11.82% x̄: 0.89% x̃: 0.18% HURT stats (abs) min: 2 max: 390 x̄: 54.75 x̃: 32 HURT stats (rel) min: 0.05% max: 6.92% x̄: 3.09% x̃: 2.92% 95% mean confidence interval for cycles value: -17.42 -10.78 95% mean confidence interval for cycles %-change: -0.76% -0.40% Cycles are helped. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8142677 -> 8141721 (-0.01%) instructions in affected programs: 139511 -> 138555 (-0.69%) helped: 588 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 1.63 x̃: 1 helped stats (rel) min: 0.21% max: 4.39% x̄: 0.84% x̃: 0.46% 95% mean confidence interval for instructions value: -1.70 -1.55 95% mean confidence interval for instructions %-change: -0.89% -0.78% Instructions are helped. 
total cycles in shared programs: 188549394 -> 188547676 (<.01%) cycles in affected programs: 3171960 -> 3170242 (-0.05%) helped: 527 HURT: 0 helped stats (abs) min: 2 max: 18 x̄: 3.26 x̃: 2 helped stats (rel) min: <.01% max: 0.80% x̄: 0.08% x̃: 0.06% 95% mean confidence interval for cycles value: -3.49 -3.03 95% mean confidence interval for cycles %-change: -0.09% -0.07% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-06-28 17:16:06 -07:00
Lionel Landwerlin	836225840c	intel/compiler: fix derivative on y axis implementation This rewrites the ddy in EXECUTE_4 mode with a loop to make it more obvious what is going on and also sets the group each of the 4 threads in the groups are supposed to execute. Fixes the following CTS tests : dEQP-VK.glsl.derivate.dfdyfine.dynamic_* Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Co-Authored-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com> Fixes: `2134ea3800` ("intel/compiler/fs: Implement ddy without using align16 for Gen11+")	2019-06-27 18:14:58 +00:00
Tapani Pälli	7a6e5a4bc3	intel/compiler: silence a warning of using different enum type Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-25 10:09:22 +03:00
Nicolai Hähnle	de8a919702	u_dynarray: turn util_dynarray_{grow, resize} into element-oriented macros The main motivation for this change is API ergonomics: most operations on dynarrays are really on elements, not on bytes, so it's weird to have grow and resize as the odd operations out. The secondary motivation is memory safety. Users of the old byte-oriented functions would often multiply a number of elements with the element size, which could overflow, and checking for overflow is tedious. With this change, we only need to implement the overflow checks once. The checks are cheap: since eltsize is a compile-time constant and the functions should be inlined, they only add a single comparison and an unlikely branch. v2: - ensure operations are no-op when allocation fails - in util_dynarray_clone, call resize_bytes with a compile-time constant element size v3: - fix iris, lima, panfrost Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-12 18:30:25 -04:00
Ian Romanick	39f4dc23a5	intel/fs: Mark source 0 of bcsel as needing Boolean resolve The other sources of the bcsel behave like the sources of an and or other logical operation. However, source zero behaves differently. It is evaluated as a Boolean, so it needs to be resolved. No shader-db changes, but the tests mentioned in the bug get a couple instructions added back. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110857 Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-06-11 12:12:07 -07:00
Ian Romanick	1c30d26d89	intel/compiler: Treat b32csel as potentially producing a Boolean result for resolve analysis If the 2nd and 3rd source are both Boolean values, we can potentially avoid a resolve by only resolving the result of the b32csel. No changes on any Gen6+ Intel platform. v2: Use ?: instead of cast from bool to unsigned. Suggested by Caio. Iron Lake total instructions in shared programs: 8142729 -> 8142677 (<.01%) instructions in affected programs: 12890 -> 12838 (-0.40%) helped: 26 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.25% max: 0.74% x̄: 0.45% x̃: 0.38% 95% mean confidence interval for instructions value: -2.00 -2.00 95% mean confidence interval for instructions %-change: -0.52% -0.39% Instructions are helped. total cycles in shared programs: 188549632 -> 188549394 (<.01%) cycles in affected programs: 60754 -> 60516 (-0.39%) helped: 25 HURT: 1 helped stats (abs) min: 2 max: 26 x̄: 9.92 x̃: 8 helped stats (rel) min: 0.07% max: 2.23% x̄: 0.59% x̃: 0.27% HURT stats (abs) min: 10 max: 10 x̄: 10.00 x̃: 10 HURT stats (rel) min: 0.70% max: 0.70% x̄: 0.70% x̃: 0.70% 95% mean confidence interval for cycles value: -12.91 -5.40 95% mean confidence interval for cycles %-change: -0.84% -0.23% Cycles are helped. GM45 total instructions in shared programs: 5013119 -> 5013093 (<.01%) instructions in affected programs: 6764 -> 6738 (-0.38%) helped: 13 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.24% max: 0.68% x̄: 0.43% x̃: 0.36% 95% mean confidence interval for instructions value: -2.00 -2.00 95% mean confidence interval for instructions %-change: -0.52% -0.34% Instructions are helped. 
total cycles in shared programs: 128977804 -> 128977700 (<.01%) cycles in affected programs: 37738 -> 37634 (-0.28%) helped: 13 HURT: 0 helped stats (abs) min: 8 max: 8 x̄: 8.00 x̃: 8 helped stats (rel) min: 0.18% max: 0.46% x̄: 0.30% x̃: 0.26% 95% mean confidence interval for cycles value: -8.00 -8.00 95% mean confidence interval for cycles %-change: -0.36% -0.24% Cycles are helped. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-06-05 17:04:17 -07:00
Ian Romanick	0ba9497e66	intel/fs: Improve discard_if code generation Previously we would blindly emit a sequence like: mov(1) f0.1<1>UW g1.14<0,1,0>UW ... cmp.l.f0(16) g7<1>F g5<8,8,1>F 0x41700000F /* 15F */ (+f0.1) cmp.z.f0.1(16) null<1>D g7<8,8,1>D 0D The first move sets the flags based on the initial execution mask. Later discard sequences contain a predicated compare that can only remove more SIMD channels. Often times the only user of the result from the first compare is the second compare. Instead, generate a sequence like mov(1) f0.1<1>UW g1.14<0,1,0>UW ... cmp.l.f0(16) g7<1>F g5<8,8,1>F 0x41700000F /* 15F */ (+f0.1) cmp.ge.f0.1(8) null<1>F g5<8,8,1>F 0x41700000F /* 15F */ If the results stored in g7 and f0.0 are not used, the comparison will be eliminated. This removes an instruction and potentially reduces register pressure. v2: Major re-write of the commit message (including fixing the assembly code). Suggested by Matt. All Gen8+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17224434 -> 17198659 (-0.15%) instructions in affected programs: 2908125 -> 2882350 (-0.89%) helped: 18891 HURT: 5 helped stats (abs) min: 1 max: 12 x̄: 1.38 x̃: 1 helped stats (rel) min: 0.03% max: 25.00% x̄: 1.76% x̃: 1.02% HURT stats (abs) min: 9 max: 105 x̄: 51.40 x̃: 35 HURT stats (rel) min: 0.43% max: 4.92% x̄: 2.34% x̃: 1.56% 95% mean confidence interval for instructions value: -1.39 -1.34 95% mean confidence interval for instructions %-change: -1.79% -1.73% Instructions are helped. 
total cycles in shared programs: 361468458 -> 361170679 (-0.08%) cycles in affected programs: 38470116 -> 38172337 (-0.77%) helped: 16202 HURT: 1456 helped stats (abs) min: 1 max: 4473 x̄: 26.24 x̃: 18 helped stats (rel) min: <.01% max: 28.44% x̄: 2.90% x̃: 2.18% HURT stats (abs) min: 1 max: 5982 x̄: 87.51 x̃: 28 HURT stats (rel) min: <.01% max: 51.29% x̄: 5.48% x̃: 1.64% 95% mean confidence interval for cycles value: -18.24 -15.49 95% mean confidence interval for cycles %-change: -2.26% -2.14% Cycles are helped. total spills in shared programs: 12147 -> 12176 (0.24%) spills in affected programs: 175 -> 204 (16.57%) helped: 8 HURT: 5 total fills in shared programs: 25262 -> 25292 (0.12%) fills in affected programs: 269 -> 299 (11.15%) helped: 8 HURT: 5 Haswell total instructions in shared programs: 13530316 -> 13502647 (-0.20%) instructions in affected programs: 2507824 -> 2480155 (-1.10%) helped: 18859 HURT: 10 helped stats (abs) min: 1 max: 12 x̄: 1.48 x̃: 1 helped stats (rel) min: 0.03% max: 27.78% x̄: 2.38% x̃: 1.41% HURT stats (abs) min: 5 max: 39 x̄: 25.70 x̃: 31 HURT stats (rel) min: 0.22% max: 1.66% x̄: 1.09% x̃: 1.31% 95% mean confidence interval for instructions value: -1.49 -1.44 95% mean confidence interval for instructions %-change: -2.42% -2.34% Instructions are helped. total cycles in shared programs: 377865412 -> 377639034 (-0.06%) cycles in affected programs: 40169572 -> 39943194 (-0.56%) helped: 15550 HURT: 1938 helped stats (abs) min: 1 max: 2482 x̄: 25.67 x̃: 18 helped stats (rel) min: <.01% max: 37.77% x̄: 3.00% x̃: 2.25% HURT stats (abs) min: 1 max: 4862 x̄: 89.17 x̃: 35 HURT stats (rel) min: <.01% max: 67.67% x̄: 6.16% x̃: 2.75% 95% mean confidence interval for cycles value: -14.42 -11.47 95% mean confidence interval for cycles %-change: -2.05% -1.91% Cycles are helped. 
total spills in shared programs: 26769 -> 26814 (0.17%) spills in affected programs: 826 -> 871 (5.45%) helped: 9 HURT: 10 total fills in shared programs: 38383 -> 38425 (0.11%) fills in affected programs: 834 -> 876 (5.04%) helped: 9 HURT: 10 LOST: 5 GAINED: 10 Ivy Bridge total instructions in shared programs: 12079250 -> 12044139 (-0.29%) instructions in affected programs: 2409680 -> 2374569 (-1.46%) helped: 16135 HURT: 0 helped stats (abs) min: 1 max: 23 x̄: 2.18 x̃: 2 helped stats (rel) min: 0.07% max: 37.50% x̄: 2.72% x̃: 1.68% 95% mean confidence interval for instructions value: -2.21 -2.14 95% mean confidence interval for instructions %-change: -2.76% -2.67% Instructions are helped. total cycles in shared programs: 180116747 -> 179900405 (-0.12%) cycles in affected programs: 25439823 -> 25223481 (-0.85%) helped: 13817 HURT: 1499 helped stats (abs) min: 1 max: 1886 x̄: 26.40 x̃: 18 helped stats (rel) min: <.01% max: 38.84% x̄: 2.57% x̃: 1.97% HURT stats (abs) min: 1 max: 3684 x̄: 98.99 x̃: 52 HURT stats (rel) min: <.01% max: 97.01% x̄: 6.37% x̃: 3.42% 95% mean confidence interval for cycles value: -15.68 -12.57 95% mean confidence interval for cycles %-change: -1.77% -1.63% Cycles are helped. LOST: 8 GAINED: 10 Sandy Bridge total instructions in shared programs: 10878990 -> 10863659 (-0.14%) instructions in affected programs: 1806702 -> 1791371 (-0.85%) helped: 13023 HURT: 0 helped stats (abs) min: 1 max: 5 x̄: 1.18 x̃: 1 helped stats (rel) min: 0.07% max: 13.79% x̄: 1.65% x̃: 1.10% 95% mean confidence interval for instructions value: -1.18 -1.17 95% mean confidence interval for instructions %-change: -1.68% -1.62% Instructions are helped. 
total cycles in shared programs: 154082878 -> 153862810 (-0.14%) cycles in affected programs: 20199374 -> 19979306 (-1.09%) helped: 12048 HURT: 510 helped stats (abs) min: 1 max: 323 x̄: 20.57 x̃: 18 helped stats (rel) min: 0.03% max: 17.78% x̄: 2.05% x̃: 1.52% HURT stats (abs) min: 1 max: 448 x̄: 54.39 x̃: 16 HURT stats (rel) min: 0.02% max: 37.98% x̄: 4.13% x̃: 1.17% 95% mean confidence interval for cycles value: -17.97 -17.08 95% mean confidence interval for cycles %-change: -1.84% -1.75% Cycles are helped. LOST: 1 GAINED: 0 Iron Lake total instructions in shared programs: 8155075 -> 8142729 (-0.15%) instructions in affected programs: 949495 -> 937149 (-1.30%) helped: 5810 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 2.12 x̃: 2 helped stats (rel) min: 0.10% max: 16.67% x̄: 2.53% x̃: 1.85% 95% mean confidence interval for instructions value: -2.14 -2.11 95% mean confidence interval for instructions %-change: -2.59% -2.48% Instructions are helped. total cycles in shared programs: 188584610 -> 188549632 (-0.02%) cycles in affected programs: 17274446 -> 17239468 (-0.20%) helped: 3881 HURT: 90 helped stats (abs) min: 2 max: 168 x̄: 9.08 x̃: 6 helped stats (rel) min: <.01% max: 23.53% x̄: 0.83% x̃: 0.30% HURT stats (abs) min: 2 max: 10 x̄: 2.80 x̃: 2 HURT stats (rel) min: <.01% max: 0.60% x̄: 0.10% x̃: 0.07% 95% mean confidence interval for cycles value: -9.35 -8.27 95% mean confidence interval for cycles %-change: -0.85% -0.77% Cycles are helped. GM45 total instructions in shared programs: 5019308 -> 5013119 (-0.12%) instructions in affected programs: 489028 -> 482839 (-1.27%) helped: 2912 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 2.13 x̃: 2 helped stats (rel) min: 0.10% max: 16.67% x̄: 2.46% x̃: 1.81% 95% mean confidence interval for instructions value: -2.14 -2.11 95% mean confidence interval for instructions %-change: -2.54% -2.39% Instructions are helped. 
total cycles in shared programs: 129002592 -> 128977804 (-0.02%) cycles in affected programs: 12669152 -> 12644364 (-0.20%) helped: 2759 HURT: 37 helped stats (abs) min: 2 max: 168 x̄: 9.03 x̃: 4 helped stats (rel) min: <.01% max: 21.43% x̄: 0.75% x̃: 0.31% HURT stats (abs) min: 2 max: 10 x̄: 3.62 x̃: 4 HURT stats (rel) min: <.01% max: 0.41% x̄: 0.10% x̃: 0.04% 95% mean confidence interval for cycles value: -9.53 -8.20 95% mean confidence interval for cycles %-change: -0.79% -0.70% Cycles are helped. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-06-05 17:04:13 -07:00
Ian Romanick	a288708506	intel/fs: Add need_dest parameter to fs_visitor::nir_emit_alu This is the same as the need_dest parameter to prepare_alu_destination_and_sources. This allows us to not change the register that is expected to hold a result if an instruction is re-emitted. This is particularly a problem if the re-emitted instruction is a partial write. A later patch will use this feature. No shader-db changes on any Intel platform. v2: Don't do the Boolean resolve when there is no destination. If the ALU instruction didn't write a register, there's nothing to resolve. This replaces an earlier patch "intel/fs: Allocate dummy destination register when need_dest is false". Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-06-05 17:04:08 -07:00
Ian Romanick	e13a5c7d67	intel/fs: Allow cmod propagation across reads and writes of different flags This also helps a later patch (intel/fs: Improve discard_if code generation) on about 200 shaders. v2: Document that other instruction sequences are also valid in subtract_merge_with_compare_intervening_mismatch_flag_write. Suggested by Caio. All Intel platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17224438 -> 17224434 (<.01%) instructions in affected programs: 296 -> 292 (-1.35%) helped: 4 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.99% max: 1.92% x̄: 1.43% x̃: 1.40% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -2.04% -0.81% Instructions are helped. total cycles in shared programs: 361468455 -> 361468458 (<.01%) cycles in affected programs: 2862 -> 2865 (0.10%) helped: 2 HURT: 2 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.24% max: 0.39% x̄: 0.31% x̃: 0.31% HURT stats (abs) min: 3 max: 4 x̄: 3.50 x̃: 3 HURT stats (rel) min: 0.32% max: 0.70% x̄: 0.51% x̃: 0.51% 95% mean confidence interval for cycles value: -4.34 5.84 95% mean confidence interval for cycles %-change: -0.70% 0.90% Inconclusive result (value mean confidence interval includes 0). Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-06-05 17:03:45 -07:00
Ian Romanick	8030cb75c1	intel/fs: Fix flag_subreg handling in cmod propagation There were two errors. First, the pass could propagate conditional modifiers from an instruction that writes one flag register to an instruction that writes a different flag register. For example, cmp.nz.f0.0(16) null:F, vgrf6:F, vgrf5:F cmp.nz.f0.1(16) null:F, vgrf6:F, vgrf5:F could become cmp.nz.f0.0(16) null:F, vgrf6:F, vgrf5:F Second, if an instruction that writes f0.1 has its condition propagated, the modified instruction will incorrectly write flag f0.0. For example, linterp(16) vgrf6:F, g2:F, attr0:F cmp.z.f0.1(16) null:F, vgrf6:F, vgrf5:F (-f0.1) discard_jump(16) (null):UD could become linterp.z.f0.0(16) vgrf6:F, g2:F, attr0:F (-f0.1) discard_jump(16) (null):UD None of these cases will occur currently. The only time we use f0.1 is for generating discard intrinsics. In all those cases, we generate a sequence like: cmp.nz.f0.0(16) vgrf7:F, vgrf6:F, vgrf5:F (+f0.1) cmp.z(16) null:D, vgrf7:D, 0d (-f0.1) discard_jump(16) (null):UD Due to the mixed types and incompatible conditions, this sequence would never see any cmod propagation. The next patch will change this. No shader-db changes on any Intel platform. v2: Fix typo in comment in test case subtract_delete_compare_other_flag. Noticed by Caio. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-06-05 17:03:40 -07:00
Ian Romanick	2dd6013933	intel/fs: Add missing tests for cmod_propagate_not Tests like this should have been added in `4467040cb6` ("i965/fs: Propagate conditional modifiers from not instructions"). Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-06-05 17:03:31 -07:00
Kenneth Graunke	c7d1b52a2c	nir: Combine lower_fmod16/32 back into a single lower_fmod. We originally had a single lower_fmod option. In commit `2ab2d2e5`, Sam split 32 and 64-bit lowering into separate flags, with the rationale that some drivers might want different options there. This left 16-bit unhandled, so Iago added a lower_fmod16 option in commit `ca31df6f`. Now that lower_fmod64 is gone (in favor of nir_lower_doubles and nir_lower_dmod), we re-combine lower_fmod16 and lower_fmod32 into a single lower_fmod flag again. I'm not aware of any hardware which needs lowering for one bitsize and not the other. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-05 16:45:12 -07:00
Kenneth Graunke	edd45af9ba	nir: Drop lower_fmod64 option. nir_lower_doubles offers a wide variety of fp64 lowering, including lowering fmod@64. The version there also better handles imprecisions due to lowered frcp@64. Let's consolidate on one version. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-05 16:45:12 -07:00
Jason Ekstrand	811c05dfe6	intel/nir: Take nir_shaders in brw_nir_link_shaders Since NIR_PASS no longer swaps out the NIR pointer when NIR_TEST_ is enabled, we can just take a single pointer and not a pointer to pointer. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-05 20:07:28 +00:00
Jason Ekstrand	bb67a99a2d	intel/nir: Stop returning the shader from helpers Now that NIR_TEST_* doesn't swap the shader out from under us, it's sufficient to just modify the shader rather than having to return in case we're testing serialization or cloning. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-05 20:07:28 +00:00
Jason Ekstrand	f4ef34f207	intel/fs: Add an UNDEF instruction to avoid excess live ranges With 8 and 16-bit types and anything where we have to use non-trivial register strides to deal with restrictions, we end up with things that look like partial writes even though we don't care about any values in the register except those written by that instruction. This is particularly important when dealing with loops because liveness sees is_partial_write and the fact that an old version from a previous loop iteration may be valid at that point and extends all purely partially written values to the entire loop. This commit adds a new UNDEF instruction which does nothing (the generator doesn't emit anything) but which does a fake write to the register. This informs liveness that we don't care about any values before that point so it won't consider those registers to be falsely live. We can safely emit UNDEF instructions for all SSA values that come in from NIR and nearly all temporaries generated by various stages of the compiler. In particular, we need to insert UNDEF instructions when we handle region restrictions because the newly allocated registers are almost guaranteed to be partially written. No shader-db changes. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110432 Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-06-04 14:27:30 -05:00
Jason Ekstrand	a84de3fb7c	intel/fs: Skip registers faster when setting spill costs This might be slightly faster since we're doing one read rather than two before we decide to skip. The more important reason, however, is because no_spill prevents us from re-spilling spill registers. In the new world in which we don't re-calculate liveness every spill, we may not have valid liveness for spill registers so we shouldn't even look their live ranges up. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110825 Fixes: `e99081e76d` "intel/fs/ra: Spill without destroying the..." Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Tested-by: Tapani Pälli <tapani.palli@intel.com>	2019-06-04 14:37:56 +00:00
Sagar Ghuge	3016756398	intel/compiler: Fix assertions in brw_alu3 v2: Fix assertion for src1 (Ian Romanick) Fixes: `3b967e17` (intel/compiler: Avoid false positive assertions) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Suggested-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-06-03 23:14:34 -07:00
Ian Romanick	65df6122da	intel/compiler: Use compare rematerialization pass Almost all of the spill / fill benefit is in Deus Ex. Haswell and all Gen8+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17224438 -> 17196395 (-0.16%) instructions in affected programs: 1518658 -> 1490615 (-1.85%) helped: 1550 HURT: 3 helped stats (abs) min: 1 max: 170 x̄: 18.11 x̃: 2 helped stats (rel) min: 0.04% max: 8.35% x̄: 1.12% x̃: 0.45% HURT stats (abs) min: 5 max: 10 x̄: 6.67 x̃: 5 HURT stats (rel) min: 0.32% max: 0.41% x̄: 0.35% x̃: 0.32% 95% mean confidence interval for instructions value: -19.86 -16.26 95% mean confidence interval for instructions %-change: -1.19% -1.04% Instructions are helped. total cycles in shared programs: 361468455 -> 361288721 (-0.05%) cycles in affected programs: 197367688 -> 197187954 (-0.09%) helped: 990 HURT: 683 helped stats (abs) min: 1 max: 119045 x̄: 806.00 x̃: 16 helped stats (rel) min: <.01% max: 38.56% x̄: 1.06% x̃: 0.26% HURT stats (abs) min: 1 max: 12190 x̄: 905.14 x̃: 22 HURT stats (rel) min: <.01% max: 25.18% x̄: 1.16% x̃: 0.47% 95% mean confidence interval for cycles value: -315.45 100.58 95% mean confidence interval for cycles %-change: -0.31% <.01% Inconclusive result (value mean confidence interval includes 0). 
total spills in shared programs: 12147 -> 8948 (-26.34%) spills in affected programs: 5433 -> 2234 (-58.88%) helped: 343 HURT: 0 total fills in shared programs: 25262 -> 21814 (-13.65%) fills in affected programs: 7771 -> 4323 (-44.37%) helped: 343 HURT: 3 LOST: 0 GAINED: 17 Ivy Bridge total instructions in shared programs: 12083517 -> 12081427 (-0.02%) instructions in affected programs: 540744 -> 538654 (-0.39%) helped: 786 HURT: 29 helped stats (abs) min: 1 max: 42 x̄: 2.70 x̃: 2 helped stats (rel) min: 0.06% max: 5.44% x̄: 0.55% x̃: 0.36% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.16% max: 0.95% x̄: 0.38% x̃: 0.31% 95% mean confidence interval for instructions value: -2.83 -2.30 95% mean confidence interval for instructions %-change: -0.57% -0.47% Instructions are helped. total cycles in shared programs: 180153463 -> 180124798 (-0.02%) cycles in affected programs: 72597920 -> 72569255 (-0.04%) helped: 572 HURT: 249 helped stats (abs) min: 1 max: 14830 x̄: 109.48 x̃: 13 helped stats (rel) min: <.01% max: 8.92% x̄: 0.71% x̃: 0.26% HURT stats (abs) min: 1 max: 11060 x̄: 136.37 x̃: 10 HURT stats (rel) min: <.01% max: 10.85% x̄: 0.54% x̃: 0.32% 95% mean confidence interval for cycles value: -96.22 26.39 95% mean confidence interval for cycles %-change: -0.43% -0.23% Inconclusive result (value mean confidence interval includes 0). 
total spills in shared programs: 3625 -> 3623 (-0.06%) spills in affected programs: 46 -> 44 (-4.35%) helped: 1 HURT: 0 total fills in shared programs: 4065 -> 4061 (-0.10%) fills in affected programs: 104 -> 100 (-3.85%) helped: 1 HURT: 0 LOST: 0 GAINED: 8 Sandy Bridge total instructions in shared programs: 10879656 -> 10878699 (<.01%) instructions in affected programs: 275167 -> 274210 (-0.35%) helped: 544 HURT: 0 helped stats (abs) min: 1 max: 20 x̄: 1.76 x̃: 1 helped stats (rel) min: 0.06% max: 3.11% x̄: 0.39% x̃: 0.25% 95% mean confidence interval for instructions value: -1.97 -1.55 95% mean confidence interval for instructions %-change: -0.43% -0.36% Instructions are helped. total cycles in shared programs: 154089096 -> 154081132 (<.01%) cycles in affected programs: 4422722 -> 4414758 (-0.18%) helped: 459 HURT: 214 helped stats (abs) min: 1 max: 258 x̄: 26.67 x̃: 8 helped stats (rel) min: <.01% max: 5.45% x̄: 0.51% x̃: 0.14% HURT stats (abs) min: 1 max: 226 x̄: 19.99 x̃: 4 HURT stats (rel) min: <.01% max: 3.15% x̄: 0.34% x̃: 0.09% 95% mean confidence interval for cycles value: -15.51 -8.15 95% mean confidence interval for cycles %-change: -0.31% -0.17% Cycles are helped. total spills in shared programs: 2880 -> 2876 (-0.14%) spills in affected programs: 636 -> 632 (-0.63%) helped: 2 HURT: 0 total fills in shared programs: 3161 -> 3157 (-0.13%) fills in affected programs: 1519 -> 1515 (-0.26%) helped: 2 HURT: 0 LOST: 0 GAINED: 2 Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8157361 -> 8155067 (-0.03%) instructions in affected programs: 382491 -> 380197 (-0.60%) helped: 677 HURT: 0 helped stats (abs) min: 1 max: 43 x̄: 3.39 x̃: 2 helped stats (rel) min: 0.09% max: 5.19% x̄: 0.66% x̃: 0.42% 95% mean confidence interval for instructions value: -3.76 -3.01 95% mean confidence interval for instructions %-change: -0.72% -0.59% Instructions are helped. 
total cycles in shared programs: 188588292 -> 188583040 (<.01%) cycles in affected programs: 3155064 -> 3149812 (-0.17%) helped: 377 HURT: 13 helped stats (abs) min: 2 max: 180 x̄: 14.13 x̃: 6 helped stats (rel) min: <.01% max: 3.96% x̄: 0.39% x̃: 0.12% HURT stats (abs) min: 2 max: 8 x̄: 5.85 x̃: 6 HURT stats (rel) min: <.01% max: 0.22% x̄: 0.06% x̃: 0.04% 95% mean confidence interval for cycles value: -15.67 -11.27 95% mean confidence interval for cycles %-change: -0.45% -0.30% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-31 08:47:03 -07:00
Jason Ekstrand	9e403dc56e	intel/fs: Do a stalling MFENCE in endInvocationInterlock() Fixes: `939312702e` "i965: Add ARB_fragment_shader_interlock support" Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-30 14:00:26 +00:00
Jason Ekstrand	859de4a748	intel/fs,vec4: Use g0 as the header for MFENCE We set header_present but then pass it some random garbage. Give it g0 instead. I'm not actually sure this does anything but g0 is the usual header data and this is what the windows driver does so it seems like a good idea. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-30 14:00:26 +00:00
Kenneth Graunke	6a9e39d44b	iris: Ask st to vectorize our IO. (Technically this is common code, but it doesn't affect i965 or anv.) Improves performance of GFXBench5/gl_tess_off on Skylake GT4e at 1080p by 9.3933% +/- 0.0305157% by eliminating all spilling in the GS. Improves performance of GFXBench5/gl_4_off (Car Chase) on Skylake GT4e at 1080p by 0.325208% +/- 0.0842233% (n=18). Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-05-28 01:06:48 -07:00
Jason Ekstrand	f2dc0f2872	nir: Drop imov/fmov in favor of one mov instruction The difference between imov and fmov has been a constant source of confusion in NIR for years. No one really knows why we have two or when to use one vs. the other. The real reason is that they do different things in the presence of source and destination modifiers. However, without modifiers (which many back-ends don't have), they are identical. Now that we've reworked nir_lower_to_source_mods to leave one abs/neg instruction in place rather than replacing them with imov or fmov instructions, we don't need two different instructions at all anymore. Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> Acked-by: Rob Clark <robdclark@chromium.org>	2019-05-24 08:38:11 -05:00
Jason Ekstrand	8ffbb54405	intel: Implement abs, neg, and sat in the back-end Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-24 08:38:11 -05:00
Jason Ekstrand	4fde459563	intel/nir: Call alu_to_scalar one last time before out-of-ssa A few of our very late passes can end up generating vectors accidentally so we need to get rid of them. The only known case of this is the ffma peephole which generates fneg and fabs as vectors. Currently, they're not a problem because they get turned into fmov which the back-end compiler knows how to handle as a vector. That's about to change. Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-24 08:38:11 -05:00

973 Commits