Commit Graph

973 Commits

Author SHA1 Message Date
Jason Ekstrand 7ceec21b76 intel/fs: Use a strided MOV instead of a conversion for load_* destinations
In many cases, the compiler can just copy-prop the strided MOV whereas
the conversion is a bit trickier.  This cuts 5% of the instructions off
of one particular Vulkan CTS test which does lots of load_ssbo.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-17 18:44:35 +00:00
Jason Ekstrand 68a4c796d5 intel/fs: Properly stride NULL replacement regs in DCE
This fixes some validation errors generated by certain D->W conversions
but is likely not a full solution.  Calculating an actual register
stride is a far more complex problem in general and should probably be
handled by the brw_fs_generator.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-17 18:44:35 +00:00
Jason Ekstrand 110669c85c st,i965: Stop looping on 64-bit lowering
Now that the 64-bit lowering passes do a complete lowering in one go, we
don't need to loop anymore.  We do, however, have to ensure that int64
lowering happens after double lowering because double lowering can
produce int64 ops.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-16 16:05:16 +00:00
Jason Ekstrand 0ba508d7a3 nir,intel: Add support for lowering 64-bit nir_opt_extract_*
We need this when doing full software 64-bit emulation.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110309
Fixes: cbad201c2b "nir/algebraic: Add missing 64-bit extract_[iu]8..."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-07-15 16:08:37 -05:00
Jason Ekstrand 974fabe810 intel: Run the optimization loop before and after lowering int64
For bindless SSBO access, we have to do 64-bit address calculations.  On
ICL and above, we don't have 64-bit integer support so we have to lower
the address calculations to 32-bit arithmetic.  If we don't run the
optimization loop before lowering, we won't fold any of the address
chain calculations before lowering 64-bit arithmetic and they aren't
really foldable afterwards.  This cuts the size of the generated code in
the compute shader in dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13 by
around 30%.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-13 02:59:28 +00:00
Andres Gomez f4d2be03b1 intel/compiler: remove abandoned comments
c8665005: ("intel/compiler: Don't always require precise lowering of flrp")
forgot to remove some comments that didn't apply any more after the
change.

Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrnd.net>
2019-07-12 16:15:20 +00:00
Ian Romanick 1259f6d802 nir: intel/vec4: Add flag to disable some algebraic optimizations
A couple patches later in this series use the flag to avoid a few
thousand shader-db regresions on all vec4 platforms.

I'm not particularly enamored with the name of this flag.  However, I
suspect the Intel vec4 backend is the only backend that will benefit
from it.  Specifically, the cases where this helps are all cases where
we want to prevent nir_opt_algebraic from rearranging instructions to
create 3-source instructions, such as ffma and flrp, with additional
immediate value or uniform sources.

The earlier commit "intel/vec4: Try to emit a single load for multiple
3-src instruction operands" solves most of the problems caused by
additional immediate values, but the restrictions on register strides
that cause problems for uniforms and shader inputs persist.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-11 10:20:03 -07:00
Ian Romanick 3a1fdca5ad intel/vec4: Try to emit immediate sources for MOV
Per the comment in vec4_visitor::nir_emit_load_const, further
improvement is possible in this area.  That case would be more
complicated as I think we'd want to check that all users of the
nir_load_const_instr result intended to use the value as float.

No shader-db changes on any Gen8+ platform as these platforms do not use
the vec4 backend.

v2: Massive rebase on eeebeb211f ("intel/vec4: Try emitting non-scalar
immediates").  This commit is about twice as helpful since b04beaf41d
("intel/vec4: Try both sources as candidates for being immediates").

Haswell and Ivy Bridge had similar results. (Haswell shown)
total instructions in shared programs: 13478598 -> 13474068 (-0.03%)
instructions in affected programs: 589452 -> 584922 (-0.77%)
helped: 2773
HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 1.63 x̃: 1
helped stats (rel) min: 0.16% max: 5.66% x̄: 0.96% x̃: 0.83%
95% mean confidence interval for instructions value: -1.67 -1.60
95% mean confidence interval for instructions %-change: -0.98% -0.94%
Instructions are helped.

total cycles in shared programs: 376386916 -> 376369392 (<.01%)
cycles in affected programs: 16871628 -> 16854104 (-0.10%)
helped: 2293
HURT: 523
helped stats (abs) min: 2 max: 812 x̄: 13.80 x̃: 2
helped stats (rel) min: <.01% max: 10.18% x̄: 1.02% x̃: 0.36%
HURT stats (abs)   min: 2 max: 316 x̄: 26.99 x̃: 14
HURT stats (rel)   min: <.01% max: 19.34% x̄: 2.15% x̃: 1.43%
95% mean confidence interval for cycles value: -7.87 -4.58
95% mean confidence interval for cycles %-change: -0.52% -0.34%
Cycles are helped.

Sandy Bridge
total instructions in shared programs: 10860328 -> 10857675 (-0.02%)
instructions in affected programs: 335907 -> 333254 (-0.79%)
helped: 1639
HURT: 0
helped stats (abs) min: 1 max: 5 x̄: 1.62 x̃: 1
helped stats (rel) min: 0.10% max: 5.26% x̄: 0.86% x̃: 0.70%
95% mean confidence interval for instructions value: -1.67 -1.57
95% mean confidence interval for instructions %-change: -0.89% -0.84%
Instructions are helped.

total cycles in shared programs: 153942720 -> 153934120 (<.01%)
cycles in affected programs: 5604818 -> 5596218 (-0.15%)
helped: 1494
HURT: 97
helped stats (abs) min: 2 max: 256 x̄: 7.84 x̃: 2
helped stats (rel) min: 0.01% max: 6.62% x̄: 0.35% x̃: 0.18%
HURT stats (abs)   min: 2 max: 160 x̄: 32.02 x̃: 20
HURT stats (rel)   min: 0.02% max: 3.37% x̄: 0.88% x̃: 0.56%
95% mean confidence interval for cycles value: -6.45 -4.36
95% mean confidence interval for cycles %-change: -0.32% -0.23%
Cycles are helped.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8139378 -> 8137267 (-0.03%)
instructions in affected programs: 265616 -> 263505 (-0.79%)
helped: 1148
HURT: 0
helped stats (abs) min: 1 max: 5 x̄: 1.84 x̃: 1
helped stats (rel) min: 0.22% max: 4.76% x̄: 0.87% x̃: 0.62%
95% mean confidence interval for instructions value: -1.90 -1.78
95% mean confidence interval for instructions %-change: -0.90% -0.83%
Instructions are helped.

total cycles in shared programs: 188541756 -> 188537540 (<.01%)
cycles in affected programs: 9807004 -> 9802788 (-0.04%)
helped: 1143
HURT: 4
helped stats (abs) min: 2 max: 10 x̄: 3.70 x̃: 2
helped stats (rel) min: <.01% max: 3.01% x̄: 0.13% x̃: 0.06%
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.18% max: 0.18% x̄: 0.18% x̃: 0.18%
95% mean confidence interval for cycles value: -3.80 -3.55
95% mean confidence interval for cycles %-change: -0.14% -0.12%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-11 10:20:03 -07:00
Ian Romanick acd7796a07 intel/vec4: Try to emit a VF source in try_immediate_source
This commit is also a pre-requisite for the next commit.

No shader-db changes on any Gen8+ platform as these platforms do not use
the vec4 backend.

v2: Massive rebase on eeebeb211f ("intel/vec4: Try emitting non-scalar
immediates").  This change is a lot less helpful since that commit
landed (previously helped 1934 shaders on HSW) because, apparently, a
lot of the cases helped by that commit were things like vector loads of
{ 1.0, 1.0, 1.0 } that were also helped by this commit.

Haswell
total instructions in shared programs: 13480095 -> 13478598 (-0.01%)
instructions in affected programs: 229534 -> 228037 (-0.65%)
helped: 1006
HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 1.49 x̃: 1
helped stats (rel) min: 0.04% max: 3.45% x̄: 1.11% x̃: 1.09%
95% mean confidence interval for instructions value: -1.54 -1.43
95% mean confidence interval for instructions %-change: -1.15% -1.07%
Instructions are helped.

total cycles in shared programs: 376385734 -> 376386916 (<.01%)
cycles in affected programs: 14101380 -> 14102562 (<.01%)
helped: 941
HURT: 56
helped stats (abs) min: 2 max: 322 x̄: 5.62 x̃: 2
helped stats (rel) min: <.01% max: 7.74% x̄: 0.51% x̃: 0.42%
HURT stats (abs)   min: 2 max: 618 x̄: 115.50 x̃: 32
HURT stats (rel)   min: 0.03% max: 4.62% x̄: 0.83% x̃: 0.44%
95% mean confidence interval for cycles value: -2.06 4.43
95% mean confidence interval for cycles %-change: -0.47% -0.39%
Inconclusive result (value mean confidence interval includes 0).

Ivy Bridge
total instructions in shared programs: 12048004 -> 12046589 (-0.01%)
instructions in affected programs: 217072 -> 215657 (-0.65%)
helped: 934
HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 1.51 x̃: 1
helped stats (rel) min: 0.04% max: 3.45% x̄: 1.14% x̃: 1.11%
95% mean confidence interval for instructions value: -1.57 -1.46
95% mean confidence interval for instructions %-change: -1.18% -1.10%
Instructions are helped.

total cycles in shared programs: 180285854 -> 180287608 (<.01%)
cycles in affected programs: 14103824 -> 14105578 (0.01%)
helped: 871
HURT: 53
helped stats (abs) min: 2 max: 322 x̄: 5.51 x̃: 2
helped stats (rel) min: <.01% max: 7.67% x̄: 0.50% x̃: 0.42%
HURT stats (abs)   min: 2 max: 618 x̄: 123.66 x̃: 32
HURT stats (rel)   min: 0.03% max: 4.47% x̄: 0.92% x̃: 0.46%
95% mean confidence interval for cycles value: -1.60 5.39
95% mean confidence interval for cycles %-change: -0.46% -0.37%
Inconclusive result (value mean confidence interval includes 0).

Sandy Bridge
total instructions in shared programs: 10861227 -> 10860328 (<.01%)
instructions in affected programs: 92969 -> 92070 (-0.97%)
helped: 624
HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 1.44 x̃: 1
helped stats (rel) min: 0.11% max: 3.45% x̄: 1.05% x̃: 0.95%
95% mean confidence interval for instructions value: -1.52 -1.36
95% mean confidence interval for instructions %-change: -1.09% -1.01%
Instructions are helped.

total cycles in shared programs: 153944316 -> 153942720 (<.01%)
cycles in affected programs: 1640956 -> 1639360 (-0.10%)
helped: 601
HURT: 15
helped stats (abs) min: 2 max: 120 x̄: 3.56 x̃: 2
helped stats (rel) min: 0.02% max: 6.33% x̄: 0.18% x̃: 0.08%
HURT stats (abs)   min: 2 max: 72 x̄: 36.13 x̃: 36
HURT stats (rel)   min: 0.05% max: 3.84% x̄: 1.95% x̃: 2.00%
95% mean confidence interval for cycles value: -3.44 -1.74
95% mean confidence interval for cycles %-change: -0.18% -0.09%
Cycles are helped.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8139924 -> 8139378 (<.01%)
instructions in affected programs: 69776 -> 69230 (-0.78%)
helped: 322
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 1.70 x̃: 1
helped stats (rel) min: 0.27% max: 3.23% x̄: 0.79% x̃: 0.54%
95% mean confidence interval for instructions value: -1.88 -1.51
95% mean confidence interval for instructions %-change: -0.85% -0.72%
Instructions are helped.

total cycles in shared programs: 188542864 -> 188541756 (<.01%)
cycles in affected programs: 3031532 -> 3030424 (-0.04%)
helped: 320
HURT: 0
helped stats (abs) min: 2 max: 20 x̄: 3.46 x̃: 2
helped stats (rel) min: <.01% max: 0.69% x̄: 0.06% x̃: 0.06%
95% mean confidence interval for cycles value: -3.85 -3.07
95% mean confidence interval for cycles %-change: -0.06% -0.05%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-11 10:20:03 -07:00
Ian Romanick 365b45d571 intel/vec4: Try to emit a single load for multiple 3-src instruction operands
If a 3-source instruction uses immediate values 1.0 and -1.0, just load
1.0 into a register.  Use the negation source modifier to get -1.0.
This has trivial impact now, but it prevents a few thousand regressions
on vec4 platforms with "nir/algebraic: Recognize open-coded flrp(-1, 1,
a) and flrp(1, -1, a)"

All Gen6 and Gen7 platforms had similar results. (Haswell shown)
total instructions in shared programs: 13487412 -> 13487406 (<.01%)
instructions in affected programs: 541 -> 535 (-1.11%)
helped: 6
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.36% max: 2.08% x̄: 1.65% x̃: 1.80%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -2.33% -0.97%
Instructions are helped.

total cycles in shared programs: 376402564 -> 376402500 (<.01%)
cycles in affected programs: 10348 -> 10284 (-0.62%)
helped: 10
HURT: 1
helped stats (abs) min: 2 max: 26 x̄: 7.00 x̃: 2
helped stats (rel) min: 0.13% max: 2.05% x̄: 0.89% x̃: 0.79%
HURT stats (abs)   min: 6 max: 6 x̄: 6.00 x̃: 6
HURT stats (rel)   min: 0.29% max: 0.29% x̄: 0.29% x̃: 0.29%
95% mean confidence interval for cycles value: -11.72 0.08
95% mean confidence interval for cycles %-change: -1.20% -0.36%
Inconclusive result (value mean confidence interval includes 0).

No shader-db changes on any other Intel platform.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-11 10:20:03 -07:00
Ian Romanick 6f6bc842f6 intel/vec4: Refactor operand fixing for ffma and flrp
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-11 10:20:03 -07:00
Caio Marcelo de Oliveira Filho b390ff3517 intel/fs: Add support for SLM fence in Gen11
Gen11 SLM is not on L3 anymore, so now the hardware has two separate
fences.  Add a way to control which fence types to use.

At this time, we don't have enough information in NIR to control the
visibility of the memory being fenced, so for now be conservative and
assume that fences will need a stall.  With more information later
we'll be able to reduce those.

Fixes Vulkan CTS tests in ICL:

    dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_nonlocal.workgroup.guard_local.buffer.comp
    dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_local.buffer.guard_nonlocal.workgroup.comp
    dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_local.image.guard_nonlocal.workgroup.comp
    dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.workgroup.payload_local.buffer.guard_nonlocal.workgroup.comp
    dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.workgroup.payload_local.image.guard_nonlocal.workgroup.comp

The whole set of supported tests in dEQP-VK.memory_model.* group
should be passing in ICL now.

v2: Pass BTI around instead of having an enum.  (Jason)
    Emit two SHADER_OPCODE_MEMORY_FENCE instead of one that gets
    transformed into two.  (Jason)
    List tests fixed.  (Lionel)

v3: For clarity, split the decision of which fences to emit from the
    emission code.  (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-07-11 08:29:32 -07:00
Jason Ekstrand 14781e2122 intel/compiler: Add a "base class" for program keys
Right now, all keys have two things in common: a program string ID and a
sampler_prog_key_data.  I'd like to add another thing or two and need a
place to put it.  This commit adds a new brw_base_prog_key struct which
contains those two common bits.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-10 19:35:55 +00:00
Ian Romanick dd2dc7e707 intel/vec4: Delete vec4_visitor::emit_lrp
Effectivley unused since dd7135d55d ("intel/compiler: Use the flrp
lowering pass for all stages on Gen4 and Gen5").  I had intended to
remove this code as part of that series, but I forgot.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-08 11:30:11 -07:00
Ian Romanick 47c2aa5b48 intel/vec4: Reswizzle VF immediates too
Previously, an instruction like

mul(8) vgrf29.xy:F, vgrf25.yxxx:F, [-1F, 1F, 0F, 0F]

would get rewritten as

mul(8) vgrf0.yz:F, vgrf25.yyxx:F, [-1F, 1F, 0F, 0F]

The latter does not produce the correct result.  The VF immediate in the
second should be either [-1F, -1F, 1F, 1F] or [0F, -1F, 1F, 0F].  This
commit produces the former.

Fixes: 1ee1d8ab46 ("i965/vec4: Reswizzle sources when necessary.")
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-08 11:30:10 -07:00
Caio Marcelo de Oliveira Filho 45f5db5a84 intel/fs: Implement "demote to helper invocation"
The "demote" intrinsic works like "discard" but don't change the
control flow, allowing derivative operations to work.  This is the
semantics of D3D discard.

The "is_helper_invocation" intrinsic will return true for helper
invocations -- both the ones that started as helpers and the ones that
where demoted.  This is needed to avoid changing the behavior of
gl_HelperInvocation which is an input (so not expected to change
during shader execution).

v2: Emit the discard jump and comment why it is safe.  (Jason)
    Rework the is_helper_invocation() that was stomping f0.1.  (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-08 08:57:25 -07:00
Connor Abbott 6b28808b22 intel/nir: Extract add_const_offset_to_base
Pretty much every driver using nir_lower_io_to_temporaries followed by
nir_lower_io is going to want this. In particular, radv and radeonsi in
the next commits.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-08 14:14:53 +02:00
Jason Ekstrand fa869f45c8 intel/fs: Use nir_lower_interpolation on gen11+
On gen11, the removed the PLN instruction so we have to emit a pile of
MAD to emulate it.  We may as well do that in NIR so we can optimize and
later schedule it.

Shader-db results on Ice Lake:

    total instructions in shared programs: 17145644 -> 16556440 (-3.44%)
    instructions in affected programs: 11507454 -> 10918250 (-5.12%)
    helped: 35763
    HURT: 42085
    helped stats (abs) min: 1 max: 140 x̄: 19.09 x̃: 18
    helped stats (rel) min: 0.04% max: 37.93% x̄: 15.40% x̃: 14.49%
    HURT stats (abs)   min: 1 max: 248 x̄: 2.22 x̃: 2
    HURT stats (rel)   min: 0.05% max: 50.00% x̄: 5.00% x̃: 2.47%
    95% mean confidence interval for instructions value: -7.67 -7.47
    95% mean confidence interval for instructions %-change: -4.46% -4.29%
    Instructions are helped.

    total loops in shared programs: 4370 -> 4370 (0.00%)
    loops in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total cycles in shared programs: 360624645 -> 368220857 (2.11%)
    cycles in affected programs: 269631244 -> 277227456 (2.82%)
    helped: 15583
    HURT: 65874
    helped stats (abs) min: 1 max: 28561 x̄: 78.45 x̃: 32
    helped stats (rel) min: <.01% max: 67.81% x̄: 5.38% x̃: 2.44%
    HURT stats (abs)   min: 1 max: 238638 x̄: 133.87 x̃: 20
    HURT stats (rel)   min: <.01% max: 306.25% x̄: 5.81% x̃: 3.97%
    95% mean confidence interval for cycles value: 67.42 119.09
    95% mean confidence interval for cycles %-change: 3.61% 3.73%
    Cycles are HURT.

    total spills in shared programs: 8943 -> 8981 (0.42%)
    spills in affected programs: 1925 -> 1963 (1.97%)
    helped: 44
    HURT: 14

    total fills in shared programs: 21815 -> 21925 (0.50%)
    fills in affected programs: 3511 -> 3621 (3.13%)
    helped: 41
    HURT: 18

    LOST:   70
    GAINED: 14

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-02 16:15:25 +00:00
Jason Ekstrand 2b79a9e5a5 intel/fs: Implement nir_intrinsic_load_fs_input_interp_deltas
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-02 16:15:25 +00:00
Jason Ekstrand 8e7d066682 intel/fs: Actually implement the load_barycentric intrinsics
If they never get used, dead code should clean them up.  Also, we rework
the at_offset and at_sample intrinsics so they return a proper vec2
instead of returning things in PLN layout.  Fortunately, copy-prop is
pretty good at cleaning this up and it doesn't result in any actual
extra MOVs.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-02 16:15:25 +00:00
Sagar Ghuge 1e92e83856 intel/compiler: Emit ROR and ROL instruction
v2: Reorder patch (Matt Turner)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-01 10:14:22 -07:00
Sagar Ghuge 83fdec0f0d intel/compiler: Enable the emission of ROR/ROL instructions
v2: 1) Drop changes for vec4 backend as on Gen11+ we don't support
       align16 mode (Matt Turner)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-01 10:14:22 -07:00
Lionel Landwerlin 5847de6e9a intel/compiler: don't use byte operands for src1 on ICL
The simulator complains about using byte operands, we also have
documentation telling us.

Note that add operations on bytes seems to work fine on HW (like ADD).
Using dwords operands with CMP & SEL fixes the following tests :

   dEQP-VK.spirv_assembly.type.vec*.i8.*

v2: Drop the GLK changes (Matt)
    Add validator tests (Matt)

v3: Drop GLK ref (Matt)
    Don't mix float/integer in MAD (Matt)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com> (v1)
Reviewed-by: Matt Turner <mattst88@gmail.com>
BSpec: 3017
Cc: <mesa-stable@lists.freedesktop.org>
2019-06-29 12:56:09 +00:00
Ian Romanick b04beaf41d intel/vec4: Try both sources as candidates for being immediates
For some reason, when I first wrote try_immediate_source, I thought the
sources had already been ordered so that the immediate value was the
second source.  That's rubbish.  The generator assumes *neither* source
is immediate, and it relies on later copy/constant propagation passes to
do the reordering.

For this reason, the changes to try_immediate_source have to go to some
efforts to reorder the operands and tell the caller when it reordered
them.  The generator for comparison instructions uses this to determine
when the comparison needs to change (e.g., from GT to LT).

No changes on any Gen8 or later platform because those platforms do not
use the vec4 backend.

Haswell
total instructions in shared programs: 13484431 -> 13480500 (-0.03%)
instructions in affected programs: 441138 -> 437207 (-0.89%)
helped: 1883
HURT: 0
helped stats (abs) min: 1 max: 49 x̄: 2.09 x̃: 1
helped stats (rel) min: 0.07% max: 8.91% x̄: 1.10% x̃: 0.90%
95% mean confidence interval for instructions value: -2.19 -1.98
95% mean confidence interval for instructions %-change: -1.14% -1.06%
Instructions are helped.

total cycles in shared programs: 376420286 -> 376406400 (<.01%)
cycles in affected programs: 15995668 -> 15981782 (-0.09%)
helped: 1692
HURT: 219
helped stats (abs) min: 2 max: 764 x̄: 13.78 x̃: 4
helped stats (rel) min: <.01% max: 9.69% x̄: 0.69% x̃: 0.35%
HURT stats (abs)   min: 2 max: 516 x̄: 43.09 x̃: 22
HURT stats (rel)   min: 0.02% max: 12.09% x̄: 2.30% x̃: 1.13%
95% mean confidence interval for cycles value: -9.70 -4.83
95% mean confidence interval for cycles %-change: -0.42% -0.28%
Cycles are helped.

total spills in shared programs: 23166 -> 23158 (-0.03%)
spills in affected programs: 66 -> 58 (-12.12%)
helped: 2
HURT: 0

total fills in shared programs: 34592 -> 34580 (-0.03%)
fills in affected programs: 75 -> 63 (-16.00%)
helped: 2
HURT: 0

Ivy Bridge
total instructions in shared programs: 12051590 -> 12048513 (-0.03%)
instructions in affected programs: 355911 -> 352834 (-0.86%)
helped: 1481
HURT: 0
helped stats (abs) min: 1 max: 12 x̄: 2.08 x̃: 1
helped stats (rel) min: 0.07% max: 4.92% x̄: 1.08% x̃: 0.90%
95% mean confidence interval for instructions value: -2.17 -1.98
95% mean confidence interval for instructions %-change: -1.12% -1.04%
Instructions are helped.

total cycles in shared programs: 180319624 -> 180307642 (<.01%)
cycles in affected programs: 15591028 -> 15579046 (-0.08%)
helped: 1340
HURT: 174
helped stats (abs) min: 2 max: 764 x̄: 14.19 x̃: 2
helped stats (rel) min: <.01% max: 8.68% x̄: 0.64% x̃: 0.32%
HURT stats (abs)   min: 2 max: 518 x̄: 40.41 x̃: 14
HURT stats (rel)   min: 0.02% max: 8.37% x̄: 1.59% x̃: 0.67%
95% mean confidence interval for cycles value: -10.85 -4.97
95% mean confidence interval for cycles %-change: -0.45% -0.31%
Cycles are helped.

All Gen6 and earlier platforms had simlar results. (Sandy Bridge shown)
total instructions in shared programs: 10863159 -> 10861462 (-0.02%)
instructions in affected programs: 157839 -> 156142 (-1.08%)
helped: 715
HURT: 0
helped stats (abs) min: 1 max: 12 x̄: 2.37 x̃: 2
helped stats (rel) min: 0.23% max: 4.33% x̄: 1.07% x̃: 0.85%
95% mean confidence interval for instructions value: -2.53 -2.21
95% mean confidence interval for instructions %-change: -1.13% -1.02%
Instructions are helped.

total cycles in shared programs: 153957782 -> 153948778 (<.01%)
cycles in affected programs: 3171648 -> 3162644 (-0.28%)
helped: 696
HURT: 62
helped stats (abs) min: 2 max: 390 x̄: 15.72 x̃: 4
helped stats (rel) min: 0.02% max: 10.57% x̄: 0.57% x̃: 0.12%
HURT stats (abs)   min: 2 max: 300 x̄: 31.29 x̃: 2
HURT stats (rel)   min: 0.11% max: 7.23% x̄: 0.83% x̃: 0.34%
95% mean confidence interval for cycles value: -15.65 -8.11
95% mean confidence interval for cycles %-change: -0.56% -0.36%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-06-28 18:13:18 -07:00
Ian Romanick 379cf3bb87 intel/vec4: Try immediate sources for dot products too
No changes on any Gen8 or later platform because those platforms do not
use the vec4 backend.

All Haswell and earlier platforms has similar results. (Haswell shown)
total instructions in shared programs: 13484467 -> 13484431 (<.01%)
instructions in affected programs: 8540 -> 8504 (-0.42%)
helped: 33
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.09 x̃: 1
helped stats (rel) min: 0.31% max: 1.53% x̄: 0.49% x̃: 0.35%
95% mean confidence interval for instructions value: -1.19 -0.99
95% mean confidence interval for instructions %-change: -0.60% -0.38%
Instructions are helped.

total cycles in shared programs: 376420572 -> 376420286 (<.01%)
cycles in affected programs: 56260 -> 55974 (-0.51%)
helped: 26
HURT: 5
helped stats (abs) min: 2 max: 204 x̄: 11.85 x̃: 2
helped stats (rel) min: 0.11% max: 3.08% x̄: 0.39% x̃: 0.13%
HURT stats (abs)   min: 2 max: 6 x̄: 4.40 x̃: 6
HURT stats (rel)   min: 0.03% max: 0.35% x̄: 0.24% x̃: 0.35%
95% mean confidence interval for cycles value: -22.91 4.45
95% mean confidence interval for cycles %-change: -0.56% -0.02%
Inconclusive result (value mean confidence interval includes 0).

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-06-28 17:16:16 -07:00
Ian Romanick eeebeb211f intel/vec4: Try emitting non-scalar immediates
Sometimes an instruction has a vector as a source, but all of the
components have the same value.  For example,

    vec3 32 ssa_16 = load_const (1.0, 1.0, 1.0)
    ...
    vec3 32 ssa_82 = fadd ssa_16, -ssa_81.xyz

No changes on any Gen8 or later platform because those platforms do not
use the vec4 backend.

Haswell
total instructions in shared programs: 13487811 -> 13484467 (-0.02%)
instructions in affected programs: 421981 -> 418637 (-0.79%)
helped: 1859
HURT: 0
helped stats (abs) min: 1 max: 15 x̄: 1.80 x̃: 1
helped stats (rel) min: 0.04% max: 9.80% x̄: 1.04% x̃: 0.84%
95% mean confidence interval for instructions value: -1.85 -1.74
95% mean confidence interval for instructions %-change: -1.07% -1.00%
Instructions are helped.

total cycles in shared programs: 376423252 -> 376420572 (<.01%)
cycles in affected programs: 14800970 -> 14798290 (-0.02%)
helped: 1519
HURT: 329
helped stats (abs) min: 2 max: 462 x̄: 10.59 x̃: 4
helped stats (rel) min: 0.03% max: 16.73% x̄: 0.79% x̃: 0.36%
HURT stats (abs)   min: 2 max: 598 x̄: 40.74 x̃: 16
HURT stats (rel)   min: <.01% max: 10.32% x̄: 2.56% x̃: 0.98%
95% mean confidence interval for cycles value: -3.53 0.63
95% mean confidence interval for cycles %-change: -0.30% -0.09%
Inconclusive result (value mean confidence interval includes 0).

total fills in shared programs: 34601 -> 34592 (-0.03%)
fills in affected programs: 91 -> 82 (-9.89%)
helped: 9
HURT: 0

Ivy Bridge
total instructions in shared programs: 12053565 -> 12051626 (-0.02%)
instructions in affected programs: 298103 -> 296164 (-0.65%)
helped: 1228
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 1.58 x̃: 1
helped stats (rel) min: 0.04% max: 3.57% x̄: 0.91% x̃: 0.81%
95% mean confidence interval for instructions value: -1.63 -1.53
95% mean confidence interval for instructions %-change: -0.95% -0.88%
Instructions are helped.

total cycles in shared programs: 180322270 -> 180319922 (<.01%)
cycles in affected programs: 14123840 -> 14121492 (-0.02%)
helped: 1036
HURT: 195
helped stats (abs) min: 2 max: 462 x̄: 11.93 x̃: 2
helped stats (rel) min: 0.03% max: 14.05% x̄: 0.82% x̃: 0.35%
HURT stats (abs)   min: 2 max: 598 x̄: 51.33 x̃: 16
HURT stats (rel)   min: <.01% max: 9.68% x̄: 3.02% x̃: 0.72%
95% mean confidence interval for cycles value: -4.92 1.10
95% mean confidence interval for cycles %-change: -0.35% -0.07%
Inconclusive result (value mean confidence interval includes 0).

Sandy Bridge
total instructions in shared programs: 10864286 -> 10863189 (-0.01%)
instructions in affected programs: 159722 -> 158625 (-0.69%)
helped: 724
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 1.52 x̃: 1
helped stats (rel) min: 0.10% max: 2.91% x̄: 0.79% x̃: 0.62%
95% mean confidence interval for instructions value: -1.58 -1.46
95% mean confidence interval for instructions %-change: -0.82% -0.75%
Instructions are helped.

total cycles in shared programs: 153967938 -> 153957926 (<.01%)
cycles in affected programs: 1923186 -> 1913174 (-0.52%)
helped: 654
HURT: 56
helped stats (abs) min: 2 max: 170 x̄: 20.00 x̃: 4
helped stats (rel) min: 0.03% max: 11.82% x̄: 0.89% x̃: 0.18%
HURT stats (abs)   min: 2 max: 390 x̄: 54.75 x̃: 32
HURT stats (rel)   min: 0.05% max: 6.92% x̄: 3.09% x̃: 2.92%
95% mean confidence interval for cycles value: -17.42 -10.78
95% mean confidence interval for cycles %-change: -0.76% -0.40%
Cycles are helped.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8142677 -> 8141721 (-0.01%)
instructions in affected programs: 139511 -> 138555 (-0.69%)
helped: 588
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 1.63 x̃: 1
helped stats (rel) min: 0.21% max: 4.39% x̄: 0.84% x̃: 0.46%
95% mean confidence interval for instructions value: -1.70 -1.55
95% mean confidence interval for instructions %-change: -0.89% -0.78%
Instructions are helped.

total cycles in shared programs: 188549394 -> 188547676 (<.01%)
cycles in affected programs: 3171960 -> 3170242 (-0.05%)
helped: 527
HURT: 0
helped stats (abs) min: 2 max: 18 x̄: 3.26 x̃: 2
helped stats (rel) min: <.01% max: 0.80% x̄: 0.08% x̃: 0.06%
95% mean confidence interval for cycles value: -3.49 -3.03
95% mean confidence interval for cycles %-change: -0.09% -0.07%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-06-28 17:16:06 -07:00
Lionel Landwerlin 836225840c intel/compiler: fix derivative on y axis implementation
This rewrites the ddy in EXECUTE_4 mode with a loop to make it more
obvious what is going on and also sets the group each of the 4 threads
in the groups are supposed to execute.

Fixes the following CTS tests :

   dEQP-VK.glsl.derivate.dfdyfine.dynamic_*

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Co-Authored-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 2134ea3800 ("intel/compiler/fs: Implement ddy without using align16 for Gen11+")
2019-06-27 18:14:58 +00:00
Tapani Pälli 7a6e5a4bc3 intel/compiler: silence a warning of using different enum type
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-25 10:09:22 +03:00
Nicolai Hähnle de8a919702 u_dynarray: turn util_dynarray_{grow, resize} into element-oriented macros
The main motivation for this change is API ergonomics: most operations
on dynarrays are really on elements, not on bytes, so it's weird to have
grow and resize as the odd operations out.

The secondary motivation is memory safety. Users of the old byte-oriented
functions would often multiply a number of elements with the element size,
which could overflow, and checking for overflow is tedious.

With this change, we only need to implement the overflow checks once.
The checks are cheap: since eltsize is a compile-time constant and the
functions should be inlined, they only add a single comparison and an
unlikely branch.

v2:
- ensure operations are no-op when allocation fails
- in util_dynarray_clone, call resize_bytes with a compile-time constant element size

v3:
- fix iris, lima, panfrost

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 18:30:25 -04:00
Ian Romanick 39f4dc23a5 intel/fs: Mark source 0 of bcsel as needing Boolean resolve
The other sources of the bcsel behave like the sources of an and or
other logical operation.  However, source zero behaves differently.
It is evaluated as a Boolean, so it needs to be resolved.

No shader-db changes, but the tests mentioned in the bug get a couple
instructions added back.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110857
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-06-11 12:12:07 -07:00
Ian Romanick 1c30d26d89 intel/compiler: Treat b32csel as potentially producing a Boolean result for resolve analysis
If the 2nd and 3rd source are both Boolean values, we can potentially
avoid a resolve by only resolving the result of the b32csel.

No changes on any Gen6+ Intel platform.

v2: Use ?: instead of cast from bool to unsigned.  Suggested by Caio.

Iron Lake
total instructions in shared programs: 8142729 -> 8142677 (<.01%)
instructions in affected programs: 12890 -> 12838 (-0.40%)
helped: 26
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.25% max: 0.74% x̄: 0.45% x̃: 0.38%
95% mean confidence interval for instructions value: -2.00 -2.00
95% mean confidence interval for instructions %-change: -0.52% -0.39%
Instructions are helped.

total cycles in shared programs: 188549632 -> 188549394 (<.01%)
cycles in affected programs: 60754 -> 60516 (-0.39%)
helped: 25
HURT: 1
helped stats (abs) min: 2 max: 26 x̄: 9.92 x̃: 8
helped stats (rel) min: 0.07% max: 2.23% x̄: 0.59% x̃: 0.27%
HURT stats (abs)   min: 10 max: 10 x̄: 10.00 x̃: 10
HURT stats (rel)   min: 0.70% max: 0.70% x̄: 0.70% x̃: 0.70%
95% mean confidence interval for cycles value: -12.91 -5.40
95% mean confidence interval for cycles %-change: -0.84% -0.23%
Cycles are helped.

GM45
total instructions in shared programs: 5013119 -> 5013093 (<.01%)
instructions in affected programs: 6764 -> 6738 (-0.38%)
helped: 13
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.24% max: 0.68% x̄: 0.43% x̃: 0.36%
95% mean confidence interval for instructions value: -2.00 -2.00
95% mean confidence interval for instructions %-change: -0.52% -0.34%
Instructions are helped.

total cycles in shared programs: 128977804 -> 128977700 (<.01%)
cycles in affected programs: 37738 -> 37634 (-0.28%)
helped: 13
HURT: 0
helped stats (abs) min: 8 max: 8 x̄: 8.00 x̃: 8
helped stats (rel) min: 0.18% max: 0.46% x̄: 0.30% x̃: 0.26%
95% mean confidence interval for cycles value: -8.00 -8.00
95% mean confidence interval for cycles %-change: -0.36% -0.24%
Cycles are helped.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-06-05 17:04:17 -07:00
Ian Romanick 0ba9497e66 intel/fs: Improve discard_if code generation
Previously we would blindly emit an sequence like:

        mov(1)          f0.1<1>UW       g1.14<0,1,0>UW
        ...
        cmp.l.f0(16)    g7<1>F          g5<8,8,1>F      0x41700000F  /* 15F */
(+f0.1) cmp.z.f0.1(16)  null<1>D        g7<8,8,1>D      0D

The first move sets the flags based on the initial execution mask.
Later discard sequences contain a predicated compare that can only
remove more SIMD channels.  Often times the only user of the result from
the first compare is the second compare.  Instead, generate a sequence
like

        mov(1)          f0.1<1>UW       g1.14<0,1,0>UW
        ...
        cmp.l.f0(16)    g7<1>F          g5<8,8,1>F      0x41700000F  /* 15F */
(+f0.1) cmp.ge.f0.1(8)  null<1>F        g5<8,8,1>F      0x41700000F  /* 15F */

If the results stored in g7 and f0.0 are not used, the comparison will
be eliminated.  This removes an instruction and potentially reduces
register pressure.

v2: Major re-write of the commit message (including fixing the assembly
code).  Suggested by Matt.

All Gen8+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17224434 -> 17198659 (-0.15%)
instructions in affected programs: 2908125 -> 2882350 (-0.89%)
helped: 18891
HURT: 5
helped stats (abs) min: 1 max: 12 x̄: 1.38 x̃: 1
helped stats (rel) min: 0.03% max: 25.00% x̄: 1.76% x̃: 1.02%
HURT stats (abs)   min: 9 max: 105 x̄: 51.40 x̃: 35
HURT stats (rel)   min: 0.43% max: 4.92% x̄: 2.34% x̃: 1.56%
95% mean confidence interval for instructions value: -1.39 -1.34
95% mean confidence interval for instructions %-change: -1.79% -1.73%
Instructions are helped.

total cycles in shared programs: 361468458 -> 361170679 (-0.08%)
cycles in affected programs: 38470116 -> 38172337 (-0.77%)
helped: 16202
HURT: 1456
helped stats (abs) min: 1 max: 4473 x̄: 26.24 x̃: 18
helped stats (rel) min: <.01% max: 28.44% x̄: 2.90% x̃: 2.18%
HURT stats (abs)   min: 1 max: 5982 x̄: 87.51 x̃: 28
HURT stats (rel)   min: <.01% max: 51.29% x̄: 5.48% x̃: 1.64%
95% mean confidence interval for cycles value: -18.24 -15.49
95% mean confidence interval for cycles %-change: -2.26% -2.14%
Cycles are helped.

total spills in shared programs: 12147 -> 12176 (0.24%)
spills in affected programs: 175 -> 204 (16.57%)
helped: 8
HURT: 5

total fills in shared programs: 25262 -> 25292 (0.12%)
fills in affected programs: 269 -> 299 (11.15%)
helped: 8
HURT: 5

Haswell
total instructions in shared programs: 13530316 -> 13502647 (-0.20%)
instructions in affected programs: 2507824 -> 2480155 (-1.10%)
helped: 18859
HURT: 10
helped stats (abs) min: 1 max: 12 x̄: 1.48 x̃: 1
helped stats (rel) min: 0.03% max: 27.78% x̄: 2.38% x̃: 1.41%
HURT stats (abs)   min: 5 max: 39 x̄: 25.70 x̃: 31
HURT stats (rel)   min: 0.22% max: 1.66% x̄: 1.09% x̃: 1.31%
95% mean confidence interval for instructions value: -1.49 -1.44
95% mean confidence interval for instructions %-change: -2.42% -2.34%
Instructions are helped.

total cycles in shared programs: 377865412 -> 377639034 (-0.06%)
cycles in affected programs: 40169572 -> 39943194 (-0.56%)
helped: 15550
HURT: 1938
helped stats (abs) min: 1 max: 2482 x̄: 25.67 x̃: 18
helped stats (rel) min: <.01% max: 37.77% x̄: 3.00% x̃: 2.25%
HURT stats (abs)   min: 1 max: 4862 x̄: 89.17 x̃: 35
HURT stats (rel)   min: <.01% max: 67.67% x̄: 6.16% x̃: 2.75%
95% mean confidence interval for cycles value: -14.42 -11.47
95% mean confidence interval for cycles %-change: -2.05% -1.91%
Cycles are helped.

total spills in shared programs: 26769 -> 26814 (0.17%)
spills in affected programs: 826 -> 871 (5.45%)
helped: 9
HURT: 10

total fills in shared programs: 38383 -> 38425 (0.11%)
fills in affected programs: 834 -> 876 (5.04%)
helped: 9
HURT: 10

LOST:   5
GAINED: 10

Ivy Bridge
total instructions in shared programs: 12079250 -> 12044139 (-0.29%)
instructions in affected programs: 2409680 -> 2374569 (-1.46%)
helped: 16135
HURT: 0
helped stats (abs) min: 1 max: 23 x̄: 2.18 x̃: 2
helped stats (rel) min: 0.07% max: 37.50% x̄: 2.72% x̃: 1.68%
95% mean confidence interval for instructions value: -2.21 -2.14
95% mean confidence interval for instructions %-change: -2.76% -2.67%
Instructions are helped.

total cycles in shared programs: 180116747 -> 179900405 (-0.12%)
cycles in affected programs: 25439823 -> 25223481 (-0.85%)
helped: 13817
HURT: 1499
helped stats (abs) min: 1 max: 1886 x̄: 26.40 x̃: 18
helped stats (rel) min: <.01% max: 38.84% x̄: 2.57% x̃: 1.97%
HURT stats (abs)   min: 1 max: 3684 x̄: 98.99 x̃: 52
HURT stats (rel)   min: <.01% max: 97.01% x̄: 6.37% x̃: 3.42%
95% mean confidence interval for cycles value: -15.68 -12.57
95% mean confidence interval for cycles %-change: -1.77% -1.63%
Cycles are helped.

LOST:   8
GAINED: 10

Sandy Bridge
total instructions in shared programs: 10878990 -> 10863659 (-0.14%)
instructions in affected programs: 1806702 -> 1791371 (-0.85%)
helped: 13023
HURT: 0
helped stats (abs) min: 1 max: 5 x̄: 1.18 x̃: 1
helped stats (rel) min: 0.07% max: 13.79% x̄: 1.65% x̃: 1.10%
95% mean confidence interval for instructions value: -1.18 -1.17
95% mean confidence interval for instructions %-change: -1.68% -1.62%
Instructions are helped.

total cycles in shared programs: 154082878 -> 153862810 (-0.14%)
cycles in affected programs: 20199374 -> 19979306 (-1.09%)
helped: 12048
HURT: 510
helped stats (abs) min: 1 max: 323 x̄: 20.57 x̃: 18
helped stats (rel) min: 0.03% max: 17.78% x̄: 2.05% x̃: 1.52%
HURT stats (abs)   min: 1 max: 448 x̄: 54.39 x̃: 16
HURT stats (rel)   min: 0.02% max: 37.98% x̄: 4.13% x̃: 1.17%
95% mean confidence interval for cycles value: -17.97 -17.08
95% mean confidence interval for cycles %-change: -1.84% -1.75%
Cycles are helped.

LOST:   1
GAINED: 0

Iron Lake
total instructions in shared programs: 8155075 -> 8142729 (-0.15%)
instructions in affected programs: 949495 -> 937149 (-1.30%)
helped: 5810
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 2.12 x̃: 2
helped stats (rel) min: 0.10% max: 16.67% x̄: 2.53% x̃: 1.85%
95% mean confidence interval for instructions value: -2.14 -2.11
95% mean confidence interval for instructions %-change: -2.59% -2.48%
Instructions are helped.

total cycles in shared programs: 188584610 -> 188549632 (-0.02%)
cycles in affected programs: 17274446 -> 17239468 (-0.20%)
helped: 3881
HURT: 90
helped stats (abs) min: 2 max: 168 x̄: 9.08 x̃: 6
helped stats (rel) min: <.01% max: 23.53% x̄: 0.83% x̃: 0.30%
HURT stats (abs)   min: 2 max: 10 x̄: 2.80 x̃: 2
HURT stats (rel)   min: <.01% max: 0.60% x̄: 0.10% x̃: 0.07%
95% mean confidence interval for cycles value: -9.35 -8.27
95% mean confidence interval for cycles %-change: -0.85% -0.77%
Cycles are helped.

GM45
total instructions in shared programs: 5019308 -> 5013119 (-0.12%)
instructions in affected programs: 489028 -> 482839 (-1.27%)
helped: 2912
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 2.13 x̃: 2
helped stats (rel) min: 0.10% max: 16.67% x̄: 2.46% x̃: 1.81%
95% mean confidence interval for instructions value: -2.14 -2.11
95% mean confidence interval for instructions %-change: -2.54% -2.39%
Instructions are helped.

total cycles in shared programs: 129002592 -> 128977804 (-0.02%)
cycles in affected programs: 12669152 -> 12644364 (-0.20%)
helped: 2759
HURT: 37
helped stats (abs) min: 2 max: 168 x̄: 9.03 x̃: 4
helped stats (rel) min: <.01% max: 21.43% x̄: 0.75% x̃: 0.31%
HURT stats (abs)   min: 2 max: 10 x̄: 3.62 x̃: 4
HURT stats (rel)   min: <.01% max: 0.41% x̄: 0.10% x̃: 0.04%
95% mean confidence interval for cycles value: -9.53 -8.20
95% mean confidence interval for cycles %-change: -0.79% -0.70%
Cycles are helped.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-06-05 17:04:13 -07:00
Ian Romanick a288708506 intel/fs: Add need_dest parameter to fs_visitor::nir_emit_alu
This is the same as the need_dest parameter to
prepare_alu_destination_and_sources.  This allows us to not change the
register that is expected to hold an result if an instruction is
re-emitted.  This is particularly a problem if the re-emitted
instruction is a partial write.  A later patch will use this feature.

No shader-db changes on any Intel platform.

v2: Don't do the Boolean resolve when there is no destination.  If the
ALU instruction didn't write a register, there's nothing to resolve.
This replaces an earlier patch "intel/fs: Allocate dummy destination
register when need_dest is false".

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-06-05 17:04:08 -07:00
Ian Romanick e13a5c7d67 intel/fs: Allow cmod propagation across reads and writes of different flags
This also helps a later patch (intel/fs: Improve discard_if code
generation) on about 200 shaders.

v2: Document that other instruction sequences are also valid in
subtract_merge_with_compare_intervening_mismatch_flag_write.  Suggested
by Caio.

All Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17224438 -> 17224434 (<.01%)
instructions in affected programs: 296 -> 292 (-1.35%)
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.99% max: 1.92% x̄: 1.43% x̃: 1.40%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -2.04% -0.81%
Instructions are helped.

total cycles in shared programs: 361468455 -> 361468458 (<.01%)
cycles in affected programs: 2862 -> 2865 (0.10%)
helped: 2
HURT: 2
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.24% max: 0.39% x̄: 0.31% x̃: 0.31%
HURT stats (abs)   min: 3 max: 4 x̄: 3.50 x̃: 3
HURT stats (rel)   min: 0.32% max: 0.70% x̄: 0.51% x̃: 0.51%
95% mean confidence interval for cycles value: -4.34 5.84
95% mean confidence interval for cycles %-change: -0.70% 0.90%
Inconclusive result (value mean confidence interval includes 0).

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-06-05 17:03:45 -07:00
Ian Romanick 8030cb75c1 intel/fs: Fix flag_subreg handling in cmod propagation
There were two errors.  First, the pass could propagate conditional
modifiers from an instruction that writes on flag register to an
instruction that writes a different flag register.  For example,

    cmp.nz.f0.0(16) null:F, vgrf6:F, vgrf5:F
    cmp.nz.f0.1(16) null:F, vgrf6:F, vgrf5:F

could be come

    cmp.nz.f0.0(16) null:F, vgrf6:F, vgrf5:F

Second, if an instruction writes f0.1 has it's condition propagated, the
modified instruction will incorrectly write flag f0.0.  For example,

    linterp(16) vgrf6:F, g2:F, attr0:F
    cmp.z.f0.1(16) null:F, vgrf6:F, vgrf5:F
    (-f0.1) discard_jump(16) (null):UD

could become

    linterp.z.f0.0(16) vgrf6:F, g2:F, attr0:F
    (-f0.1) discard_jump(16) (null):UD

None of these cases will occur currently.  The only time we use f0.1 is
for generating discard intrinsics.  In all those cases, we generate a
squence like:

    cmp.nz.f0.0(16) vgrf7:F, vgrf6:F, vgrf5:F
    (+f0.1) cmp.z(16) null:D, vgrf7:D, 0d
    (-f0.1) discard_jump(16) (null):UD

Due to the mixed types and incompatible conditions, this sequence would
never see any cmod propagation.  The next patch will change this.

No shader-db changes on any Intel platform.

v2: Fix typo in comment in test case subtract_delete_compare_other_flag.
Noticed by Caio.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-06-05 17:03:40 -07:00
Ian Romanick 2dd6013933 intel/fs: Add missing tests for cmod_propagate_not
Tests like this should have been added in 4467040cb6 ("i965/fs:
Propagate conditional modifiers from not instructions").

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-06-05 17:03:31 -07:00
Kenneth Graunke c7d1b52a2c nir: Combine lower_fmod16/32 back into a single lower_fmod.
We originally had a single lower_fmod option.  In commit 2ab2d2e5, Sam
split 32 and 64-bit lowering into separate flags, with the rationale
that some drivers might want different options there.  This left 16-bit
unhandled, so Iago added a lower_fmod16 option in commit ca31df6f.

Now that lower_fmod64 is gone (in favor of nir_lower_doubles and
nir_lower_dmod), we re-combine lower_fmod16 and lower_fmod32 into a
single lower_fmod flag again.  I'm not aware of any hardware which
need lowering for one bitsize and not the other.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-05 16:45:12 -07:00
Kenneth Graunke edd45af9ba nir: Drop lower_fmod64 option.
nir_lower_doubles offers a wide variety of fp64 lowering, including
lowering fmod@64.  The version there also better handles imprecisions
due to lowered frcp@64.  Let's consolidate on one version.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-05 16:45:12 -07:00
Jason Ekstrand 811c05dfe6 intel/nir: Take nir_shader*s in brw_nir_link_shaders
Since NIR_PASS no longer swaps out the NIR pointer when NIR_TEST_* is
enabled, we can just take a single pointer and not a pointer to pointer.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-05 20:07:28 +00:00
Jason Ekstrand bb67a99a2d intel/nir: Stop returning the shader from helpers
Now that NIR_TEST_* doesn't swap the shader out from under us, it's
sufficient to just modify the shader rather than having to return in
case we're testing serialization or cloning.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-05 20:07:28 +00:00
Jason Ekstrand f4ef34f207 intel/fs: Add an UNDEF instruction to avoid excess live ranges
With 8 and 16-bit types and anything where we have to use non-trivial
strides registersto deal with restrictions, we end up with things that
look like partial writes even though we don't care about any values in
the register except those written by that instruction.  This is
particularly important when dealing with loops because liveness sees
is_partial_write and the fact that an old version from a previous loop
iteration may be valid at that point and extends all purely partially
written values to the entire loop.

This commit adds a new UNDEF instruction which does nothing (the
generator doesn't emit anything) but which does a fake write to the
register.  This informs liveness that we don't care about any values
before that point so it won't consider those registers to be falsely
live.  We can safely emit UNDEF instructions for all SSA values that
come in from NIR and nearly all temporaries generated by various stages
of the compiler.  In particular, we need to insert UNDEF instructions
when we handle region restrictions because the newly allocated registers
are almost guaranteed to be partially written.

No shader-db changes.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110432
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-06-04 14:27:30 -05:00
Jason Ekstrand a84de3fb7c intel/fs: Skip registers faster when setting spill costs
This might be slightly faster since we're doing one read rather than
two before we decide to skip.  The more important reason, however, is
because no_spill prevents us from re-spilling spill registers.  In the
new world in which we don't re-calculate liveness every spill, we may
not have valid liveness for spill registers so we shouldn't even look
their live ranges up.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110825
Fixes: e99081e76d "intel/fs/ra: Spill without destroying the..."
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
2019-06-04 14:37:56 +00:00
Sagar Ghuge 3016756398 intel/compiler: Fix assertions in brw_alu3
v2: Fix assertion for src1 (Ian Romanick)

Fixes: 3b967e17 (intel/compiler: Avoid false positive assertions)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Suggested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-06-03 23:14:34 -07:00
Ian Romanick 65df6122da intel/compiler: Use compare rematerialization pass
Almost all of the spill / fill benefit is in Deus Ex.

Haswell and all Gen8+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17224438 -> 17196395 (-0.16%)
instructions in affected programs: 1518658 -> 1490615 (-1.85%)
helped: 1550
HURT: 3
helped stats (abs) min: 1 max: 170 x̄: 18.11 x̃: 2
helped stats (rel) min: 0.04% max: 8.35% x̄: 1.12% x̃: 0.45%
HURT stats (abs)   min: 5 max: 10 x̄: 6.67 x̃: 5
HURT stats (rel)   min: 0.32% max: 0.41% x̄: 0.35% x̃: 0.32%
95% mean confidence interval for instructions value: -19.86 -16.26
95% mean confidence interval for instructions %-change: -1.19% -1.04%
Instructions are helped.

total cycles in shared programs: 361468455 -> 361288721 (-0.05%)
cycles in affected programs: 197367688 -> 197187954 (-0.09%)
helped: 990
HURT: 683
helped stats (abs) min: 1 max: 119045 x̄: 806.00 x̃: 16
helped stats (rel) min: <.01% max: 38.56% x̄: 1.06% x̃: 0.26%
HURT stats (abs)   min: 1 max: 12190 x̄: 905.14 x̃: 22
HURT stats (rel)   min: <.01% max: 25.18% x̄: 1.16% x̃: 0.47%
95% mean confidence interval for cycles value: -315.45 100.58
95% mean confidence interval for cycles %-change: -0.31% <.01%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 12147 -> 8948 (-26.34%)
spills in affected programs: 5433 -> 2234 (-58.88%)
helped: 343
HURT: 0

total fills in shared programs: 25262 -> 21814 (-13.65%)
fills in affected programs: 7771 -> 4323 (-44.37%)
helped: 343
HURT: 3

LOST:   0
GAINED: 17

Ivy Bridge
total instructions in shared programs: 12083517 -> 12081427 (-0.02%)
instructions in affected programs: 540744 -> 538654 (-0.39%)
helped: 786
HURT: 29
helped stats (abs) min: 1 max: 42 x̄: 2.70 x̃: 2
helped stats (rel) min: 0.06% max: 5.44% x̄: 0.55% x̃: 0.36%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.16% max: 0.95% x̄: 0.38% x̃: 0.31%
95% mean confidence interval for instructions value: -2.83 -2.30
95% mean confidence interval for instructions %-change: -0.57% -0.47%
Instructions are helped.

total cycles in shared programs: 180153463 -> 180124798 (-0.02%)
cycles in affected programs: 72597920 -> 72569255 (-0.04%)
helped: 572
HURT: 249
helped stats (abs) min: 1 max: 14830 x̄: 109.48 x̃: 13
helped stats (rel) min: <.01% max: 8.92% x̄: 0.71% x̃: 0.26%
HURT stats (abs)   min: 1 max: 11060 x̄: 136.37 x̃: 10
HURT stats (rel)   min: <.01% max: 10.85% x̄: 0.54% x̃: 0.32%
95% mean confidence interval for cycles value: -96.22 26.39
95% mean confidence interval for cycles %-change: -0.43% -0.23%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 3625 -> 3623 (-0.06%)
spills in affected programs: 46 -> 44 (-4.35%)
helped: 1
HURT: 0

total fills in shared programs: 4065 -> 4061 (-0.10%)
fills in affected programs: 104 -> 100 (-3.85%)
helped: 1
HURT: 0

LOST:   0
GAINED: 8

Sandy Bridge
total instructions in shared programs: 10879656 -> 10878699 (<.01%)
instructions in affected programs: 275167 -> 274210 (-0.35%)
helped: 544
HURT: 0
helped stats (abs) min: 1 max: 20 x̄: 1.76 x̃: 1
helped stats (rel) min: 0.06% max: 3.11% x̄: 0.39% x̃: 0.25%
95% mean confidence interval for instructions value: -1.97 -1.55
95% mean confidence interval for instructions %-change: -0.43% -0.36%
Instructions are helped.

total cycles in shared programs: 154089096 -> 154081132 (<.01%)
cycles in affected programs: 4422722 -> 4414758 (-0.18%)
helped: 459
HURT: 214
helped stats (abs) min: 1 max: 258 x̄: 26.67 x̃: 8
helped stats (rel) min: <.01% max: 5.45% x̄: 0.51% x̃: 0.14%
HURT stats (abs)   min: 1 max: 226 x̄: 19.99 x̃: 4
HURT stats (rel)   min: <.01% max: 3.15% x̄: 0.34% x̃: 0.09%
95% mean confidence interval for cycles value: -15.51 -8.15
95% mean confidence interval for cycles %-change: -0.31% -0.17%
Cycles are helped.

total spills in shared programs: 2880 -> 2876 (-0.14%)
spills in affected programs: 636 -> 632 (-0.63%)
helped: 2
HURT: 0

total fills in shared programs: 3161 -> 3157 (-0.13%)
fills in affected programs: 1519 -> 1515 (-0.26%)
helped: 2
HURT: 0

LOST:   0
GAINED: 2

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8157361 -> 8155067 (-0.03%)
instructions in affected programs: 382491 -> 380197 (-0.60%)
helped: 677
HURT: 0
helped stats (abs) min: 1 max: 43 x̄: 3.39 x̃: 2
helped stats (rel) min: 0.09% max: 5.19% x̄: 0.66% x̃: 0.42%
95% mean confidence interval for instructions value: -3.76 -3.01
95% mean confidence interval for instructions %-change: -0.72% -0.59%
Instructions are helped.

total cycles in shared programs: 188588292 -> 188583040 (<.01%)
cycles in affected programs: 3155064 -> 3149812 (-0.17%)
helped: 377
HURT: 13
helped stats (abs) min: 2 max: 180 x̄: 14.13 x̃: 6
helped stats (rel) min: <.01% max: 3.96% x̄: 0.39% x̃: 0.12%
HURT stats (abs)   min: 2 max: 8 x̄: 5.85 x̃: 6
HURT stats (rel)   min: <.01% max: 0.22% x̄: 0.06% x̃: 0.04%
95% mean confidence interval for cycles value: -15.67 -11.27
95% mean confidence interval for cycles %-change: -0.45% -0.30%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-31 08:47:03 -07:00
Jason Ekstrand 9e403dc56e intel/fs: Do a stalling MFENCE in endInvocationInterlock()
Fixes: 939312702e "i965: Add ARB_fragment_shader_interlock support"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-30 14:00:26 +00:00
Jason Ekstrand 859de4a748 intel/fs,vec4: Use g0 as the header for MFENCE
We set header_present but then pass it some random garbage.  Give it g0
instead.  I'm not actually sure this does anything but g0 is the usual
header data and this is what the windows driver does so it seems like a
good idea.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-30 14:00:26 +00:00
Kenneth Graunke 6a9e39d44b iris: Ask st to vectorize our IO.
(Technically this is common code, but it doesn't affect i965 or anv.)

Improves performance of GFXBench5/gl_tess_off on Skylake GT4e at 1080p
by 9.3933% +/- 0.0305157% by eliminating all spilling in the GS.

Improves performance of GFXBench5/gl_4_off (Car Chase) on Skylake GT4e
at 1080p by 0.325208% +/- 0.0842233% (n=18).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-05-28 01:06:48 -07:00
Jason Ekstrand f2dc0f2872 nir: Drop imov/fmov in favor of one mov instruction
The difference between imov and fmov has been a constant source of
confusion in NIR for years.  No one really knows why we have two or when
to use one vs. the other.  The real reason is that they do different
things in the presence of source and destination modifiers.  However,
without modifiers (which many back-ends don't have), they are identical.
Now that we've reworked nir_lower_to_source_mods to leave one abs/neg
instruction in place rather than replacing them with imov or fmov
instructions, we don't need two different instructions at all anymore.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Acked-by: Rob Clark <robdclark@chromium.org>
2019-05-24 08:38:11 -05:00
Jason Ekstrand 8ffbb54405 intel: Implement abs, neg, and sat in the back-end
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-24 08:38:11 -05:00
Jason Ekstrand 4fde459563 intel/nir: Call alu_to_scalar one last time before out-of-ssa
A few of our very late passes can end up generating vectors accidentally
so we need to get rid of them.  The only known case of this is the ffma
peephole which generates fneg and fabs as vectors.  Currently, they're
not a problem because they get turned into fmov which the back-end
compiler knows how to handle as a vector.  That's about to change.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-24 08:38:11 -05:00