Commit Graph

1510 Commits

Author SHA1 Message Date
Kristian H. Kristensen 41593f3c37 nir_opcodes.py: Saturate to expression that doesn't overflow
Compiler warns about overflow when assigning UINT64_MAX to something
smaller than a uin64_t:

src/compiler/nir/nir_constant_expressions.c:16909:50: warning: implicit conversion from 'unsigned long long' to 'uint1_t' (aka 'unsigned char') changes value from 18446744073709551615 to 255 [-Wconstant-conversion]
            uint1_t dst = (src0 + src1) < src0 ? UINT64_MAX : (src0 + src1);
                    ~~~                          ^~~~~~~~~~

Shift UINT64_MAX down to the appropriate maximum value for the type
being assigned to.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-19 16:17:37 +00:00
Eric Anholt 38c75aff4c nir: Use the nir_builder _imm helpers in setting up deref offsets.
When looking at the dEQP nested_struct_array_dynamic_index_fragment code
after lowering, I was horrified at the amount of adding and multiplying by
0 we were doing.  The builder _imm helpers handle that for you so that the
following optimization passes have less work to do.  Plus, it's easier to
read.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-19 08:45:14 -07:00
Eric Anholt 9ac5ec2f90 nir: Fix deref offset calculation for structs.
We were calcuating the offset for the field within the struct, and just
dropping it on the floor.  Fixes a regression in
KHR-GLES3.shaders.struct.local.nested_struct_array_dynamic_index_fragment
and a few of its friends since the scratch lowering commit.

Fixes: e8e159e9df ("nir/deref: Add helpers for getting offsets")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-19 08:45:14 -07:00
Erico Nunes 4577eb7b7c nir/algebraic: add lowering for fsign
The mali utgard pp doesn't support a sign instruction.
In the ARM offline shader compiler, the sign function is implemented
using sub(gt(0.0, a), lt(0.0, a)).
This is a generic optimization, so implement it in the nir level when
lower_fsign is set, alongside the lowering for isign.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-04-19 15:42:23 +00:00
Ian Romanick 6b97fa9a99 nir/algebraic: Strength reduce some compares of x and -x
Converting the x vs -x comparison to an x vs 0 comparison enable cmod
propagation to help.

The seems to be a win everywhere except Gen7.

Skylake and Broadwell had similar results. (Broadwell shown)
total instructions in shared programs: 15566733 -> 15566014 (<.01%)
instructions in affected programs: 72617 -> 71898 (-0.99%)
helped: 302
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 2.38 x̃: 2
helped stats (rel) min: 0.15% max: 7.69% x̄: 1.28% x̃: 0.98%
95% mean confidence interval for instructions value: -2.55 -2.21
95% mean confidence interval for instructions %-change: -1.40% -1.16%
Instructions are helped.

total cycles in shared programs: 413014786 -> 413015475 (<.01%)
cycles in affected programs: 707594 -> 708283 (0.10%)
helped: 227
HURT: 101
helped stats (abs) min: 1 max: 612 x̄: 36.07 x̃: 20
helped stats (rel) min: 0.04% max: 19.39% x̄: 2.25% x̃: 1.49%
HURT stats (abs)   min: 2 max: 334 x̄: 87.90 x̃: 45
HURT stats (rel)   min: 0.07% max: 14.51% x̄: 4.54% x̃: 3.36%
95% mean confidence interval for cycles value: -8.12 12.32
95% mean confidence interval for cycles %-change: -0.67% 0.34%
Inconclusive result (value mean confidence interval includes 0).

Haswell and Ivy Bridge had similar results. (Haswell shown)
total instructions in shared programs: 13828220 -> 13827881 (<.01%)
instructions in affected programs: 60887 -> 60548 (-0.56%)
helped: 253
HURT: 6
helped stats (abs) min: 1 max: 5 x̄: 1.36 x̃: 1
helped stats (rel) min: 0.16% max: 3.85% x̄: 0.81% x̃: 0.64%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.26% max: 0.89% x̄: 0.47% x̃: 0.27%
95% mean confidence interval for instructions value: -1.39 -1.23
95% mean confidence interval for instructions %-change: -0.85% -0.70%
Instructions are helped.

total cycles in shared programs: 386870095 -> 386894412 (<.01%)
cycles in affected programs: 1537307 -> 1561624 (1.58%)
helped: 127
HURT: 188
helped stats (abs) min: 1 max: 381 x̄: 17.89 x̃: 4
helped stats (rel) min: 0.02% max: 14.33% x̄: 1.00% x̃: 0.33%
HURT stats (abs)   min: 2 max: 5585 x̄: 141.43 x̃: 14
HURT stats (rel)   min: 0.03% max: 11.50% x̄: 1.65% x̃: 1.06%
95% mean confidence interval for cycles value: 21.95 132.45
95% mean confidence interval for cycles %-change: 0.32% 0.85%
Cycles are HURT.

Sandy Bridge
total instructions in shared programs: 10896339 -> 10896276 (<.01%)
instructions in affected programs: 10757 -> 10694 (-0.59%)
helped: 49
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.29 x̃: 1
helped stats (rel) min: 0.12% max: 1.85% x̄: 0.87% x̃: 0.89%
95% mean confidence interval for instructions value: -1.42 -1.15
95% mean confidence interval for instructions %-change: -1.03% -0.72%
Instructions are helped.

total cycles in shared programs: 155091003 -> 155090480 (<.01%)
cycles in affected programs: 102761 -> 102238 (-0.51%)
helped: 51
HURT: 0
helped stats (abs) min: 1 max: 36 x̄: 10.25 x̃: 4
helped stats (rel) min: 0.02% max: 2.57% x̄: 0.76% x̃: 0.36%
95% mean confidence interval for cycles value: -12.98 -7.53
95% mean confidence interval for cycles %-change: -0.97% -0.56%
Cycles are helped.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8234667 -> 8234652 (<.01%)
instructions in affected programs: 2063 -> 2048 (-0.73%)
helped: 15
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.30% max: 1.56% x̄: 0.82% x̃: 0.81%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.97% -0.67%
Instructions are helped.

total cycles in shared programs: 188700906 -> 188700598 (<.01%)
cycles in affected programs: 283480 -> 283172 (-0.11%)
helped: 83
HURT: 3
helped stats (abs) min: 2 max: 8 x̄: 3.78 x̃: 4
helped stats (rel) min: 0.04% max: 0.55% x̄: 0.15% x̃: 0.12%
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.02% max: 0.04% x̄: 0.03% x̃: 0.04%
95% mean confidence interval for cycles value: -3.87 -3.29
95% mean confidence interval for cycles %-change: -0.16% -0.12%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-18 12:37:48 -07:00
Ian Romanick f3d6df719c nir/algebraic: Fix some 1-bit Boolean weirdness
Skylake, Broadwell, and Haswell had similar results. (Skylake shown)
total cycles in shared programs: 372594532 -> 372594460 (<.01%)
cycles in affected programs: 46854 -> 46782 (-0.15%)
helped: 9
HURT: 0
helped stats (abs) min: 2 max: 22 x̄: 8.00 x̃: 2
helped stats (rel) min: 0.02% max: 0.41% x̄: 0.16% x̃: 0.09%
95% mean confidence interval for cycles value: -14.34 -1.66
95% mean confidence interval for cycles %-change: -0.28% -0.04%
Cycles are helped.

Ivy Bridge
total instructions in shared programs: 12038379 -> 12038373 (<.01%)
instructions in affected programs: 1278 -> 1272 (-0.47%)
helped: 3
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.31% max: 0.77% x̄: 0.54% x̃: 0.55%

total cycles in shared programs: 180889027 -> 180888997 (<.01%)
cycles in affected programs: 29979 -> 29949 (-0.10%)
helped: 5
HURT: 0
helped stats (abs) min: 1 max: 16 x̄: 6.00 x̃: 5
helped stats (rel) min: 0.02% max: 0.34% x̄: 0.11% x̃: 0.07%
95% mean confidence interval for cycles value: -13.40 1.40
95% mean confidence interval for cycles %-change: -0.27% 0.05%
Inconclusive result (value mean confidence interval includes 0).

Sandy Bridge
total cycles in shared programs: 155091021 -> 155091003 (<.01%)
cycles in affected programs: 8842 -> 8824 (-0.20%)
helped: 2
HURT: 0

No changes on Iron Lake or GM45.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-18 12:37:48 -07:00
Ian Romanick 403aac7500 nir/algebraic: Replace a pattern where iand with a Boolean is used as a bcsel
All of the affected shaders are in Mad Max.  I noticed this while
looking at some other things.  I tried a couple similar patterns, but
the affect on cycles was general negative.  It may be worth revisiting
this later.

v2: Rebase on 1-bit Boolean changes.

All Gen7+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15282073 -> 15282053 (<.01%)
instructions in affected programs: 1192 -> 1172 (-1.68%)
helped: 14
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.43 x̃: 1
helped stats (rel) min: 1.16% max: 2.17% x̄: 1.65% x̃: 1.39%
95% mean confidence interval for instructions value: -1.73 -1.13
95% mean confidence interval for instructions %-change: -1.91% -1.38%
Instructions are helped.

total cycles in shared programs: 372595954 -> 372594532 (<.01%)
cycles in affected programs: 11477 -> 10055 (-12.39%)
helped: 14
HURT: 0
helped stats (abs) min: 76 max: 122 x̄: 101.57 x̃: 104
helped stats (rel) min: 7.76% max: 15.62% x̄: 12.94% x̃: 14.78%
95% mean confidence interval for cycles value: -111.05 -92.09
95% mean confidence interval for cycles %-change: -14.90% -10.98%
Cycles are helped.

No changes on any Gen6 or earlier platforms.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-18 12:37:48 -07:00
Ian Romanick 25bfba3335 nir/algebraic: Recognize open-coded copysign(1.0, a)
All of the affected shaders are in Mad Max.  The inner part of the
pattern is itself an open-coded sign(a).  I tried using that as a
pattern, but the results were not good.  A bunch of shaders were helped
for instructions, but overall cycles, spill, and fills were hurt.

v2: Rebase on 1-bit Boolean changes.

v3: Fix order of copysign() parameters in comments and commit message.
Noticed by Matt.

All Gen7+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15282141 -> 15282073 (<.01%)
instructions in affected programs: 6106 -> 6038 (-1.11%)
helped: 17
HURT: 0
helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4
helped stats (rel) min: 1.02% max: 2.20% x̄: 1.15% x̃: 1.06%
95% mean confidence interval for instructions value: -4.00 -4.00
95% mean confidence interval for instructions %-change: -1.30% -1.00%
Instructions are helped.

total cycles in shared programs: 372597886 -> 372595954 (<.01%)
cycles in affected programs: 32701 -> 30769 (-5.91%)
helped: 17
HURT: 0
helped stats (abs) min: 6 max: 216 x̄: 113.65 x̃: 118
helped stats (rel) min: 0.40% max: 21.86% x̄: 6.20% x̃: 5.83%
95% mean confidence interval for cycles value: -152.84 -74.45
95% mean confidence interval for cycles %-change: -8.89% -3.51%
Cycles are helped.

No changes on any Gen6 or earlier platforms.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-18 12:37:48 -07:00
Jason Ekstrand c6463f8ac2 nir: Add a nir_src_as_intrinsic() helper
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-18 17:12:44 +00:00
Jason Ekstrand 85c35885b3 nir: Rework nir_src_as_alu_instr to not take a pointer
Other nir_src_as_* functions just take a nir_src.  It's not that much
more memory copying and the constness preserving really isn't worth the
cognitive dissonance.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-18 17:12:44 +00:00
Jason Ekstrand eee994e769 nir: Drop "struct" from some nir_* declarations
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-18 17:12:44 +00:00
Jason Ekstrand ba0f203ae8 nir/algebraic: Use a cache to avoid re-emitting structs
This takes the stupid simplest and most reliable approach to reducing
redundancy that I could come up with:  Just use the struct declaration
as the cach key.  This cuts the size of the generated C file to about
half and takes about 50 KiB off the .data section.

size before (release build):

   text	   data	    bss	    dec	    hex	filename
5363833	 336880	  13584	5714297	 573179	_install/lib64/libvulkan_intel.so

size after (release build):

   text	   data	    bss	    dec	    hex	filename
5229017	 285264	  13584	5527865	 545939	_install/lib64/libvulkan_intel.so

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-04-16 16:40:15 +00:00
Jason Ekstrand 0c712fd404 nir/algebraic: Move the template closer to the render function
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-04-16 16:40:15 +00:00
Marek Olšák d3ce8a7f6b nir: optimize gl_SampleMaskIn to gl_HelperInvocation for radeonsi when possible
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-04-16 10:24:19 -04:00
Rhys Perry 8671cfe2a2 nir,ac/nir: fix cube_face_coord
Seems it was missing the "/ ma + 0.5" and the order was swapped.

Fixes: a1a2a8dfda ('nir: add AMD_gcn_shader extended instructions')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-15 17:22:47 +01:00
Timothy Arceri 8f74a60c43 nir: fix packing components with arrays
When gathering info for unmovable types we need to handle arrays.
While we dont support packing/moving arrays we do support packing
scalar components with these arrays.

Fixes piglit:
tests/spec/arb_enhanced_layouts/execution/component-layout/vs-fs-array-interleave-range.shader_test

Fixes: 5eb17506e1 ("nir: do not pack varying with different types")

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-15 19:25:12 +10:00
Jason Ekstrand 47709ca146 nir/validate: Require unused bits of nir_const_value to be zero
Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-04-14 22:25:56 +02:00
Jason Ekstrand c4b28d1730 nir/load_const_to_scalar: Get rid of a bit size switch statement
Now that nir_const_value is a scalar, we don't need the switch on bit
size in order to pluck off components properly.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-04-14 22:25:56 +02:00
Jason Ekstrand b8197a01a9 nir/constant_folding: Get rid of a bit size switch statement
Now that nir_const_value is a scalar, we don't need the switch on bit
size in order to swizzle them properly.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-04-14 22:25:56 +02:00
Karol Herbst 14531d676b nir: make nir_const_value scalar
v2: remove & operator in a couple of memsets
    add some memsets
v3: fixup lima

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
2019-04-14 22:25:56 +02:00
Karol Herbst e72beacb95 nir/loop_analyze: use nir_const_value.b for boolean results, not u32
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-14 22:25:56 +02:00
Jason Ekstrand 10602db78c nir/print: Use nir_src_as_int for array indices
Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-04-14 22:25:56 +02:00
Jason Ekstrand 9b1e4bab6b nir/builder: Add a nir_imm_zero helper
v2: replace nir_zero_vec with nir_imm_zero (Karol Herbst)

Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-04-14 22:25:56 +02:00
Karol Herbst daaf777376 nir/builder: Move nir_imm_vec2 from blorp into the builder
While we're here, fix a typo which caused it to actually return a vec4
with the third and fourth components zero.

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-14 22:25:56 +02:00
Alyssa Rosenzweig 2ce4adefa5 nir: Add nir_lower_viewport_transform
On Mali hardware (supported by Panfrost and Lima), the fixed-function
transformation from world-space to screen-space coordinates is done in
the vertex shader prior to writing out the gl_Position varying, rather
than in dedicated hardware. This commit adds a shared NIR pass for
implementing coordinate transformation and lowering gl_Position writes
into screen-space gl_Position writes.

v2: Run directly on derefs before io/vars are lowered to cleanup the
code substantially. Thank you to Qiang for this suggestion!

v3: Bikeshed continues.

v4: Add to Makefile.sources (per Jason's comment). Bikeshed comment.

Ian and Qiang's reviews are from v3, but no real functional changes from
v4. Rob's review is from v4.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Suggested-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-04-14 19:15:13 +00:00
Christian Gmeiner b6bed115a5 nir: add lower_ftrunc
Port TGSI TRUNC lowering to nir

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-13 17:54:48 +00:00
Jason Ekstrand 18ed82b084 nir: Add a pass for selectively lowering variables to scratch space
This commit adds new nir_load/store_scratch opcodes which read and write
a virtual scratch space.  It's up to the back-end to figure out what to
do with it and where to put the actual scratch data.

v2: Drop const_index comments (by anholt)

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-04-12 15:59:31 -07:00
Eric Anholt b88ef3bd76 nir: Add a comment about how intrinsic definitions work.
I was thinking about a refactor, and needed to read this first.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-12 15:56:12 -07:00
Eric Anholt 35355b4860 nir: Drop remaining references to const_index in favor of the call to use.
Please don't make me read a const_index[] expression ever again.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-12 15:56:04 -07:00
Eric Anholt 6e4d3d0a2f nir: Drop comments about the constant_index slots for load/stores.
The constant_index slots are named right there in the intrinsic
definition, and the comment is just a chance to get out of sync.  Noticed
while reviewing the lower_to_scratch changes that copy-and-pasted wrong
comments, and load_ubo and load_per_vertex_output had incorrect comments
currently.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-12 15:55:55 -07:00
Karol Herbst 4a3c04a11f glsl/nir: add support for lowering bindless images_derefs
v2: handle atomics as well
    make use of nir_rewrite_image_intrinsic
v3: remove call to nir_remove_dead_derefs
v4: (Timothy Arceri) dont actually call lowering yet

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v3)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-04-12 09:02:59 +02:00
Timothy Arceri 035759b61b nir/i965/freedreno/vc4: add a bindless bool to type size functions
This required to calculate sizes correctly when we have bindless
samplers/images.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-04-12 09:02:59 +02:00
Karol Herbst 3b2a9ffd60 nir: move brw_nir_rewrite_image_intrinsic into common code
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-04-12 09:02:59 +02:00
Timothy Arceri 9e3740c47f nir: initialise some variables in opt_if_loop_last_continue()
Fixes a couple of Coverity warnings CID 1444626.

Fixes: e30804c602 ("nir/radv: remove restrictions on opt_if_loop_last_continue()")

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-04-11 20:38:03 +10:00
Juan A. Suarez Romero 83f1b0e95b nir/xfb: do not use bare interface type
In commit 3b3653c4cf we decided not to use bare types; hence do not use
bare type when comparing with interface type to find out if the xfb
variable is an array block.

This fixes dEQP-VK.transform_feedback.* tests.

Fixes: 3b3653c4cf ("nir/spirv: don't use bare types, remove assert in
                     split vars for testing")
CC: Dave Airlie <airlied@redhat.com>
CC: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-11 11:52:45 +02:00
Bas Nieuwenhuizen 282bacab4a nir: Add access qualifiers on load_ubo intrinsic.
Otherwise nir_lower_non_uniform_access crashes when it tries
to get the access of a load_ubo.

Fixes: 8ed583fe52 "spirv: Handle the NonUniformEXT decoration"
Fixes: e50ab2c0f2 "nir: Add access flags to deref and SSBO atomics"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-10 02:04:04 +02:00
Jason Ekstrand 6279074de1 nir: Get rid of global registers
We have a pass to lower global registers to locals and many drivers
dutifully call it.  However, no one ever creates a global register ever
so it's all dead code.  It's time we bury it.

Acked-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-04-09 00:29:36 -05:00
Jason Ekstrand b28bad89b9 nir: Get rid of nir_register::is_packed
All we ever do is initialize it to zero, clone it, print it, and
validate it.  No one ever sets or uses it.

Acked-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-04-09 00:29:36 -05:00
Caio Marcelo de Oliveira Filho fcbc5ccaae nir: Don't set LOD=0 for compute shader that has derivative group
When using NV_compute_shader_derivatives to set a derivative group,
a compute shader supports texture with implicit LOD calculation, so
don't set an explicit LOD.

Note if the extension is used but the derivative group is not
specified, it will default to LOD=0 as before.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho d08a74d2bf nir/algebraic: Lower CS derivatives to zero when no group defined
In compute shaders if no derivative group is defined, the derivatives
will always be zero.  Specified in NV_compute_shader_derivatives.

To make the check more convenient, add a "info" local variable to the
generated code so we can refer to it in the Python rules.  (Jason)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-08 19:29:32 -07:00
Timothy Arceri e30804c602 nir/radv: remove restrictions on opt_if_loop_last_continue()
When I implemented opt_if_loop_last_continue() I had restricted
this pass from moving other if-statements inside the branch opposite
the continue. At the time it was causing a bunch of spilling in
shader-db for i965.

However Samuel Pitoiset noticed that making this pass more aggressive
significantly improved the performance of Doom on RADV. Below are
the statistics he gathered.

28717 shaders in 14931 tests
Totals:
SGPRS: 1267317 -> 1267549 (0.02 %)
VGPRS: 896876 -> 895920 (-0.11 %)
Spilled SGPRs: 24701 -> 26367 (6.74 %)
Code Size: 48379452 -> 48507880 (0.27 %) bytes
Max Waves: 241159 -> 241190 (0.01 %)

Totals from affected shaders:
SGPRS: 23584 -> 23816 (0.98 %)
VGPRS: 25908 -> 24952 (-3.69 %)
Spilled SGPRs: 503 -> 2169 (331.21 %)
Code Size: 2471392 -> 2599820 (5.20 %) bytes
Max Waves: 586 -> 617 (5.29 %)

The codesize increases is related to Wolfenstein II it seems largely
due to an increase in phis rather than the existing jumps.

This gives +10% FPS with Doom on my Vega56.

Rhys Perry also benchmarked Doom on his VEGA64:

Before: 72.53 FPS
After:  80.77 FPS

v2: disable pass on non-AMD drivers

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-09 11:29:41 +10:00
Jason Ekstrand 50f3535d1f nir/search: Search for all combinations of commutative ops
Consider the following search expression and NIR sequence:

    ('iadd', ('imul', a, b), b)

    ssa_2 = imul ssa_0, ssa_1
    ssa_3 = iadd ssa_2, ssa_0

The current algorithm is greedy and, the moment the imul finds a match,
it commits those variable names and returns success.  In the above
example, it maps a -> ssa_0 and b -> ssa_1.  When we then try to match
the iadd, it sees that ssa_0 is not b and fails to match.  The iadd
match will attempt to flip itself and try again (which won't work) but
it cannot ask the imul to try a flipped match.

This commit instead counts the number of commutative ops in each
expression and assigns an index to each.  It then does a loop and loops
over the full combinatorial matrix of commutative operations.  In order
to keep things sane, we limit it to at most 4 commutative operations (16
combinations).  There is only one optimization in opt_algebraic that
goes over this limit and it's the bitfieldReverse detection for some UE4
demo.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15310125 -> 15302469 (-0.05%)
    instructions in affected programs: 1797123 -> 1789467 (-0.43%)
    helped: 6751
    HURT: 2264

    total cycles in shared programs: 357346617 -> 357202526 (-0.04%)
    cycles in affected programs: 15931005 -> 15786914 (-0.90%)
    helped: 6024
    HURT: 3436

    total loops in shared programs: 4360 -> 4360 (0.00%)
    loops in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total spills in shared programs: 23675 -> 23666 (-0.04%)
    spills in affected programs: 235 -> 226 (-3.83%)
    helped: 5
    HURT: 1

    total fills in shared programs: 32040 -> 32032 (-0.02%)
    fills in affected programs: 190 -> 182 (-4.21%)
    helped: 6
    HURT: 2

    LOST:   18
    GAINED: 5

Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2019-04-08 21:38:48 +00:00
Jason Ekstrand ad8c145658 nir/algebraic: Add some logical OR and AND patterns
The new OR pattern has been seen in the wild and can end up being
generated by GLSLang.  Not sure about the other two new patterns but we
may as well throw them in for completeness.  While we're here, we can
drop the '@bool' specifier from the one pattern because specifying True
already implies 1-bit which basically implies boolean.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15321227 -> 15321129 (<.01%)
    instructions in affected programs: 3594 -> 3496 (-2.73%)
    helped: 6
    HURT: 0

    total cycles in shared programs: 357481321 -> 357479725 (<.01%)
    cycles in affected programs: 44109 -> 42513 (-3.62%)
    helped: 6
    HURT: 0

VkPipeline-DB results on Kaby Lake:

    total instructions in shared programs: 3770504 -> 3769734 (-0.02%)
    instructions in affected programs: 19058 -> 18288 (-4.04%)
    helped: 163
    HURT: 0

    total cycles in shared programs: 1417583701 -> 1417569727 (<.01%)
    cycles in affected programs: 750958 -> 736984 (-1.86%)
    helped: 158
    HURT: 1

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-04-05 18:39:06 -05:00
Jason Ekstrand 03a72d96d8 nir/algebraic: Drop some @bool specifiers
Now that we have one-bit booleans, we don't need to rely on looking at
parent instructions in order to figure out if a value is a Boolean most
of the time.  We can drop these specifiers and now the optimizations
will apply more generally.

Shader-DB results on Kaby Lake:

    total instructions in shared programs: 15321168 -> 15321227 (<.01%)
    instructions in affected programs: 8836 -> 8895 (0.67%)
    helped: 1
    HURT: 31

    total cycles in shared programs: 357481781 -> 357481321 (<.01%)
    cycles in affected programs: 146524 -> 146064 (-0.31%)
    helped: 22
    HURT: 10

    total spills in shared programs: 23675 -> 23673 (<.01%)
    spills in affected programs: 11 -> 9 (-18.18%)
    helped: 1
    HURT: 0

    total fills in shared programs: 32040 -> 32036 (-0.01%)
    fills in affected programs: 27 -> 23 (-14.81%)
    helped: 1
    HURT: 0

No change in VkPipeline-DB

Looking at the instructions hurt, a bunch of them seem to be a case
where doing exactly the right thing in NIR ends up doing the wrong-ish
thing in the back-end because flags are dumb.  In particular, there's a
case where we have a MUL followed by a CMP followed by a SEL and when we
turn that SEL into an OR, it uses the GRF result of the CMP rather than
the flag result so the CMP can't be merged with the MUL.  Those shaders
appear to schedule better according to the cycle estimates so I guess
it's a win?  Also it helps spilling in one Car Chase compute shader.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-04-05 18:39:00 -05:00
Caio Marcelo de Oliveira Filho c037dbb0ef nir: Take if_uses into account when repairing SSA
If a def is used as an condition before its definition, we should also
consider this a case to repair.  When repairing, make sure we rewrite
any if conditions too.

Found in while inspecting a SPIR-V conversion from a 'continue block'
that contains a conditional branch.  We pull the continue block up to
the beggining of the loop, and the condition in the branch ends up
defined afterwards.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes: 364212f1ed "nir: Add a pass to repair SSA form"
2019-04-05 09:43:46 -07:00
Samuel Pitoiset 5eb17506e1 nir: do not pack varying with different types
The current algorithm only supports packing 32-bit types.
If a shader uses both 16-bit and 32-bit varyings, we shouldn't
compact them together.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-04-05 13:57:42 +02:00
Alyssa Rosenzweig a83862754e nir: Add "viewport vector" system values
While a partial set of viewport system values exist, these are scalar
values, which is a poor fit for viewport transformations on vector ISAs
like Midgard (where the vec3 values for scale and offset each need to be
coherent in a vec4 uniform slot to take advantage of vectorized
transform math). This patch adds vec3 scale/offset fields corresponding
to the 3D Gallium viewport / glViewport+depth

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-04-04 03:44:09 +00:00
Dave Airlie eb8fefe090 nir: use proper array sizing define for vectors
If we increase the vector size in the future it would be good
to not have to fix these up, this should change nothing at present.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-03 13:59:06 +10:00
Timothy Arceri d8ce915a61 Revert "nir: propagate known constant values into the if-then branch"
This reverts commit 4218b6422c.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110311
2019-04-03 13:24:18 +11:00
Timothy Arceri 4218b6422c nir: propagate known constant values into the if-then branch
Helps Max Waves / VGPR use in a bunch of Unigine Heaven
shaders.

shader-db results radeonsi (VEGA):
Totals from affected shaders:
SGPRS: 5505440 -> 5505872 (0.01 %)
VGPRS: 3077520 -> 3077296 (-0.01 %)
Spilled SGPRs: 39032 -> 39030 (-0.01 %)
Spilled VGPRs: 16326 -> 16326 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 744 -> 744 (0.00 %) dwords per thread
Code Size: 123755028 -> 123753316 (-0.00 %) bytes
Compile Time: 2751028 -> 2560786 (-6.92 %) milliseconds
LDS: 1415 -> 1415 (0.00 %) blocks
Max Waves: 972192 -> 972240 (0.00 %)
Wait states: 0 -> 0 (0.00 %)

vkpipeline-db results RADV (VEGA):

Totals from affected shaders:
SGPRS: 160 -> 160 (0.00 %)
VGPRS: 88 -> 88 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 18268 -> 18152 (-0.63 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 26 -> 26 (0.00 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-03 10:04:48 +11:00