Commit Graph

3797 Commits

Author SHA1 Message Date
Caio Marcelo de Oliveira Filho 7b66d584a3 spirv: Rename vtn_decoration literals to operands
Decorations (and ExecutionModes) can have not only literals, but also
Ids associated with them.  So rename the field to the more general
name "Operand" used by the spec.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-23 14:58:01 -07:00
Kenneth Graunke 47303b466c Revert "glsl: Set location on structure-split sampler uniform variables"
This reverts commit 9e0c744f07, which
regressed dEQP-GLES2.functional.uniform_api.random.3.  It turns out
that the newly produced location is meaningless and impossible to
consume by drivers that want to look at gl_uniform_storage, so it's
probably better to leave it unset (0) than a number that looks usable.

Leave a tombstone^Wcomment to discourage the next person from making
the obvious looking fix.

See the next commit for a longer description of the problem.

This breaks tests/spec/glsl-1.10/execution/samplers/uniform-struct
on i965, which was originally fixed by the revert.  The next commit
will fix it again.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-22 15:39:55 -07:00
Jason Ekstrand ccb25aaeaf nir: Use the NIR_SRC_AS_ macro to define nir_src_as_deref
We have a macro for this now; no reason to hand-roll it for derefs.
While we're here, move the NIR_DEFINE_CAST for derefs down to where all
the other ones are.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-04-22 15:23:24 +00:00
Jason Ekstrand 470422870a nir: Add helpers for getting the type of an address format
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-19 19:56:42 +00:00
Jason Ekstrand 2edf29b933 intel,nir: Lower TXD with a bindless sampler
When we have a bindless sampler, we need an instruction header.  Even in
SIMD8, this pushes the instruction over the sampler message size maximum
of 11 registers.  Instead, we have to lower TXD to TXL.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-19 19:56:42 +00:00
Jason Ekstrand 995dc4e5c3 nir/lower_io: Expose some explicit I/O lowering helpers
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-19 19:56:42 +00:00
Kristian H. Kristensen 41593f3c37 nir_opcodes.py: Saturate to expression that doesn't overflow
Compiler warns about overflow when assigning UINT64_MAX to something
smaller than a uin64_t:

src/compiler/nir/nir_constant_expressions.c:16909:50: warning: implicit conversion from 'unsigned long long' to 'uint1_t' (aka 'unsigned char') changes value from 18446744073709551615 to 255 [-Wconstant-conversion]
            uint1_t dst = (src0 + src1) < src0 ? UINT64_MAX : (src0 + src1);
                    ~~~                          ^~~~~~~~~~

Shift UINT64_MAX down to the appropriate maximum value for the type
being assigned to.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-19 16:17:37 +00:00
Kristian H. Kristensen 15605cc9d4 glsl_to_nir: Initialize debug variable
If we want to assert on found == true when the loop exits early, we
need to initialize it to false.

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-04-19 16:17:37 +00:00
Eric Anholt 38c75aff4c nir: Use the nir_builder _imm helpers in setting up deref offsets.
When looking at the dEQP nested_struct_array_dynamic_index_fragment code
after lowering, I was horrified at the amount of adding and multiplying by
0 we were doing.  The builder _imm helpers handle that for you so that the
following optimization passes have less work to do.  Plus, it's easier to
read.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-19 08:45:14 -07:00
Eric Anholt 9ac5ec2f90 nir: Fix deref offset calculation for structs.
We were calcuating the offset for the field within the struct, and just
dropping it on the floor.  Fixes a regression in
KHR-GLES3.shaders.struct.local.nested_struct_array_dynamic_index_fragment
and a few of its friends since the scratch lowering commit.

Fixes: e8e159e9df ("nir/deref: Add helpers for getting offsets")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-19 08:45:14 -07:00
Erico Nunes 4577eb7b7c nir/algebraic: add lowering for fsign
The mali utgard pp doesn't support a sign instruction.
In the ARM offline shader compiler, the sign function is implemented
using sub(gt(0.0, a), lt(0.0, a)).
This is a generic optimization, so implement it in the nir level when
lower_fsign is set, alongside the lowering for isign.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-04-19 15:42:23 +00:00
Ian Romanick 6b97fa9a99 nir/algebraic: Strength reduce some compares of x and -x
Converting the x vs -x comparison to an x vs 0 comparison enable cmod
propagation to help.

The seems to be a win everywhere except Gen7.

Skylake and Broadwell had similar results. (Broadwell shown)
total instructions in shared programs: 15566733 -> 15566014 (<.01%)
instructions in affected programs: 72617 -> 71898 (-0.99%)
helped: 302
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 2.38 x̃: 2
helped stats (rel) min: 0.15% max: 7.69% x̄: 1.28% x̃: 0.98%
95% mean confidence interval for instructions value: -2.55 -2.21
95% mean confidence interval for instructions %-change: -1.40% -1.16%
Instructions are helped.

total cycles in shared programs: 413014786 -> 413015475 (<.01%)
cycles in affected programs: 707594 -> 708283 (0.10%)
helped: 227
HURT: 101
helped stats (abs) min: 1 max: 612 x̄: 36.07 x̃: 20
helped stats (rel) min: 0.04% max: 19.39% x̄: 2.25% x̃: 1.49%
HURT stats (abs)   min: 2 max: 334 x̄: 87.90 x̃: 45
HURT stats (rel)   min: 0.07% max: 14.51% x̄: 4.54% x̃: 3.36%
95% mean confidence interval for cycles value: -8.12 12.32
95% mean confidence interval for cycles %-change: -0.67% 0.34%
Inconclusive result (value mean confidence interval includes 0).

Haswell and Ivy Bridge had similar results. (Haswell shown)
total instructions in shared programs: 13828220 -> 13827881 (<.01%)
instructions in affected programs: 60887 -> 60548 (-0.56%)
helped: 253
HURT: 6
helped stats (abs) min: 1 max: 5 x̄: 1.36 x̃: 1
helped stats (rel) min: 0.16% max: 3.85% x̄: 0.81% x̃: 0.64%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.26% max: 0.89% x̄: 0.47% x̃: 0.27%
95% mean confidence interval for instructions value: -1.39 -1.23
95% mean confidence interval for instructions %-change: -0.85% -0.70%
Instructions are helped.

total cycles in shared programs: 386870095 -> 386894412 (<.01%)
cycles in affected programs: 1537307 -> 1561624 (1.58%)
helped: 127
HURT: 188
helped stats (abs) min: 1 max: 381 x̄: 17.89 x̃: 4
helped stats (rel) min: 0.02% max: 14.33% x̄: 1.00% x̃: 0.33%
HURT stats (abs)   min: 2 max: 5585 x̄: 141.43 x̃: 14
HURT stats (rel)   min: 0.03% max: 11.50% x̄: 1.65% x̃: 1.06%
95% mean confidence interval for cycles value: 21.95 132.45
95% mean confidence interval for cycles %-change: 0.32% 0.85%
Cycles are HURT.

Sandy Bridge
total instructions in shared programs: 10896339 -> 10896276 (<.01%)
instructions in affected programs: 10757 -> 10694 (-0.59%)
helped: 49
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.29 x̃: 1
helped stats (rel) min: 0.12% max: 1.85% x̄: 0.87% x̃: 0.89%
95% mean confidence interval for instructions value: -1.42 -1.15
95% mean confidence interval for instructions %-change: -1.03% -0.72%
Instructions are helped.

total cycles in shared programs: 155091003 -> 155090480 (<.01%)
cycles in affected programs: 102761 -> 102238 (-0.51%)
helped: 51
HURT: 0
helped stats (abs) min: 1 max: 36 x̄: 10.25 x̃: 4
helped stats (rel) min: 0.02% max: 2.57% x̄: 0.76% x̃: 0.36%
95% mean confidence interval for cycles value: -12.98 -7.53
95% mean confidence interval for cycles %-change: -0.97% -0.56%
Cycles are helped.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8234667 -> 8234652 (<.01%)
instructions in affected programs: 2063 -> 2048 (-0.73%)
helped: 15
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.30% max: 1.56% x̄: 0.82% x̃: 0.81%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.97% -0.67%
Instructions are helped.

total cycles in shared programs: 188700906 -> 188700598 (<.01%)
cycles in affected programs: 283480 -> 283172 (-0.11%)
helped: 83
HURT: 3
helped stats (abs) min: 2 max: 8 x̄: 3.78 x̃: 4
helped stats (rel) min: 0.04% max: 0.55% x̄: 0.15% x̃: 0.12%
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.02% max: 0.04% x̄: 0.03% x̃: 0.04%
95% mean confidence interval for cycles value: -3.87 -3.29
95% mean confidence interval for cycles %-change: -0.16% -0.12%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-18 12:37:48 -07:00
Ian Romanick f3d6df719c nir/algebraic: Fix some 1-bit Boolean weirdness
Skylake, Broadwell, and Haswell had similar results. (Skylake shown)
total cycles in shared programs: 372594532 -> 372594460 (<.01%)
cycles in affected programs: 46854 -> 46782 (-0.15%)
helped: 9
HURT: 0
helped stats (abs) min: 2 max: 22 x̄: 8.00 x̃: 2
helped stats (rel) min: 0.02% max: 0.41% x̄: 0.16% x̃: 0.09%
95% mean confidence interval for cycles value: -14.34 -1.66
95% mean confidence interval for cycles %-change: -0.28% -0.04%
Cycles are helped.

Ivy Bridge
total instructions in shared programs: 12038379 -> 12038373 (<.01%)
instructions in affected programs: 1278 -> 1272 (-0.47%)
helped: 3
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.31% max: 0.77% x̄: 0.54% x̃: 0.55%

total cycles in shared programs: 180889027 -> 180888997 (<.01%)
cycles in affected programs: 29979 -> 29949 (-0.10%)
helped: 5
HURT: 0
helped stats (abs) min: 1 max: 16 x̄: 6.00 x̃: 5
helped stats (rel) min: 0.02% max: 0.34% x̄: 0.11% x̃: 0.07%
95% mean confidence interval for cycles value: -13.40 1.40
95% mean confidence interval for cycles %-change: -0.27% 0.05%
Inconclusive result (value mean confidence interval includes 0).

Sandy Bridge
total cycles in shared programs: 155091021 -> 155091003 (<.01%)
cycles in affected programs: 8842 -> 8824 (-0.20%)
helped: 2
HURT: 0

No changes on Iron Lake or GM45.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-18 12:37:48 -07:00
Ian Romanick 403aac7500 nir/algebraic: Replace a pattern where iand with a Boolean is used as a bcsel
All of the affected shaders are in Mad Max.  I noticed this while
looking at some other things.  I tried a couple similar patterns, but
the affect on cycles was general negative.  It may be worth revisiting
this later.

v2: Rebase on 1-bit Boolean changes.

All Gen7+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15282073 -> 15282053 (<.01%)
instructions in affected programs: 1192 -> 1172 (-1.68%)
helped: 14
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.43 x̃: 1
helped stats (rel) min: 1.16% max: 2.17% x̄: 1.65% x̃: 1.39%
95% mean confidence interval for instructions value: -1.73 -1.13
95% mean confidence interval for instructions %-change: -1.91% -1.38%
Instructions are helped.

total cycles in shared programs: 372595954 -> 372594532 (<.01%)
cycles in affected programs: 11477 -> 10055 (-12.39%)
helped: 14
HURT: 0
helped stats (abs) min: 76 max: 122 x̄: 101.57 x̃: 104
helped stats (rel) min: 7.76% max: 15.62% x̄: 12.94% x̃: 14.78%
95% mean confidence interval for cycles value: -111.05 -92.09
95% mean confidence interval for cycles %-change: -14.90% -10.98%
Cycles are helped.

No changes on any Gen6 or earlier platforms.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-18 12:37:48 -07:00
Ian Romanick 25bfba3335 nir/algebraic: Recognize open-coded copysign(1.0, a)
All of the affected shaders are in Mad Max.  The inner part of the
pattern is itself an open-coded sign(a).  I tried using that as a
pattern, but the results were not good.  A bunch of shaders were helped
for instructions, but overall cycles, spill, and fills were hurt.

v2: Rebase on 1-bit Boolean changes.

v3: Fix order of copysign() parameters in comments and commit message.
Noticed by Matt.

All Gen7+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15282141 -> 15282073 (<.01%)
instructions in affected programs: 6106 -> 6038 (-1.11%)
helped: 17
HURT: 0
helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4
helped stats (rel) min: 1.02% max: 2.20% x̄: 1.15% x̃: 1.06%
95% mean confidence interval for instructions value: -4.00 -4.00
95% mean confidence interval for instructions %-change: -1.30% -1.00%
Instructions are helped.

total cycles in shared programs: 372597886 -> 372595954 (<.01%)
cycles in affected programs: 32701 -> 30769 (-5.91%)
helped: 17
HURT: 0
helped stats (abs) min: 6 max: 216 x̄: 113.65 x̃: 118
helped stats (rel) min: 0.40% max: 21.86% x̄: 6.20% x̃: 5.83%
95% mean confidence interval for cycles value: -152.84 -74.45
95% mean confidence interval for cycles %-change: -8.89% -3.51%
Cycles are helped.

No changes on any Gen6 or earlier platforms.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-18 12:37:48 -07:00
Jason Ekstrand c6463f8ac2 nir: Add a nir_src_as_intrinsic() helper
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-18 17:12:44 +00:00
Jason Ekstrand 85c35885b3 nir: Rework nir_src_as_alu_instr to not take a pointer
Other nir_src_as_* functions just take a nir_src.  It's not that much
more memory copying and the constness preserving really isn't worth the
cognitive dissonance.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-18 17:12:44 +00:00
Jason Ekstrand eee994e769 nir: Drop "struct" from some nir_* declarations
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-18 17:12:44 +00:00
Iago Toral Quiroga e6ee07a664 compiler/spirv: move the check for Int8 capability
So it is right after the checks for the other various Int* capabilities.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-18 13:23:03 +02:00
Caio Marcelo de Oliveira Filho a0dae78e72 spirv: Tell which opcode or value is unhandled when failing
v2: When available, include the opcode name too. (Karol)

v3: Use more to_string helpers. (Karol)
    Include the wrong bit_size in those failures.
    Include the capability number in spv_check_supported.
    Provide vtn_fail_with_* macros to avoid noise in the call sites.

v4: Provide macros only for opcode and decoration, which have enough
    usages to justify them. (Jason)

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-04-16 11:11:10 -07:00
Caio Marcelo de Oliveira Filho 0ccfe741b1 spirv: Add more to_string helpers
Also, use a set to identify repeated values.  The previous arrangement
worked when the repetitions were one after another, but in some of the
new cases they are not.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-04-16 11:11:10 -07:00
Jason Ekstrand ba0f203ae8 nir/algebraic: Use a cache to avoid re-emitting structs
This takes the stupid simplest and most reliable approach to reducing
redundancy that I could come up with:  Just use the struct declaration
as the cach key.  This cuts the size of the generated C file to about
half and takes about 50 KiB off the .data section.

size before (release build):

   text	   data	    bss	    dec	    hex	filename
5363833	 336880	  13584	5714297	 573179	_install/lib64/libvulkan_intel.so

size after (release build):

   text	   data	    bss	    dec	    hex	filename
5229017	 285264	  13584	5527865	 545939	_install/lib64/libvulkan_intel.so

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-04-16 16:40:15 +00:00
Jason Ekstrand 0c712fd404 nir/algebraic: Move the template closer to the render function
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-04-16 16:40:15 +00:00
Marek Olšák d3ce8a7f6b nir: optimize gl_SampleMaskIn to gl_HelperInvocation for radeonsi when possible
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-04-16 10:24:19 -04:00
Tapani Pälli 624789e370 compiler/glsl: handle case where we have multiple users for types
Both Vulkan and OpenGL might be using glsl_types simultaneously or we
can also have multiple concurrent Vulkan instances using glsl_types.
Patch adds a one time init to track number of users and will release
types only when last user calls _glsl_type_singleton_decref().

This change fixes glsl_type memory leaks we have with anv driver.

v2: reuse hash_mutex, cleanup, apply fix also to radv driver and
    rename helper functions (Jason)

v3: move init, destroy to happen on GL context init and destroy

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-16 12:58:00 +03:00
Andres Gomez 42351c21bb glsl/linker: always validate explicit locations for first and last interfaces
Until now, we were only doing this when linking a SSO
program. However, nothing avoids linking a non SSO program which
doesn't have both a VS and FS. In those cases, we also need to report
the usual linking errors, if happening.

v2: Use a better name for the renamed function (Timothy).

Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-04-15 22:34:50 +00:00
Dylan Baker 95aefc94a9 Delete autotools
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Matt Turner <mattst88@gmail.com>
2019-04-15 13:44:29 -07:00
Rhys Perry 082d180a22 mesa, glsl: add support for EXT_shader_image_load_formatted
v3: rebase

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v2)
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-04-15 16:18:07 -04:00
Rhys Perry 8671cfe2a2 nir,ac/nir: fix cube_face_coord
Seems it was missing the "/ ma + 0.5" and the order was swapped.

Fixes: a1a2a8dfda ('nir: add AMD_gcn_shader extended instructions')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-15 17:22:47 +01:00
Timothy Arceri 8f74a60c43 nir: fix packing components with arrays
When gathering info for unmovable types we need to handle arrays.
While we dont support packing/moving arrays we do support packing
scalar components with these arrays.

Fixes piglit:
tests/spec/arb_enhanced_layouts/execution/component-layout/vs-fs-array-interleave-range.shader_test

Fixes: 5eb17506e1 ("nir: do not pack varying with different types")

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-15 19:25:12 +10:00
Samuel Pitoiset bbe8febd93 spirv: add SpvCapabilityFloat16 support
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-04-15 10:43:52 +02:00
Jason Ekstrand 47709ca146 nir/validate: Require unused bits of nir_const_value to be zero
Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-04-14 22:25:56 +02:00
Jason Ekstrand c4b28d1730 nir/load_const_to_scalar: Get rid of a bit size switch statement
Now that nir_const_value is a scalar, we don't need the switch on bit
size in order to pluck off components properly.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-04-14 22:25:56 +02:00
Jason Ekstrand 893dd34702 spirv: Drop some unneeded bit size switch statements
Now that nir_const_value is a scalar, we don't need the switch on bit
size in order copy components around properly.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-04-14 22:25:56 +02:00
Jason Ekstrand b8197a01a9 nir/constant_folding: Get rid of a bit size switch statement
Now that nir_const_value is a scalar, we don't need the switch on bit
size in order to swizzle them properly.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-04-14 22:25:56 +02:00
Karol Herbst 14531d676b nir: make nir_const_value scalar
v2: remove & operator in a couple of memsets
    add some memsets
v3: fixup lima

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
2019-04-14 22:25:56 +02:00
Karol Herbst 73d883037d spirv: reduce array size in vtn_handle_constant
we already assert above that there are no more than 3 sources, so it
doesn't make sense to use an array of 4 sources

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-14 22:25:56 +02:00
Karol Herbst e72beacb95 nir/loop_analyze: use nir_const_value.b for boolean results, not u32
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-14 22:25:56 +02:00
Jason Ekstrand 10602db78c nir/print: Use nir_src_as_int for array indices
Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-04-14 22:25:56 +02:00
Jason Ekstrand 9b1e4bab6b nir/builder: Add a nir_imm_zero helper
v2: replace nir_zero_vec with nir_imm_zero (Karol Herbst)

Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-04-14 22:25:56 +02:00
Karol Herbst daaf777376 nir/builder: Move nir_imm_vec2 from blorp into the builder
While we're here, fix a typo which caused it to actually return a vec4
with the third and fourth components zero.

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-14 22:25:56 +02:00
Alyssa Rosenzweig 2ce4adefa5 nir: Add nir_lower_viewport_transform
On Mali hardware (supported by Panfrost and Lima), the fixed-function
transformation from world-space to screen-space coordinates is done in
the vertex shader prior to writing out the gl_Position varying, rather
than in dedicated hardware. This commit adds a shared NIR pass for
implementing coordinate transformation and lowering gl_Position writes
into screen-space gl_Position writes.

v2: Run directly on derefs before io/vars are lowered to cleanup the
code substantially. Thank you to Qiang for this suggestion!

v3: Bikeshed continues.

v4: Add to Makefile.sources (per Jason's comment). Bikeshed comment.

Ian and Qiang's reviews are from v3, but no real functional changes from
v4. Rob's review is from v4.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Suggested-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-04-14 19:15:13 +00:00
Christian Gmeiner b6bed115a5 nir: add lower_ftrunc
Port TGSI TRUNC lowering to nir

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-13 17:54:48 +00:00
Jason Ekstrand 18ed82b084 nir: Add a pass for selectively lowering variables to scratch space
This commit adds new nir_load/store_scratch opcodes which read and write
a virtual scratch space.  It's up to the back-end to figure out what to
do with it and where to put the actual scratch data.

v2: Drop const_index comments (by anholt)

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-04-12 15:59:31 -07:00
Eric Anholt b88ef3bd76 nir: Add a comment about how intrinsic definitions work.
I was thinking about a refactor, and needed to read this first.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-12 15:56:12 -07:00
Eric Anholt 35355b4860 nir: Drop remaining references to const_index in favor of the call to use.
Please don't make me read a const_index[] expression ever again.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-12 15:56:04 -07:00
Eric Anholt 6e4d3d0a2f nir: Drop comments about the constant_index slots for load/stores.
The constant_index slots are named right there in the intrinsic
definition, and the comment is just a chance to get out of sync.  Noticed
while reviewing the lower_to_scratch changes that copy-and-pasted wrong
comments, and load_ubo and load_per_vertex_output had incorrect comments
currently.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-12 15:55:55 -07:00
Kenneth Graunke 9e0c744f07 glsl: Set location on structure-split sampler uniform variables
gl_nir_lower_samplers_as_deref splits structure uniform variables,
creating new variables for individual fields.  As part of that, it
calculates a new location.  It then never set this on the new variables.

Thanks to Michael Fiano for finding this bug.  Fixes crashes on i965
with Piglit's new tests/spec/glsl-1.10/execution/samplers/uniform-struct
test, which was reduced from the failing case in Michael's app.

Fixes: f003859f97 nir: Make gl_nir_lower_samplers use gl_nir_lower_samplers_as_deref
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-04-12 10:35:08 -07:00
Marek Olšák bd2995c8b7 glsl: allow the #extension directive within code blocks for the dri option
for Viewperf 13

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-04-12 11:34:39 -04:00
Karol Herbst 4a3c04a11f glsl/nir: add support for lowering bindless images_derefs
v2: handle atomics as well
    make use of nir_rewrite_image_intrinsic
v3: remove call to nir_remove_dead_derefs
v4: (Timothy Arceri) dont actually call lowering yet

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v3)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-04-12 09:02:59 +02:00
Karol Herbst 0b2e8d9e17 glsl/nir: fetch the type for images from the deref instruction
fixes retrieving the sampler type for bindless images stored inside structs.

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-04-12 09:02:59 +02:00
Karol Herbst d7bbb3caf1 glsl_to_nir: handle bindless textures
v2: add support for AMD

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-04-12 09:02:59 +02:00
Timothy Arceri 035759b61b nir/i965/freedreno/vc4: add a bindless bool to type size functions
This required to calculate sizes correctly when we have bindless
samplers/images.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-04-12 09:02:59 +02:00
Karol Herbst 3b2a9ffd60 nir: move brw_nir_rewrite_image_intrinsic into common code
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-04-12 09:02:59 +02:00
Timothy Arceri 9e3740c47f nir: initialise some variables in opt_if_loop_last_continue()
Fixes a couple of Coverity warnings CID 1444626.

Fixes: e30804c602 ("nir/radv: remove restrictions on opt_if_loop_last_continue()")

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-04-11 20:38:03 +10:00
Juan A. Suarez Romero 83f1b0e95b nir/xfb: do not use bare interface type
In commit 3b3653c4cf we decided not to use bare types; hence do not use
bare type when comparing with interface type to find out if the xfb
variable is an array block.

This fixes dEQP-VK.transform_feedback.* tests.

Fixes: 3b3653c4cf ("nir/spirv: don't use bare types, remove assert in
                     split vars for testing")
CC: Dave Airlie <airlied@redhat.com>
CC: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-11 11:52:45 +02:00
Marek Olšák 53f715fafb Revert "glsl: fix shader_storage_blocks_write_access for SSBO block arrays"
This reverts commit b7ca074cc0.

It broke a lot of tests.
2019-04-10 10:48:56 -04:00
Karol Herbst 0c4706563a glsl/standalone: add GLES3.1 and GLES3.2 compatibility
also set some constants for SSBOs.

With that it can compile the shader from:
dEQP-GLES31.functional.ssbo.layout.random.all_per_block_buffers.18

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-04-10 16:16:36 +02:00
Bas Nieuwenhuizen 282bacab4a nir: Add access qualifiers on load_ubo intrinsic.
Otherwise nir_lower_non_uniform_access crashes when it tries
to get the access of a load_ubo.

Fixes: 8ed583fe52 "spirv: Handle the NonUniformEXT decoration"
Fixes: e50ab2c0f2 "nir: Add access flags to deref and SSBO atomics"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-10 02:04:04 +02:00
Marek Olšák b7ca074cc0 glsl: fix shader_storage_blocks_write_access for SSBO block arrays
CTS: GL45-CTS.compute_shader.resources-max

Fixes: 4e1e8f684b "glsl: remember which SSBOs are not read-only and pass it to gallium"

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-04-09 19:25:35 -04:00
Andres Gomez 75a3dd97aa glsl/linker: location aliasing requires types to have the same width
From the OpenGL 4.60.5 spec, section 4.4.1 Input Layout Qualifiers,
Page 67, (Location aliasing):

  " Further, when location aliasing, the aliases sharing the location
    must have the same underlying numerical type and bit
    width (floating-point or integer, 32-bit versus 64-bit, etc.) and
    the same auxiliary storage and interpolation qualification."

Additionally, we have improved the linker error descriptions.
Specifically, when taking structs into account we were producing a
linker error because we assumed that all components in each location
were used and that would cause component aliasing. This is not
accurate of the actual problem. Now, the failure specifies that the
underlying numerical type incompatibility is the cause for the
failure.

Fixes the following piglit test:

tests/spec/arb_enhanced_layouts/linker/component-layout/vs-to-fs-width-mismatch-double-float.shader_test

v2:
  - Do not assert if we see invalid numerical types. These come
    straight from shader code, so we should produce linker errors if
    shaders attempt to do location aliasing on variables that are not
    numerical such as records.
  - While we are at it, improve error reporting for the case of
    numerical type mismatch to include the shader stage.

v3:
  - Allow location aliasing of images and samplers. If we get these
    it means bindless support is active and they should be handled
    as 64-bit integers (Ilia)
  - Make sure we produce link errors for any non-numerical type
    for which we attempt location aliasing, not just structs.

v4:
  - Rebased with minor fixes (Andres).
  - Added fixing tag to the commit log (Andres).

v5:
  - Remove the helper function and check individually for the
    underlying numerical type and bit width (Timothy).
  - Implicitly, assume that any non-treated type which is checked for
    its underlying numerical type is either integer or
    float and has a defined bit width (Timothy).
  - Implicitly, assume that structs are the only non-treated
    non-numerical type (Timothy).
  - Improve the linker error descriptions and commit log (Andres).

Fixes: 13652e7516 ("glsl/linker: Fix type checks for location aliasing")
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Cc: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-04-09 12:56:50 +02:00
Jason Ekstrand 6279074de1 nir: Get rid of global registers
We have a pass to lower global registers to locals and many drivers
dutifully call it.  However, no one ever creates a global register ever
so it's all dead code.  It's time we bury it.

Acked-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-04-09 00:29:36 -05:00
Jason Ekstrand b28bad89b9 nir: Get rid of nir_register::is_packed
All we ever do is initialize it to zero, clone it, print it, and
validate it.  No one ever sets or uses it.

Acked-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-04-09 00:29:36 -05:00
Caio Marcelo de Oliveira Filho bd73531677 spirv: Add support for DerivativeGroup capabilities
As defined in SPV_NV_compute_shader_derivatives. These control how the
invocations are arranged in a CS when doing derivative and related
operations (which are also enabled by the extension).

Since we expect valid SPIR-V, we don't need to do more work at SPIR-V
level to enable the derivative and related operations to be called.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho fcbc5ccaae nir: Don't set LOD=0 for compute shader that has derivative group
When using NV_compute_shader_derivatives to set a derivative group,
a compute shader supports texture with implicit LOD calculation, so
don't set an explicit LOD.

Note if the extension is used but the derivative group is not
specified, it will default to LOD=0 as before.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho d08a74d2bf nir/algebraic: Lower CS derivatives to zero when no group defined
In compute shaders if no derivative group is defined, the derivatives
will always be zero.  Specified in NV_compute_shader_derivatives.

To make the check more convenient, add a "info" local variable to the
generated code so we can refer to it in the Python rules.  (Jason)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-08 19:29:32 -07:00
Caio Marcelo de Oliveira Filho 3c5ddaeacd glsl: Parse and propagate derivative_group to shader_info
NV_compute_shader_derivatives allow selecting between two possible
arrangements (quads and linear) when calculating derivatives and
certain subgroup operations in case of Vulkan.  So parse and propagate
those up to shader_info.h.

v2: Do not fail when ARB_compute_variable_group_size is being used,
    since we are still clarifying what is the right thing to do here.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-04-08 19:29:32 -07:00
Caio Marcelo de Oliveira Filho ca60f0b7ba glsl: Enable texture builtins for NV_compute_shader_derivatives
Renamed a few predicates from "fs_only" to be "derivative_only" (or
similar pairs).

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-04-08 19:29:32 -07:00
Caio Marcelo de Oliveira Filho 09a3273fe7 glsl: Enable derivative builtins for NV_compute_shader_derivatives
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-04-08 19:29:32 -07:00
Caio Marcelo de Oliveira Filho 289478ea89 glsl: Remove redundant conditions when asserting in_qualifier
As the code evolved, we ended up with a redundant conditions.  Clean
this up.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-04-08 19:29:32 -07:00
Caio Marcelo de Oliveira Filho 163655b33e mesa: Extension boilerplate for NV_compute_shader_derivatives
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-04-08 19:29:32 -07:00
Timothy Arceri e30804c602 nir/radv: remove restrictions on opt_if_loop_last_continue()
When I implemented opt_if_loop_last_continue() I had restricted
this pass from moving other if-statements inside the branch opposite
the continue. At the time it was causing a bunch of spilling in
shader-db for i965.

However Samuel Pitoiset noticed that making this pass more aggressive
significantly improved the performance of Doom on RADV. Below are
the statistics he gathered.

28717 shaders in 14931 tests
Totals:
SGPRS: 1267317 -> 1267549 (0.02 %)
VGPRS: 896876 -> 895920 (-0.11 %)
Spilled SGPRs: 24701 -> 26367 (6.74 %)
Code Size: 48379452 -> 48507880 (0.27 %) bytes
Max Waves: 241159 -> 241190 (0.01 %)

Totals from affected shaders:
SGPRS: 23584 -> 23816 (0.98 %)
VGPRS: 25908 -> 24952 (-3.69 %)
Spilled SGPRs: 503 -> 2169 (331.21 %)
Code Size: 2471392 -> 2599820 (5.20 %) bytes
Max Waves: 586 -> 617 (5.29 %)

The codesize increases is related to Wolfenstein II it seems largely
due to an increase in phis rather than the existing jumps.

This gives +10% FPS with Doom on my Vega56.

Rhys Perry also benchmarked Doom on his VEGA64:

Before: 72.53 FPS
After:  80.77 FPS

v2: disable pass on non-AMD drivers

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-09 11:29:41 +10:00
Jason Ekstrand 50f3535d1f nir/search: Search for all combinations of commutative ops
Consider the following search expression and NIR sequence:

    ('iadd', ('imul', a, b), b)

    ssa_2 = imul ssa_0, ssa_1
    ssa_3 = iadd ssa_2, ssa_0

The current algorithm is greedy and, the moment the imul finds a match,
it commits those variable names and returns success.  In the above
example, it maps a -> ssa_0 and b -> ssa_1.  When we then try to match
the iadd, it sees that ssa_0 is not b and fails to match.  The iadd
match will attempt to flip itself and try again (which won't work) but
it cannot ask the imul to try a flipped match.

This commit instead counts the number of commutative ops in each
expression and assigns an index to each.  It then does a loop and loops
over the full combinatorial matrix of commutative operations.  In order
to keep things sane, we limit it to at most 4 commutative operations (16
combinations).  There is only one optimization in opt_algebraic that
goes over this limit and it's the bitfieldReverse detection for some UE4
demo.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15310125 -> 15302469 (-0.05%)
    instructions in affected programs: 1797123 -> 1789467 (-0.43%)
    helped: 6751
    HURT: 2264

    total cycles in shared programs: 357346617 -> 357202526 (-0.04%)
    cycles in affected programs: 15931005 -> 15786914 (-0.90%)
    helped: 6024
    HURT: 3436

    total loops in shared programs: 4360 -> 4360 (0.00%)
    loops in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total spills in shared programs: 23675 -> 23666 (-0.04%)
    spills in affected programs: 235 -> 226 (-3.83%)
    helped: 5
    HURT: 1

    total fills in shared programs: 32040 -> 32032 (-0.02%)
    fills in affected programs: 190 -> 182 (-4.21%)
    helped: 6
    HURT: 2

    LOST:   18
    GAINED: 5

Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2019-04-08 21:38:48 +00:00
Jason Ekstrand ad8c145658 nir/algebraic: Add some logical OR and AND patterns
The new OR pattern has been seen in the wild and can end up being
generated by GLSLang.  Not sure about the other two new patterns but we
may as well throw them in for completeness.  While we're here, we can
drop the '@bool' specifier from the one pattern because specifying True
already implies 1-bit which basically implies boolean.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15321227 -> 15321129 (<.01%)
    instructions in affected programs: 3594 -> 3496 (-2.73%)
    helped: 6
    HURT: 0

    total cycles in shared programs: 357481321 -> 357479725 (<.01%)
    cycles in affected programs: 44109 -> 42513 (-3.62%)
    helped: 6
    HURT: 0

VkPipeline-DB results on Kaby Lake:

    total instructions in shared programs: 3770504 -> 3769734 (-0.02%)
    instructions in affected programs: 19058 -> 18288 (-4.04%)
    helped: 163
    HURT: 0

    total cycles in shared programs: 1417583701 -> 1417569727 (<.01%)
    cycles in affected programs: 750958 -> 736984 (-1.86%)
    helped: 158
    HURT: 1

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-04-05 18:39:06 -05:00
Jason Ekstrand 03a72d96d8 nir/algebraic: Drop some @bool specifiers
Now that we have one-bit booleans, we don't need to rely on looking at
parent instructions in order to figure out if a value is a Boolean most
of the time.  We can drop these specifiers and now the optimizations
will apply more generally.

Shader-DB results on Kaby Lake:

    total instructions in shared programs: 15321168 -> 15321227 (<.01%)
    instructions in affected programs: 8836 -> 8895 (0.67%)
    helped: 1
    HURT: 31

    total cycles in shared programs: 357481781 -> 357481321 (<.01%)
    cycles in affected programs: 146524 -> 146064 (-0.31%)
    helped: 22
    HURT: 10

    total spills in shared programs: 23675 -> 23673 (<.01%)
    spills in affected programs: 11 -> 9 (-18.18%)
    helped: 1
    HURT: 0

    total fills in shared programs: 32040 -> 32036 (-0.01%)
    fills in affected programs: 27 -> 23 (-14.81%)
    helped: 1
    HURT: 0

No change in VkPipeline-DB

Looking at the instructions hurt, a bunch of them seem to be a case
where doing exactly the right thing in NIR ends up doing the wrong-ish
thing in the back-end because flags are dumb.  In particular, there's a
case where we have a MUL followed by a CMP followed by a SEL and when we
turn that SEL into an OR, it uses the GRF result of the CMP rather than
the flag result so the CMP can't be merged with the MUL.  Those shaders
appear to schedule better according to the cycle estimates so I guess
it's a win?  Also it helps spilling in one Car Chase compute shader.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-04-05 18:39:00 -05:00
Caio Marcelo de Oliveira Filho c037dbb0ef nir: Take if_uses into account when repairing SSA
If a def is used as an condition before its definition, we should also
consider this a case to repair.  When repairing, make sure we rewrite
any if conditions too.

Found in while inspecting a SPIR-V conversion from a 'continue block'
that contains a conditional branch.  We pull the continue block up to
the beggining of the loop, and the condition in the branch ends up
defined afterwards.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes: 364212f1ed "nir: Add a pass to repair SSA form"
2019-04-05 09:43:46 -07:00
Samuel Pitoiset 5eb17506e1 nir: do not pack varying with different types
The current algorithm only supports packing 32-bit types.
If a shader uses both 16-bit and 32-bit varyings, we shouldn't
compact them together.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-04-05 13:57:42 +02:00
Sergii Romantsov a7d40a13ec glsl: Fix input/output structure matching across shader stages
Section 7.4.1 (Shader Interface Matching) of the OpenGL 4.30 spec says:

    "Variables or block members declared as structures are considered
     to match in type if and only if structure members match in name,
     type, qualification, and declaration order."

Fixes:
     * layout-location-struct.shader_test

v2: rebased against master and small fixes

Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108250
2019-04-05 11:02:23 +11:00
Marek Olšák 4e1e8f684b glsl: remember which SSBOs are not read-only and pass it to gallium
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-04-04 19:28:52 -04:00
Alyssa Rosenzweig a83862754e nir: Add "viewport vector" system values
While a partial set of viewport system values exist, these are scalar
values, which is a poor fit for viewport transformations on vector ISAs
like Midgard (where the vec3 values for scale and offset each need to be
coherent in a vec4 uniform slot to take advantage of vectorized
transform math). This patch adds vec3 scale/offset fields corresponding
to the 3D Gallium viewport / glViewport+depth

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-04-04 03:44:09 +00:00
Dave Airlie eb8fefe090 nir: use proper array sizing define for vectors
If we increase the vector size in the future it would be good
to not have to fix these up, this should change nothing at present.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-03 13:59:06 +10:00
Timothy Arceri d8ce915a61 Revert "nir: propagate known constant values into the if-then branch"
This reverts commit 4218b6422c.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110311
2019-04-03 13:24:18 +11:00
Timothy Arceri 4218b6422c nir: propagate known constant values into the if-then branch
Helps Max Waves / VGPR use in a bunch of Unigine Heaven
shaders.

shader-db results radeonsi (VEGA):
Totals from affected shaders:
SGPRS: 5505440 -> 5505872 (0.01 %)
VGPRS: 3077520 -> 3077296 (-0.01 %)
Spilled SGPRs: 39032 -> 39030 (-0.01 %)
Spilled VGPRs: 16326 -> 16326 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 744 -> 744 (0.00 %) dwords per thread
Code Size: 123755028 -> 123753316 (-0.00 %) bytes
Compile Time: 2751028 -> 2560786 (-6.92 %) milliseconds
LDS: 1415 -> 1415 (0.00 %) blocks
Max Waves: 972192 -> 972240 (0.00 %)
Wait states: 0 -> 0 (0.00 %)

vkpipeline-db results RADV (VEGA):

Totals from affected shaders:
SGPRS: 160 -> 160 (0.00 %)
VGPRS: 88 -> 88 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 18268 -> 18152 (-0.63 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 26 -> 26 (0.00 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-03 10:04:48 +11:00
Tapani Pälli 06f40f5765 spirv: fix a compiler warning
Fixes implicit conversion from enumeration type 'SpvOp' warning.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-04-01 07:43:10 +03:00
Rob Clark 1ae0c030cb nir: add lower_all_io_to_elements
I need this part of lower_all_io_to_temps but without the actual
lowering to temps part.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-30 12:56:01 -04:00
Rob Clark e5e67228f5 nir: print var name for load_interpolated_input too
Signed-off-by: Rob Clark <robdclark@gmail.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
2019-03-30 12:55:47 -04:00
Jason Ekstrand 7dbd934e26 nir: Lock around validation fail shader dumping
This prevents getting mixed-up results if a multi-threaded app has two
validation errors in different threads.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-29 21:57:51 -05:00
Karol Herbst fea0caea2b nir/validate: validate that tex deref sources are actually derefs
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-29 16:03:22 +01:00
Karol Herbst 6ffc72472c nir/print: fix printing the image_array intrinsic index
Fixes: 0de003be03 ("nir: Add handle/index-based image intrinsics")

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-29 16:03:22 +01:00
Brian Paul 4ee057eaef nir: use {0} initializer instead of {} to fix MSVC build
Trivial change.

Fixes: c6ee46a75 ("nir: Add nir_alu_srcs_negative_equal")
2019-03-28 20:34:23 -06:00
Ian Romanick 2cf59861a8 nir: Add partial redundancy elimination for compares
This pass attempts to dectect code sequences like

    if (x < y) {
        z = y - x;
	...
    }

and replace them with sequences like

    t = x - y;
    if (t < 0) {
        z = -t;
	...
    }

On architectures where the subtract can generate the flags used by the
if-statement, this saves an instruction.  It's also possible that moving
an instruction out of the if-statement will allow
nir_opt_peephole_select to convert the whole thing to a bcsel.

Currently only floating point compares and adds are supported.  Adding
support for integer will be a challenge due to integer overflow.  There
are a couple possible solutions, but they may not apply to all
architectures.

v2: Fix a typo in the commit message and a couple typos in comments.
Fix possible NULL pointer deref from result of push_block().  Add
missing (-A + B) case.  Suggested by Caio.

v3: Fix is_not_const_zero to work correctly with types other than
nir_type_float32.  Suggested by Ken.

v4: Add some comments explaining how this works.  Suggested by Ken.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-28 15:35:53 -07:00
Ian Romanick c6ee46a753 nir: Add nir_alu_srcs_negative_equal
v2: Move bug fix in get_neg_instr from the next patch to this patch
(where it was intended to be in the first place).  Noticed by Caio.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-28 15:35:52 -07:00
Ian Romanick be1cc3552b nir: Add nir_const_value_negative_equal
v2: Rebase on 1-bit Boolean changes.

Reviewed-by: Thomas Helland <thomashelland90@gmail.com> [v1]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-28 15:35:52 -07:00
Ian Romanick ae21b52e1d nir/algebraic: Add missing 16-bit extract_[iu]8 patterns
No shader-db changes on any Intel platform.

v2: Use a loop to generate patterns.  Suggested by Jason.

v3: Fix a copy-and-paste bug in the extract_[ui] of ishl loop that would
replace an extract_i8 with and extract_u8.  This broke ~180 tests.  This
bug was introduced in v2.

Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Dylan Baker <dylan@pnwbakers.com> [v2]
Acked-by: Jason Ekstrand <jason@jlekstrand.net> [v2]
2019-03-28 15:35:52 -07:00
Ian Romanick cbad201c2b nir/algebraic: Add missing 64-bit extract_[iu]8 patterns
No shader-db changes on any Intel platform.

v2: Use a loop to generate patterns.  Suggested by Jason.

v3: Fix a copy-and-paste bug in the extract_[ui] of ishl loop that would
replace an extract_i8 with and extract_u8.  This broke ~180 tests.  This
bug was introduced in v2.

Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Dylan Baker <dylan@pnwbakers.com> [v2]
Acked-by: Jason Ekstrand <jason@jlekstrand.net> [v2]
2019-03-28 15:35:52 -07:00
Ian Romanick bc17f5a2a3 nir/algebraic: Remove redundant extract_[iu]8 patterns
No shader-db changes on any Intel platform.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-28 15:35:52 -07:00
Ian Romanick c152672e68 nir/algebraic: Fix up extract_[iu]8 after loop unrolling
Skylake, Broadwell, and Haswell had similar results. (Skylake shown)
total instructions in shared programs: 15256840 -> 15256837 (<.01%)
instructions in affected programs: 4713 -> 4710 (-0.06%)
helped: 3
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.06% max: 0.08% x̄: 0.06% x̃: 0.06%

total cycles in shared programs: 372286583 -> 372286583 (0.00%)
cycles in affected programs: 198516 -> 198516 (0.00%)
helped: 1
HURT: 1
helped stats (abs) min: 10 max: 10 x̄: 10.00 x̃: 10
helped stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01%
HURT stats (abs)   min: 10 max: 10 x̄: 10.00 x̃: 10
HURT stats (rel)   min: 0.01% max: 0.01% x̄: 0.01% x̃: 0.01%

No changes on any other Intel platform.

v2: Use a loop to generate patterns.  Suggested by Jason.

v3: Fix a copy-and-paste bug in the extract_[ui] of ishl loop that would
replace an extract_i8 with and extract_u8.  This broke ~180 tests.  This
bug was introduced in v2.

Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Dylan Baker <dylan@pnwbakers.com> [v2]
Acked-by: Jason Ekstrand <jason@jlekstrand.net> [v2]
2019-03-28 15:35:52 -07:00
Dave Airlie b779baa9bf nir/deref: fix struct wrapper casts. (v3)
llvm/spir-v spits out some struct a { struct b {} }, but it
doesn't deref, it casts (struct a) to (struct b), reconstruct
struct derefs instead of casts for these.

v2: use ssa_def_rewrite uses, rework the type restrictions (Jason)
v3: squish more stuff into one function, drop unused temp (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-29 08:10:50 +10:00
Samuel Pitoiset bea540173c spirv: propagate the access flag for store and load derefs
It was only propagated when UBO/SSBO access are lowered to offsets.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: <Jason Ekstrand jason@jlekstrand.net>
2019-03-27 09:57:30 +01:00
Samuel Pitoiset 4d0b03c83d nir: add nir_{load,store}_deref_with_access() helpers
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: <Jason Ekstrand jason@jlekstrand.net>
2019-03-27 09:57:27 +01:00
Timothy Arceri d163780f81 spirv: make use of the select control support in nir
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108841
2019-03-27 02:39:12 +00:00
Timothy Arceri e76ae39ae2 nir: add support for user defined select control
This will allow us to make use of the selection control support in
spirv and the GL support provided by EXT_control_flow_attributes.

Note this only supports if-statements as we dont support switches
in NIR.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108841
2019-03-27 02:39:12 +00:00
Timothy Arceri 24037ff228 spirv: make use of the loop control support in nir
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108841
2019-03-27 02:39:12 +00:00
Timothy Arceri b56451f82c nir: add support for user defined loop control
This will allow us to make use of the loop control support in
spirv and the GL support provided by EXT_control_flow_attributes.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108841
2019-03-27 02:39:12 +00:00
Jason Ekstrand 8ed583fe52 spirv: Handle the NonUniformEXT decoration 2019-03-25 16:12:09 -05:00
Jason Ekstrand e50ab2c0f2 nir: Add access flags to deref and SSBO atomics
We will need them for a new ACCESS_NON_UNIFORM flag that's about to be
added in the next commit.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-25 16:12:09 -05:00
Jason Ekstrand 40074ebf74 nir: Add texture sources and intrinsics for bindless
On Intel, we have both bindless and bindful and we'd like to use them at
the same time if we can so we need to be able to distinguish at the NIR
level between the two.  This also fixes nir_lower_tex to properly handle
bindless in its tex_texture_size and get_texture_lod helpers.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-25 16:12:09 -05:00
Jason Ekstrand 3bd5457641 nir: Add a lowering pass for non-uniform resource access
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-03-25 15:00:36 -05:00
Jason Ekstrand 39da1deb49 nir/lower_io: Add a bounds-checked 64-bit global address format
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-25 14:40:54 -05:00
Iago Toral Quiroga 763c8aabed compiler/nir: add lowering for 16-bit ldexp
v2 (Topi):
 - Make bit-size handling order be 16-bit, 32-bit, 64-bit
 - Clamp lower exponent range at -28 instead of -30.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-25 16:08:25 +01:00
Iago Toral Quiroga 3766334923 compiler/nir: add lowering for 16-bit flrp
And enable it on Intel.

v2:
 - Squash the change to enable it on Intel (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-25 16:08:25 +01:00
Iago Toral Quiroga ca31df6f1f compiler/nir: add lowering option for 16-bit fmod
And enable it on Intel.

v2:
 - Squash the change to enable this lowering on Intel (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-25 16:08:25 +01:00
Brian Paul d13167cd21 nir: fix a few signed/unsigned comparison warnings
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-25 06:51:31 -06:00
Dave Airlie 9417793fb1 nir/split_vars: fixup some more explicit_stride related issues.
With vkpipelinedb Samuel discovered a regression since we stopped
stripping types at the spir-v level.

This adds a check to the var splitting for the case where it
asserts the type hasn't changed, when it has just created a bare
type, and it's different than the original type which has an explicit
stride.

This also removes a pointless assert that also triggers.

Fixes: 3b3653c4cf (nir/spirv: don't use bare types, remove assert in split vars for testing)

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-25 13:57:16 +10:00
Caio Marcelo de Oliveira Filho 9d0ae777dd spirv: Use interface type for block and buffer block
Also handle GLSL_TYPE_INTERFACE the same way we do GLSL_TYPE_STRUCT in
various places.  Motivated by ARB_gl_spirv work, that will take
advantage of the interface types when handling NIR coming from SPIR-V.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-23 10:22:39 -07:00
Caio Marcelo de Oliveira Filho 15012077bc spirv: Add an execution environment to the options
Also updates gl_spirv to pick the right one.  At the moment nothing
uses it, but upcoming functionality part of ARB_gl_spirv will use it,
and we also later can be more assertful when handling certain features
for each of the execution environments.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
2019-03-23 09:29:21 -07:00
Caio Marcelo de Oliveira Filho e5830e1132 nir: Handle array-deref-of-vector case in loop analysis
SPIR-V can produce those for SSBO and UBO access.  Found when testing
the ARB_gl_spirv series.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-22 13:50:39 -07:00
Samuel Pitoiset 23d30f4099 spirv,nir: lower frexp_exp/frexp_sig inside a new NIR pass
This lowering isn't needed for RADV because AMDGCN has two
instructions. It will be disabled for RADV in an upcoming series.

While we are at it, factorize a little bit.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-22 19:41:46 +01:00
Samuel Pitoiset 6ae5797243 nir: use generic float types for frexp_exp and frexp_sig
Only the exponent needs to be 32-bit signed integer.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-22 19:41:44 +01:00
Vinson Lee 77aa11ca32 nir: Fix anonymous union initialization with older GCC.
Fix this build error with GCC 4.4.7.

  CC     nir/nir_opt_copy_prop_vars.lo
nir/nir_opt_copy_prop_vars.c: In function ‘load_element_from_ssa_entry_value’:
nir/nir_opt_copy_prop_vars.c:454: error: unknown field ‘ssa’ specified in initializer
nir/nir_opt_copy_prop_vars.c:455: error: unknown field ‘def’ specified in initializer
nir/nir_opt_copy_prop_vars.c:456: error: unknown field ‘component’ specified in initializer
nir/nir_opt_copy_prop_vars.c:456: error: extra brace group at end of initializer
nir/nir_opt_copy_prop_vars.c:456: error: (near initialization for ‘(anonymous).<anonymous>’)
nir/nir_opt_copy_prop_vars.c:456: warning: excess elements in union initializer
nir/nir_opt_copy_prop_vars.c:456: warning: (near initialization for ‘(anonymous).<anonymous>’)

Fixes: 96c32d7776 ("nir/copy_prop_vars: handle load/store of vector elements")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109810
Reviewed-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-22 10:43:41 -07:00
Danylo Piliaiev ea9bde151f
glsl: Cross validate variable's invariance by explicit invariance only
'invariant' qualifier is propagated on variables which are used
to calculate other invariant variables, however when we are matching
variable's declarations we should take into account only explicitly
declared invariance because invariance propagation is an implementation
specific detail.

Thus new flag is added to ir_variable_data which indicates 'invariant'
qualifier being explicitly set in the shader.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100316
Fixes: 89b60492 ('glsl: Add a pass to propagate the "invariant" and
  "precise" qualifiers')

Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2019-03-21 23:28:08 -07:00
Timothy Arceri a1bd9dd5bc nir: fix opt_if_loop_last_continue()
Rather than skipping code that looked like this:

     loop {
        ...
        if (cond) {
           do_work_1();
           continue;
        } else {
           break;
        }
        do_work_2();
     }

Previously we would turn this into:

     loop {
        ...
        if (cond) {
           do_work_1();
           continue;
        } else {
           do_work_2();
           break;
        }
     }

This was clearly wrong. This change checks for this case and makes
sure we now leave it for nir_opt_dead_cf() to clean up.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-22 09:58:18 +11:00
Kenneth Graunke e426c3a6cb nir: Record non-vector/scalar varyings as unmovable when compacting
In some cases, we can end up with varying structs that aren't split to
their member variables.  nir_compact_varyings attempted to record these
as unmovable, so it would leave them be.  Unfortunately, it didn't do
it right for non-vector/scalar types.  It set the mask to:

   ((1 << (elements * dmul)) - 1) << var->data.location_frac

where elements is the number of vector elements.  For structures and
other non-vector/scalars, elements is 0...so the whole mask became 0.

This caused nir_compact_varyings to assign other varyings on top of
the structure varying's location (as it appeared to take up no space).

To combat this, we just set elements to 4 for non-vector/scalar types,
so that the entire slot gets marked as unmovable.

Fixes KHR-GL45.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_in on iris.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-21 16:03:58 +00:00
Rob Clark d4cbc94685 nir: move gls_type_get_{sampler,image}_count()
I need at least the sampler variant in ir3..

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-21 09:13:05 -04:00
Timothy Arceri 427a6fee43 nir: only override previous alu during loop analysis if supported
Users of this function expect alu to be a supported comparision
if the induction variable is not NULL. Since we attempt to
override the return values if the first limit is not a const, we
must make sure we are dealing with a valid comparision before
overriding the alu instruction.

Fixes an unreachable in inverse_comparison() with the game
Assasins Creed Odyssey.

Fixes: 3235a942c1 ("nir: find induction/limit vars in iand instructions")

Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110216
2019-03-21 21:51:21 +11:00
Jason Ekstrand 6e19348ad1 spirv: Drop inline tg4 lowering
Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-03-21 02:58:41 +00:00
Karol Herbst d8a0658d8b nir/lower_tex: Add support for tg4 offsets lowering
Signed-off-by: Karol Herbst <kherbst@redhat.com>
2019-03-21 02:58:41 +00:00
Karol Herbst 71c66c254b nir: add support for gather offsets
Values inside the offsets parameter of textureGatherOffsets are required to be
constants in the range of [GL_MIN_PROGRAM_TEXTURE_GATHER_OFFSET,
GL_MAX_PROGRAM_TEXTURE_GATHER_OFFSET].

As this range is never outside [-32, 31] for all existing drivers inside mesa,
we can simply store the offsets as a int8_t[4][2] array inside nir_tex_instr.

Right now only Nvidia hardware supports this in hardware, so we can turn this
on inside Nouveau for the NIR path as it is already enabled with the TGSI one.

v2: use memcpy instead of for loops
    add missing bits to nir_instr_set
    don't show offsets if they are all 0
v3: default offsets aren't all 0
v4: rename offsets -> tg4_offsets
    rename nir_tex_instr_has_explicit_offsets -> nir_tex_instr_has_explicit_tg4_offsets

Signed-off-by: Karol Herbst <kherbst@redhat.com>
2019-03-21 02:58:41 +00:00
Dave Airlie b95b33a5c7 nir/deref: remove casts of casts which are likely redundant (v3)
Not sure how ptr_stride should be taken into account if at all here

v2: reorder check to avoid src walking (Jason)
v3: remove is_cast_cast checks, keep going afterwards (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-21 10:58:06 +10:00
Dave Airlie 3b3653c4cf nir/spirv: don't use bare types, remove assert in split vars for testing
For OpenCL we never want to strip the info from the types, and it makes
type comparisons easier in later stages. We might later need a nir pass to
strip this for GLSL, but so far the only regression is the assert and Jason
said removing that is fine.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2019-03-21 10:25:40 +10:00
Juan A. Suarez Romero efcf9c9f9f nir: deref only for OpTypePointer
Fixes dEQP-VK.binding_model.buffer_device_address.* and
dEQP-VK.ssbo.phys.layout* Vulkan CTS tests.

v2: set val->type->stride in the section below (Jason)

v3: restore val->type->type to original place (Jason)

Fixes: d0ba326f23 ("nir/spirv: support physical pointers")
CC: Karol Herbst <kherbst@redhat.com>
CC: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-20 19:26:32 +00:00
Jason Ekstrand 0b7e5bdbd4 nir: Constant values are per-column not per-component
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-03-20 09:26:56 -05:00
Andres Gomez ab28dca033 Revert "glsl: relax input->output validation for SSO programs"
This reverts commit 1aa5738e66.

This patch incorrectly asumed that for SSOs no inner interface
matching check was needed.

From the ARB_separate_shader_objects spec v.25:

  " With separable program objects, interfaces between shader stages
    may involve the outputs from one program object and the inputs
    from a second program object.  For such interfaces, it is not
    possible to detect mismatches at link time, because the programs
    are linked separately.  When each such program is linked, all
    inputs or outputs interfacing with another program stage are
    treated as active.  The linker will generate an executable that
    assumes the presence of a compatible program on the other side of
    the interface.  If a mismatch between programs occurs, no GL error
    will be generated, but some or all of the inputs on the interface
    will be undefined."

This completes the fix from commit:
3be05dd267 ("glsl/linker: don't fail non static used inputs without matching outputs")

Fixes: 1aa5738e66 ("glsl: relax input->output validation for SSO programs")
Cc: Tapani Pälli <tapani.palli@intel.com>
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-19 17:36:20 +02:00
Andres Gomez 422882e78f glsl/linker: simplify xfb_offset vs xfb_stride overflow check
Current implementation uses a complicated calculation which relies in
an implicit conversion to check the integral part of 2 division
results.

However, the calculation actually checks that the xfb_offset is
smaller or a multiplier of the xfb_stride. For example, while this is
expected to fail, it actually succeeds:

  "

    ...

    layout(xfb_buffer = 2, xfb_stride = 12) out block3 {
      layout(xfb_offset = 0) vec3 c;
      layout(xfb_offset = 12) vec3 d; // ERROR, requires stride of 24
    };

    ...

  "

Fixes: 2fab85aaea ("glsl: add xfb_stride link time validation")
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-19 17:23:27 +02:00
Andres Gomez 3be05dd267 glsl/linker: don't fail non static used inputs without matching outputs
If there is no Static Use of an input variable, the linker shouldn't
fail whenever there is no defined matching output variable in the
previous stage.

From page 47 (page 51 of the PDF) of the GLSL 4.60 v.5 spec:

  " Only the input variables that are statically read need to be
    written by the previous stage; it is allowed to have superfluous
    declarations of input variables."

Now, we complete this exception whenever the input variable has an
explicit location. Previously, 18004c338f ("glsl: fail when a
shader's input var has not an equivalent out var in previous") took
care of the cases in which the input variable didn't have an explicit
location.

v2: do the location based interface matching check regardless on
    whether it is a separable program or not (Ilia).

Fixes: 1aa5738e66 ("glsl: relax input->output validation for SSO programs")
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Cc: Iago Toral Quiroga <itoral@igalia.com>
Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: Tapani Pälli <tapani.palli@intel.com>
Cc: Ian Romanick <ian.d.romanick@intel.com>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-19 17:23:27 +02:00
Andres Gomez de1bc2d19a glsl/linker: always validate explicit location among inputs
Outputs are always validated when having explicit locations and we
were trusting its outcome to catch similar problems with the inputs
since, in case of having undefined outputs for existing inputs, we
would be already reporting a linker error.

However, consider this case:

  " Shader stage n:
    ---------------

    ...

    layout(location = 0) out float a;

    ...

    Shader stage n+1:
    -----------------

    ...

    layout(location = 0) in float b;
    layout(location = 0) in float c;

    ...
  "

Currently, this won't report a linker error even though location
aliasing is happening for the inputs.

Therefore, we also need to validate the inputs independently from the
outcome of the outputs validation.

Cc: Timothy Arceri <tarceri@itsqueeze.com>
Cc: Iago Toral Quiroga <itoral@igalia.com>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-19 17:23:27 +02:00
Andres Gomez a96093136b glsl: correctly validate component layout qualifier for dvec{3,4}
From page 62 (page 68 of the PDF) of the GLSL 4.50 v.7 spec:

  " A dvec3 or dvec4 can only be declared without specifying a
    component."

Therefore, using the "component" qualifier with a dvec3 or dvec4
should result in a compiling error.

v2: enhance the error message (Timothy).

Fixes: 94438578d2 ("glsl: validate and store component layout qualifier in GLSL IR")
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-19 17:23:27 +02:00
Jason Ekstrand cbfe31ccbe Revert "nir: const `nir_call_instr::callee`"
This reverts commit db57db5317.  When
building IR, nothing is really immutable and, since C has no concept of
constness propagating beyond the first pointer, we have to be vary
careful with how we use it.  To just throw const into a function like
this is a lie.

Instead, we should just drop the unneeded const in spirv_to_nir which
this commit does along with the revert.
2019-03-19 10:19:42 -05:00
Eric Engestrom db57db5317 nir: const `nir_call_instr::callee`
Fixes: c95afe56a8 "nir/spirv: handle kernel function parameters"
Cc: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
2019-03-19 12:51:53 +00:00
Karol Herbst d0ba326f23 nir/spirv: support physical pointers
v2: add load_kernel_input

Signed-off-by: Karol Herbst <kherbst@redhat.com>

squash! nir/spirv: support physical pointers
2019-03-19 04:08:07 +00:00
Karol Herbst c95afe56a8 nir/spirv: handle kernel function parameters
the idea here is to generate an entry point stub function wrapping around the
actual kernel function and turn all parameters into shader inputs with byte
addressing instead of vec4.

This gives us several advantages:
1. calling kernel functions doesn't differ from calling any other function
2. CL inputs match uniforms in most ways and we can just take advantage of most
   of nir_lower_io

v2: move code into a seperate function
v3: verify the entry point got a name
    fix minor typo
v4: make vtn_emit_kernel_entry_point_wrapper take the old entry point as an arg

Signed-off-by: Karol Herbst <kherbst@redhat.com>
2019-03-19 04:08:07 +00:00
Karol Herbst 0ccdf23a57 nir/lower_locals_to_regs: cast array index to 32 bit
local memory is too small to require 64 bit pointers, so cast the array index
to a 32 bit value to save up on 64 bit operations.

Signed-off-by: Karol Herbst <kherbst@redhat.com>
2019-03-19 04:08:07 +00:00
Karol Herbst 44d32e62fb glsl: add cl_size and cl_alignment
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
2019-03-19 04:08:07 +00:00
Karol Herbst 659f333b3a glsl: add packed for struct types
We need this for OpenCL kernels because we have to apply C rules for alignment
and padding inside structs and for this we also have to know if a struct is
packed or not.

v2: fix for kernel params

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
2019-03-19 04:08:07 +00:00
Jason Ekstrand 35b8f6f40b nir: Add a new pass to lower array dereferences on vectors
This pass was originally written for lowering TCS output reads and
writes but it is also applicable just about anything including UBOs,
SSBOs, and shared variables.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-15 23:10:27 -05:00
Jason Ekstrand fe9a6c0f14 nir/builder: Add a vector extract helper
This one's a tiny bit better than what we had in spirv_to_nir because it
emits a binary tree rather than a linear walk.  It also doesn't leave
around unneeded bcsel instructions for a constant index and returns an
undef for constant OOB access.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-15 23:10:26 -05:00
Alejandro Piñeiro 34b3b92bbe nir/xfb: move varyings info out of nir_xfb_info
When varyings was added we moved to use to dynamycally allocated
pointers, instead of allocating just one block for everything. That
breaks some assumptions of some vulkan drivers (like anv), that make
serialization and copying easier. And at the same time, varyings are
not needed for vulkan.

So this commit moves them out. Although it seems a little an overkill,
fixing the anv side would require a similar, or more, changes, so in
the end it is about to decide where do we want to put our effort.

v2: (from Jason review)
  * Don't use a temp variable on the _create methods, just return
    result of rzalloc_size
  * Wrap some lines too long.

Fixes: cf0b2ad486 ("nir/xfb: adding varyings on nir_xfb_info and gather_info")

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-15 11:59:32 +01:00
Jason Ekstrand 810dde2a6b glsl/nir: Add a pass to lower UBO and SSBO access
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-15 01:02:19 +00:00
Jason Ekstrand 77e5ec394e glsl/nir: Handle unlowered SSBO atomic and array_length intrinsics
We didn't have any of these before because all NIR consumers always
called lower_ubo_references.  Soon, we want to pass the derefs straight
through to NIR so we need to handle these intrinsics directly.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-15 01:02:19 +00:00
Jason Ekstrand 76ba225184 glsl/nir: Set explicit types on UBO/SSBO variables
We want to be able to use variables and derefs for UBO/SSBO access in
NIR.  In order to do this, the rest of NIR needs to know the type layout
information.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-15 01:02:19 +00:00
Jason Ekstrand 8f3ab8aa78 glsl: Don't lower vector derefs for SSBOs, UBOs, and shared
All of these are backed by some sort of memory so if you have multiple
threads writing to different components of the same vector at the same
time, the load-vec-store pattern that GLSL IR emits won't work.  This
shouldn't affect any drivers today as they all call GLSL IR lowering
which lowers access to these variables to index+offset intrinsics before
we get to this point.  However, NIR will start handling the derefs
itself and won't want the lowering.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-15 01:02:19 +00:00
Jason Ekstrand 3c11fc7654 nir/lower_io: Add a new buffer_array_length intrinsic and lowering
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-15 01:02:19 +00:00
Jason Ekstrand c8d42c8cf6 nir: Rename nir_address_format_vk_index_offset to not be vk
It's just a 32-bit index and offset.  We're going to want to use it in
GL as well so stop talking about Vulkan.

Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-15 01:02:19 +00:00
Jason Ekstrand 60af3a93e9 nir/deref: Consider COHERENT decorated var derefs as aliasing
If we get to two deref_var paths with different variables, we usually
know they don't alias.  However, if both of the paths are marked
coherent, we don't have to worry about it.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-15 01:02:19 +00:00
Jason Ekstrand 8b073832ff compiler/types: Add helpers to get explicit types for standard layouts
We also need to modify the current size/align helpers to not blow up
when they encounter an explicitly laid out type.  Previously we
considered using the size/align helpers mutually exclusive with standard
layouts but now we just assert that they match.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-15 01:02:19 +00:00
Jason Ekstrand 5b2b144566 compiler/types: Add a C wrapper to get full struct field data
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-15 01:02:19 +00:00
Jason Ekstrand ef4ca44780 compiler/types: Add a new is_interface C wrapper
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-15 01:02:19 +00:00
Jason Ekstrand b315f6f82b nir/validate: Allow 32-bit boolean load/store intrinsics
With UBOs and SSBOs we have boolean types but they're actually 32-bit
values.  Make the validator a little less strict so that we can do a
32-bit load/store on boolean types.  We're about to add a lowering pass
called gl_nir_lower_buffers which will lower boolean load/store
operations to 32-bit and insert i2b and b2i instructions to convert
to/from 1-bit booleans.  We want that to be legal.

Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-15 01:02:19 +00:00
Jason Ekstrand 5d26f2d3d5 nir/validate: Only require bare types to match for copy_deref
If we want to be able to use copy_deref instructions on explicitly laid
out types, we have to be a little more flexible about what types we
allow.  Instead, of requiring the types to exactly match, only require
the bare types to match.

Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-15 01:02:19 +00:00
Jason Ekstrand 2b76de9b5d nir/algebraic: Add a couple optimizations for iabs and ishr
Shader-db results on Kaby Lake:

    total instructions in shared programs: 15225213 -> 15222365 (-0.02%)
    instructions in affected programs: 43524 -> 40676 (-6.54%)
    helped: 203
    HURT: 0

Lots of shaders in Shadow Warrior had this pattern along with Deus Ex,
Civ, Shadow of Mordor, and several others.

Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-03-15 01:02:19 +00:00
Eduardo Lima Mitev 6ff50a488a nir: Add ir3-specific version of most SSBO intrinsics
These are ir3 specific versions of SSBO intrinsics that add an
extra source to hold the element offset (dword), which is what the
backend instructions need.

The original byte-offset source provided by NIR is not replaced
because on a4xx and a5xx the backend still needs it.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-03-13 21:19:44 +01:00
Caio Marcelo de Oliveira Filho 822a8865e4 nir: Add a pass to combine store_derefs to same vector
v2: (all from Jason)
    Reuse existing function for the end of the block combinations.
    Check the SSA values are coming from the right place in tests.
    Document the case when the store to array_deref is reused.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-13 08:39:16 -07:00
Jason Ekstrand bd17bdc56b glsl/lower_vector_derefs: Don't use a temporary for TCS outputs
Tessellation control shader outputs act as if they have memory backing
them and you can have multiple writes to different components of the
same vector in-flight at the same time.  When this happens, the load vec
store pattern that gets used by ir_triop_vector_insert doesn't yield the
correct results.  Instead, just emit a sequence of conditional
assignments.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
2019-03-13 02:10:31 +00:00
Jason Ekstrand 20c4578c55 glsl/list: Add a list variant of insert_after
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-13 02:10:31 +00:00
Jason Ekstrand 83fdefc062 nir/loop_unroll: Fix out-of-bounds access handling
The previous code was completely broken when it came to constructing the
undef values.  I'm not sure how it ever worked.  For the case of a copy
that reads an undefined value, we can just delete the copy because the
destination is a valid undefined value.  This saves us the effort of
trying to construct a value for an arbitrary copy_deref intrinsic.

Fixes: e8a8937a04 "nir: add partial loop unrolling support"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-12 21:06:39 -05:00
Jason Ekstrand 5ef2b8f1f2 nir: Add a pass for lowering IO back to vector when possible
This pass tries to turn scalar and array-of-scalar IO variables into
vector IO variables whenever possible.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Cc: "19.0" <mesa-stable@lists.freedesktop.org>
2019-03-12 15:34:06 +00:00
Connor Abbott 5b2ec9c81e nir: Add a stripping pass for improved cacheability
Oftentimes various nir shaders after lowering will be the same, or
almost the same. For example, this can happen when the same shader is
linked with different shaders to form different pipelines and
cross-stage optimizations don't kick in to change it. We want to avoid
running the backend twice on these shaders. We were already doing this
with radeonsi, but we were storing a few extra pieces of information
that made this much less effective compared to TGSI. The worse offender
by far was the program name, which caused most of the cache misses. This
pass strips out these pieces of information, controlled by the NIR_STRIP
debug env variable.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-12 10:49:48 +01:00
Brian Paul 02c2863df5 nir: silence a couple new compiler warnings
[33/630] Compiling C object 'src/compiler/nir/nir@sta/nir_loop_analyze.c.o'.
../src/compiler/nir/nir_loop_analyze.c: In function ‘try_find_trip_count_vars_in_iand’:
../src/compiler/nir/nir_loop_analyze.c:846:29: warning: suggest parentheses around ‘&&’ within ‘||’ [-Wparentheses]
    if (*ind == NULL || *ind && (*ind)->type != basic_induction ||
                             ^
[85/630] Compiling C object 'src/compiler/nir/nir@sta/nir_opt_loop_unroll.c.o'.
../src/compiler/nir/nir_opt_loop_unroll.c: In function ‘complex_unroll_single_terminator’:
../src/compiler/nir/nir_opt_loop_unroll.c:494:17: warning: unused variable ‘unroll_loc’ [-Wunused-variable]
    nir_cf_node *unroll_loc =
                 ^
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-12 14:34:51 +11:00
Timothy Arceri 3235a942c1 nir: find induction/limit vars in iand instructions
This will be used to help find the trip count of loops that look
like the following:

   while (a < x && i < 8) {
      ...
      i++;
   }

Where the NIR will end up looking something like this:

   vec1 32 ssa_1 = load_const (0x00000004 /* 0.000000 */)
   loop {
      ...
      vec1 1 ssa_12 = ilt ssa_225, ssa_11
      vec1 1 ssa_17 = ilt ssa_226, ssa_1
      vec1 1 ssa_18 = iand ssa_12, ssa_17
      vec1 1 ssa_19 = inot ssa_18

      if ssa_19 {
         ...
         break
      } else {
         ...
      }
   }

On RADV this unrolls a bunch of loops in F1-2017 shaders.

Totals from affected shaders:
SGPRS: 4112 -> 4136 (0.58 %)
VGPRS: 4132 -> 4052 (-1.94 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 515444 -> 587720 (14.02 %) bytes
LDS: 2 -> 2 (0.00 %) blocks
Max Waves: 194 -> 196 (1.03 %)
Wait states: 0 -> 0 (0.00 %)

It also unrolls a couple of loops in shader-db on radeonsi.

Totals from affected shaders:
SGPRS: 128 -> 128 (0.00 %)
VGPRS: 64 -> 64 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 6880 -> 9504 (38.14 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 16 -> 16 (0.00 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-12 00:52:30 +00:00
Timothy Arceri 67c3478482 nir: pass nir_op to calculate_iterations()
Rather than getting this from the alu instruction this allows us
some flexibility. In the following pass we instead pass the
inverse op.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-12 00:52:30 +00:00
Timothy Arceri 11e8f8a166 nir: add get_induction_and_limit_vars() helper to loop analysis
This helps make find_trip_count() a little easier to follow but
will also be used by a following patch.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-12 00:52:30 +00:00
Timothy Arceri f219f6114d nir: add helper to return inversion op of a comparison
This will be used to help find the trip count of loops that look
like the following:

   while (a < x && i < 8) {
      ...
      i++;
   }

Where the NIR will end up looking something like this:

   vec1 32 ssa_1 = load_const (0x00000004 /* 0.000000 */)
   loop {
      ...
      vec1 1 ssa_12 = ilt ssa_225, ssa_11
      vec1 1 ssa_17 = ilt ssa_226, ssa_1
      vec1 1 ssa_18 = iand ssa_12, ssa_17
      vec1 1 ssa_19 = inot ssa_18

      if ssa_19 {
         ...
         break
      } else {
         ...
      }
   }

So in order to find the trip count we need to find the inverse of
ilt.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-12 00:52:30 +00:00
Timothy Arceri 090feaacdc nir: simplify the loop analysis trip count code a little
Here we create a helper is_supported_terminator_condition()
and use that rather than embedding all the trip count code
inside a switch.

The new helper will also be used in a following patch.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-12 00:52:30 +00:00
Timothy Arceri 7571de8eaa nir: unroll some loops with a variable limit
For some loops can have a single terminator but the exact trip
count is still unknown. For example:

   for (int i = 0; i < imin(x, 4); i++)
      ...

Shader-db results radeonsi (all affected are from Tropico 5):

Totals from affected shaders:
SGPRS: 144 -> 152 (5.56 %)
VGPRS: 124 -> 108 (-12.90 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 5180 -> 6640 (28.19 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 17 -> 21 (23.53 %)
Wait states: 0 -> 0 (0.00 %)

Shader-db results i965 (SKL):

total loops in shared programs: 3808 -> 3802 (-0.16%)
loops in affected programs: 6 -> 0
helped: 6
HURT: 0

vkpipeline-db results RADV (Unrolls some Skyrim VR shaders):

Totals from affected shaders:
SGPRS: 304 -> 304 (0.00 %)
VGPRS: 296 -> 292 (-1.35 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 15756 -> 25884 (64.28 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 29 -> 29 (0.00 %)
Wait states: 0 -> 0 (0.00 %)

v2: fix bug where last iteration would get optimised away by
    mistake.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-12 00:52:30 +00:00
Timothy Arceri 68ce0ec222 nir: calculate trip count for more loops
This adds support to loop analysis for loops where the induction
variable is compared to the result of min(variable, constant).

For example:

   for (int i = 0; i < imin(x, 4); i++)
      ...

We add a new bool to the loop terminator struct in order to
differentiate terminators with this exit condition.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-12 00:52:30 +00:00
Timothy Arceri e8a8937a04 nir: add partial loop unrolling support
This adds partial loop unrolling support and makes use of a
guessed trip count based on array access.

The code is written so that we could use partial unrolling
more generally, but for now it's only use when we have guessed
the trip count.

We use partial unrolling for this guessed trip count because its
possible any out of bounds array access doesn't otherwise affect
the shader e.g the stores/loads to/from the array are unused. So
we insert a copy of the loop in the innermost continue branch of
the unrolled loop. Later on its possible for nir_opt_dead_cf()
to then remove the loop in some cases.

A Renderdoc capture from the Rise of the Tomb Raider benchmark,
reports the following change in an affected compute shader:

GPU duration: 350 -> 325 microseconds

shader-db results radeonsi VEGA (NIR backend):

SGPRS: 1008 -> 816 (-19.05 %)
VGPRS: 684 -> 432 (-36.84 %)
Spilled SGPRs: 539 -> 0 (-100.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 39708 -> 45812 (15.37 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 105 -> 144 (37.14 %)
Wait states: 0 -> 0 (0.00 %)

shader-db results i965 SKL:

total instructions in shared programs: 13098265 -> 13103359 (0.04%)
instructions in affected programs: 5126 -> 10220 (99.38%)
helped: 0
HURT: 21

total cycles in shared programs: 332039949 -> 331985622 (-0.02%)
cycles in affected programs: 289252 -> 234925 (-18.78%)
helped: 12
HURT: 9

vkpipeline-db results VEGA:

Totals from affected shaders:
SGPRS: 184 -> 184 (0.00 %)
VGPRS: 448 -> 448 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 26076 -> 24428 (-6.32 %) bytes
LDS: 6 -> 6 (0.00 %) blocks
Max Waves: 5 -> 5 (0.00 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-12 00:52:30 +00:00
Timothy Arceri fba5d275db nir: add new partially_unrolled bool to nir_loop
In order to stop continuously partially unrolling the same loop
we add the bool partially_unrolled to nir_loop, we add it here
rather than in nir_loop_info because nir_loop_info is only set
via loop analysis and is intended to be cleared before each
analysis. Also nir_loop_info is never cloned.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-12 00:52:30 +00:00
Timothy Arceri 03a452b7d0 nir: add guess trip count support to loop analysis
This detects an induction variable used as an array index to guess
the trip count of the loop. This enables us to do a partial
unroll of the loop, which can eventually result in the loop being
eliminated.

v2: check if the induction var is used to index more than a single
    array and if so get the size of the smallest array.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-12 00:52:30 +00:00
Xavier Bouchoux c5236fc6e2 nir/spirv: Fix assert when unsampled OpTypeImage has unknown 'Depth'
'dxc' hlsl-to-spirv compiler appears to emit 2 (Unknown) in the depth field,
when the image is not sampled and the value is not needed.

Previously, shaders failed with:

SPIR-V parsing FAILED:
    In file ../src/compiler/spirv/spirv_to_nir.c:1412
    !is_shadow
    632 bytes into the SPIR-V binary

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-11 23:28:39 +01:00
Connor Abbott d086d16b81 nir/serialize: Prevent writing uninitialized state_slot data
The nir_state_slot struct had some padding that was never initialized.
Serializing the individual parts of the struct is more robust and avoids
the overhead of zeroing it at creation, so just do that.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-11 15:17:41 +01:00
Kenneth Graunke da51e3f1b0 Revert MR 369 (Fix extract_i8 and extract_u8 for 64-bit integers)
This broke piles of image load store tests (179 failures on CI,
mesa_master build #15546, previous build right before this landed
was green).  I'd rather not leave the tree on fire over the weekend,
so let's revert for now, and we can figure out what happened next week.
2019-03-09 01:42:16 -08:00
Ian Romanick 18e4bf65de nir/algebraic: Add missing 16-bit extract_[iu]8 patterns
No shader-db changes on any Intel platform.

v2: Use a loop to generate patterns.  Suggested by Jason.

Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-08 22:24:19 -08:00
Ian Romanick 55c1ac4b75 nir/algebraic: Add missing 64-bit extract_[iu]8 patterns
No shader-db changes on any Intel platform.

v2: Use a loop to generate patterns.  Suggested by Jason.

Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-08 22:24:19 -08:00
Ian Romanick 9aaaac6080 nir/algebraic: Remove redundant extract_[iu]8 patterns
No shader-db changes on any Intel platform.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-08 22:24:19 -08:00
Ian Romanick 37ee462e03 nir/algebraic: Fix up extract_[iu]8 after loop unrolling
Skylake, Broadwell, and Haswell had similar results. (Skylake shown)
total instructions in shared programs: 15256840 -> 15256837 (<.01%)
instructions in affected programs: 4713 -> 4710 (-0.06%)
helped: 3
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.06% max: 0.08% x̄: 0.06% x̃: 0.06%

total cycles in shared programs: 372286583 -> 372286583 (0.00%)
cycles in affected programs: 198516 -> 198516 (0.00%)
helped: 1
HURT: 1
helped stats (abs) min: 10 max: 10 x̄: 10.00 x̃: 10
helped stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01%
HURT stats (abs)   min: 10 max: 10 x̄: 10.00 x̃: 10
HURT stats (rel)   min: 0.01% max: 0.01% x̄: 0.01% x̃: 0.01%

No changes on any other Intel platform.

v2: Use a loop to generate patterns.  Suggested by Jason.

Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-08 22:24:19 -08:00
Alejandro Piñeiro 686b7b1d48 nir/linker: fix ARRAY_SIZE query with xfb varyings
For a non-array varying, it is expecting ARRAY_SIZE as 1, instead of 0.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-08 15:00:50 +01:00
Antia Puentes de31fb2f4f nir/linker: Fix TRANSFORM_FEEDBACK_BUFFER_INDEX
From the ARB_enhanced_layouts specification:

  "For the property TRANSFORM_FEEDBACK_BUFFER_INDEX, a single integer
   identifying the index of the active transform feedback buffer
   associated with an active variable is written to <params>.  For
   variables corresponding to the special names "gl_NextBuffer",
   "gl_SkipComponents1", "gl_SkipComponents2", "gl_SkipComponents3",
   and "gl_SkipComponents4", -1 is written to <params>."

We were storing the xfb_buffer value, instead of the value
corresponding to GL_TRANSFORM_FEEDBACK_BUFFER_INDEX.

Note that the implementation assumes that varyings would be sorted by
offset and buffer.

Signed-off-by: Antia Puentes <apuentes@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-08 15:00:50 +01:00
Alejandro Piñeiro 7c0f411c27 nir/linker: use nir_gather_xfb_info
Instead of a custom ARB_gl_spirv xfb gather info pass.

In fact, this is not only about reusing code, but the current custom
code was not handling properly how many varyings are enumerated from
some complex types. So this change is also about fixing some corner
cases.

v2: Use util_bitcount, simplify current stage check (Kenneth)

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-08 15:00:50 +01:00
Alejandro Piñeiro b2a212ac2e nir/xfb: handle arrays and AoA of basic types
On OpenGL, a array of a simple type adds just one varying. So
gl_transform_feedback_varying_info struct defined at mtypes.h includes
the parameters Type (base_type) and Size (number of elements).

This commit checks this when the recursive add_var_xfb_outputs call
handles arrays, to ensure that just one is addded.

We also need to take into account AoA here

v2: use glsl_type_is_leaf from nir_types (Timothy Arceri)

v3: simplified aoa check, without the need ot using glsl_type_is_leaf,
    using glsl_types_is_struct (Timothy Arceri)

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-08 15:00:50 +01:00
Alejandro Piñeiro 2b65fecd85 nir_types: add glsl_type_is_struct helper
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-08 15:00:50 +01:00
Alejandro Piñeiro 8d693746e9 nir/xfb: sort varyings too
Right now we are only re-sorting outputs. But it is better to sort too
varyings, as linker expect them to be sorted out (as it was done on
GLSL). For varyings, and to make easier to compute buffer_index, we
sort also by buffer. We could do the same for outputs, but we lack a
reason for that, so we left it as it is (just offset).

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-08 15:00:50 +01:00
Alejandro Piñeiro cf0b2ad486 nir/xfb: adding varyings on nir_xfb_info and gather_info
In order to be used for OpenGL (right now for ARB_gl_spirv).

This commit adds two new structures:

  * nir_xfb_varying_info: that identifies each individual varying. For
    each one, we need to know the type, buffer and xfb_offset

  * nir_xfb_buffer_info: as now for each buffer, in addition to the
    stride, we need to know how many varyings are assigned to it.

For this patch, the only case where num_outputs != num_varyings is
with the case of doubles, that for dvec3/4 could require more than one
output. There are more cases though (like aoa), that will be handled
on following patches.

v2: updated after new nir general XFB support introduced for "anv: Add
    support for VK_EXT_transform_feedback"

v3: compute num_varyings beforehand for allocating, instead of relying
    on num_outputs as approximate value (Timothy Arceri)

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-08 15:00:50 +01:00
Alejandro Piñeiro 9f68b9ac71 nir_types: add glsl_varying_count helper
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-08 15:00:50 +01:00
Alejandro Piñeiro b62a8149ab nir/xfb: add component_offset at nir_xfb_info
Where component_offset here is the offset when accessing components of
a packed variable. Or in other words, location_frac on
nir.h. Different places of mesa use different names for it.

Technically nir_xfb_info consumer can get the same from the
component_mask, it seems somewhat forced to make it to compute it,
instead of providing it.

v2: rename local location_frac for comp_offset, more similar to the
intended use (Timothy Arceri)

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-08 15:00:50 +01:00
Jason Ekstrand 1664de5924 nir/builder: Add a build_deref_array_imm helper
Unlike most of the cases in which we do this by hand, the new helper
properly handles non-32-bit pointers.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-07 21:20:30 +00:00
Jason Ekstrand fcf2a0122e nir/builder: Cast array indices in build_deref_follower
There's no guarantee when build_deref_follower is called that the two
derefs have the same bit size destination.  Insert a cast on the array
index in case we have differing bit sizes.  While we're here, insert
some asserts in build_deref_array and build_deref_ptr_as_array.  The
validator will catch violations here but they're easier to debug if we
catch them while building.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-07 21:20:30 +00:00
Jason Ekstrand cd4c1458ba nir/builder: Emit better code for iadd/imul_imm
Because we already know the immediate right-hand parameter, we can
potentially save the optimizer a bit of work.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-07 21:20:30 +00:00
Tapani Pälli 8b010f3557 nir: free dead_ctx in case of no progress
Fixes a leak:

  ==7576== 320 (48 direct, 272 indirect) bytes in 1 blocks are definitely lost in loss record 26 of 26
  ==7576==    at 0x4C2EE3B: malloc (vg_replace_malloc.c:309)
  ==7576==    by 0x53EF0E4: ralloc_size (ralloc.c:119)
  ==7576==    by 0x53EF0C2: ralloc_context (ralloc.c:113)
  ==7576==    by 0x5471F64: nir_split_per_member_structs (nir_split_per_member_structs.c:176)
  ==7576==    by 0x51288CF: anv_shader_compile_to_nir (anv_pipeline.c:216)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-03-07 07:40:19 +02:00
Timothy Arceri 7e60d5a501 glsl: use NIR function inlining for drivers that use glsl_to_nir()
glsl_to_nir() is still missing support for converting certain
functions to NIR, so for those we use the GLSL IR optimisations
to remove the functions.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-03-06 23:05:20 +00:00
Timothy Arceri 7530d4abfc glsl/freedreno/panfrost: pass gl_context to the standalone compiler
This allows us to use the ctx with glsl_to_nir() in a following
patch.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-03-06 23:05:20 +00:00
Jason Ekstrand e02959f442 nir/lower_doubles: Inline functions directly in lower_doubles
Instead of trusting the caller to already have created a softfp64
function shader and added all its functions to our shader, we simply
take the softfp64 shader as an argument and do the function inlining
ouselves.  This means that there's no more nasty functions lying around
that the caller needs to worry about cleaning up.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-06 17:24:57 +00:00
Jason Ekstrand f25ca337b4 nir/deref: Expose nir_opt_deref_impl
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-06 17:24:57 +00:00
Jason Ekstrand de8d80f9cc nir/inline_functions: Break inlining into a builder helper
This pulls the guts of function inlining into a builder helper so that
it can be used elsewhere.  The rest of the infrastructure is still
needed for most inlining cases to ensure that everything gets inlined
and only ever once.  However, there are use-cases where you just want to
inline one little thing.  This new helper also has a neat trick where it
can seamlessly inline a function from one nir_shader into another.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-06 17:24:57 +00:00
Jason Ekstrand 0a6b1d0580 glsl/nir: Inline functions in float64_funcs_to_nir
This doesn't really change anything as the functions will all get
inlined anyway.  However it does let us do a bit of the work earlier and
in a common place.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-06 17:24:57 +00:00
Jason Ekstrand 82d9a37a59 glsl/nir: Add a shared helper for building float64 shaders
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-06 17:24:57 +00:00
Jason Ekstrand 9314084237 nir: Teach loop unrolling about 64-bit instruction lowering
The lowering we do for 64-bit instructions can cause a single NIR ALU
instruction to blow up into hundreds or thousands of instructions
potentially with control flow.  If loop unrolling isn't aware of this,
it can unroll a loop 20 times which contains a nir_op_fsqrt which we
then lower to a full software implementation based on integer math.
Those 20 invocations suddenly get a lot more expensive than NIR loop
unrolling currently expects.  By giving it an approximate estimate
function, we can prevent loop unrolling from going to town when it
shouldn't.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-06 17:24:57 +00:00
Jason Ekstrand ebb3695376 nir: Expose double and int64 op_to_options_mask helpers
We already have one internally for int64 but we don't have a similar one
for doubles so we'll have to make one.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-06 17:24:57 +00:00
Iago Toral Quiroga ca2b5e9069 compiler/nir: add an is_conversion field to nir_op_info
This is set to True only for numeric conversion opcodes.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-06 17:24:57 +00:00
Timothy Arceri 54522d0506 nir: rename glsl_type_is_struct() -> glsl_type_is_struct_or_ifc()
Replace done using:
find ./src -type f -exec sed -i -- \
's/glsl_type_is_struct(/glsl_type_is_struct_or_ifc(/g' {} \;

Acked-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-06 13:10:02 +11:00
Timothy Arceri e16a27fcf8 glsl: rename record_types -> struct_types
Acked-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-06 13:10:02 +11:00
Timothy Arceri 8294295dbd glsl: rename record_location_offset() -> struct_location_offset()
Replace done using:
find ./src -type f -exec sed -i -- \
's/record_location_offset(/struct_location_offset(/g' {} \;

Acked-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-06 13:10:02 +11:00
Timothy Arceri 88d8c4e290 glsl: rename get_record_instance() -> get_struct_instance()
Replace done using:
find ./src -type f -exec sed -i -- \
's/get_record_instance(/get_struct_instance(/g' {} \;

Acked-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-06 13:10:02 +11:00
Timothy Arceri 81ee2cd8ba glsl: rename is_record() -> is_struct()
Replace was done using:
find ./src -type f -exec sed -i -- \
's/is_record(/is_struct(/g' {} \;

Acked-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-06 13:10:02 +11:00
Karol Herbst 272e927d0e nir/spirv: initial handling of OpenCL.std extension opcodes
Not complete, mostly just adding things as I encounter them in CTS. But
not getting far enough yet to hit most of the OpenCL.std instructions.

Anyway, this is better than nothing and covers the most common builtins.

v2: add hadd proof from Jason
    move some of the lowering into opt_algebraic and create new nir opcodes
    simplify nextafter lowering
    fix normalize lowering for inf
    rework upsample to use nir_pack_bits
    add missing files to build systems
v3: split lines of iadd/sub_sat expressions

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-05 22:28:29 +01:00
Karol Herbst d0b47ec4df nir/vtn: add support for SpvBuiltInGlobalLinearId
v2: use formula with fewer operations

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-03-05 22:28:29 +01:00
Karol Herbst f48c672965 nir: add support for address bit sized system values
v2: add assert in else clause
    make local group intrinsics 32 bit wide
v3: always use 32 bit constant for local_size
v4: add comment by Jason

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-05 22:28:29 +01:00
Karol Herbst 5f8257fb0b nir/spirv: improve parsing of the memory model
v2: add some vtn_fail_ifs

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-05 22:28:29 +01:00
Karol Herbst 5d48359a2c nir: replace magic numbers with M_PI
we define it inside 'include/c99_math.h' so it is safe to use.

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-05 22:28:29 +01:00
Timur Kristóf 6684e039eb nir: Add multiplier argument to nir_lower_uniforms_to_ubo.
Note that locations can be set in different units, and the multiplier
argument caters to supporting these different units. For example,
st_glsl_to_nir uses dwords (4 bytes) so the multiplier should be 4,
while tgsi_to_nir uses bytes, so the multiplier should be 16.

Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-03-05 19:13:27 +00:00
Timur Kristóf 909d1f50f3 nir: Move nir_lower_uniforms_to_ubo to compiler/nir.
The nir_lower_uniforms_to_ubo function is useful outside of
mesa/state_tracker, and in fact is needed to produce NIR for
drivers that have the PIPE_CAP_PACKED_UNIFORMS capability.

Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-03-05 19:13:27 +00:00
Timur Kristóf 317f10bf40 nir: Add ability for shaders to use window space coordinates.
This patch adds a shader_info field that tells the driver to use window
space coordinates for a given vertex shader. It also enables this feature
in radeonsi (the only NIR-capable driver that supported it in TGSI),
and makes tgsi_to_nir aware of it.

Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-03-05 19:13:27 +00:00
Eric Anholt 2780a99ff8 v3d: Move the stores for fixed function VS output reads into NIR.
This lets us emit the VPM_WRITEs directly from
nir_intrinsic_store_output() (useful once NIR scheduling is in place so
that we can reduce register pressure), and lets future NIR scheduling
schedule the math to generate them.  Even in the meantime, it looks like
this lets NIR DCE some more code and make better decisions.

total instructions in shared programs: 6429246 -> 6412976 (-0.25%)
total threads in shared programs: 153924 -> 153934 (<.01%)
total loops in shared programs: 486 -> 483 (-0.62%)
total uniforms in shared programs: 2385436 -> 2388195 (0.12%)

Acked-by: Ian Romanick <ian.d.romanick@intel.com> (nir)
2019-03-05 10:59:40 -08:00
Eric Anholt a4f612b4cf nir: Improve printing of load_input/store_output variable names.
We were printing only when the channel was exactly the start channel, so
scalarized loads/stores would be missing the name on the rest.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-05 10:59:40 -08:00
Jason Ekstrand 61e009d2c4 spirv: Use the same types for resource indices as pointers
We need more space than just a 32-bit scalar and we have to burn all
that space anyway so we may as well expose it to the driver.  This also
fixes a subtle bug when UBOs and SSBOs have different pointer types.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-05 10:06:50 -06:00
Jason Ekstrand 9f7ee4f8e5 spirv: Use the generic dereference function for OpArrayLength
With the new deref changes, the old pointer_offset version may not be
the right one to call.  Just call the generic one and let it sort it
out.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-05 10:06:50 -06:00
Jason Ekstrand f1dbc7e97d spirv: Pull offset/stride from the pointer for OpArrayLength
We can't pull it from the variable type because it might be an array of
blocks and not just the one block.  While we're here, throw in some
error checking.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
2019-03-05 10:06:50 -06:00
Jason Ekstrand 5c96120b5c intel,nir: Lower TXD with min_lod when the sampler index is not < 16
When we have a larger sampler index, we get into the "high sampler"
scenario and need an instruction header.  Even in SIMD8, this pushes the
instruction over the sampler message size maximum of 11 registers.
Instead, we have to lower TXD to TXL.

Fixes: cb98e0755f "intel/fs: Support min_lod parameters on texture..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-04 23:56:39 +00:00
Jason Ekstrand ca295ddbfb spirv: OpImageQueryLod requires a sampler
No idea how this fell through the cracks besides the fact that the
sampler bound at 0 almost always works and the CTS isn't amazing.  In
any case, this appears to have been broken for almost forever.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
2019-03-04 23:56:39 +00:00
Sagar Ghuge 58bcebd987 spirv: Allow [i/u]mulExtended to use new nir opcode
Use new nir opcode nir_[i/u]mul_2x32_64 and extract lower and higher 32
bits as needed instead of emitting mul and mul_high.

v2: Surround the switch case with curly braces (Jason Ekstrand)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-04 15:50:25 -08:00
Sagar Ghuge 47ec9bdc60 nir/algebraic: Optimize low 32 bit extraction
Optimize a situation where we only need lower 32 bits from 64 bit
result.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Suggested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-04 15:50:25 -08:00
Sagar Ghuge 1d8994a63b glsl: [u/i]mulExtended optimization for GLSL
Optimize mulExtended to use 32x32->64 multiplication.

Drivers which are not based on NIR, they can set the
MUL64_TO_MUL_AND_MUL_HIGH lowering flag in order to have same old
behavior.

v2: Add missing condition check (Jason Ekstrand)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Suggested-by: Matt Turner <Matt Turner <mattst88@gmail.com>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-04 15:50:25 -08:00
Sagar Ghuge e551040c60 nir/glsl: Add another way of doing lower_imul64 for gen8+
On Gen 8 and 9, "mul" instruction supports 64 bit destination type. We
can reduce our 64x64 int multiplication from 4 instructions to 3.

Also instead of emitting two mul instructions, we can emit single mul
instuction and extract low/high 32 bits from 64 bit result for
[i/u]mulExtended

v2: 1) Allow lower_mul_high64 to use new opcode (Jason Ekstrand)
    2) Add lower_mul_2x32_64 flag (Matt Turner)
    3) Remove associative property as bit size is different (Connor
       Abbott)

v3: Fix indentation and variable naming convention (Jason Ekstrand)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-04 15:50:25 -08:00
Ilia Mirkin 4eec3a2a36 glsl: fix recording of variables for XFB in TCS shaders
This is purely for conformance, since it's not actually possible to do
XFB on TCS output varyings. However we do have to make sure we record
the names correctly, and this removes an extra level of array-ness from
the names in question.

Fixes KHR-GL45.tessellation_shader.single.xfb_captures_data_from_correct_stage

v2: Add comment to the new program_resource_visitor::process function.
    (Ilia Mirkin)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108457
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-04 01:55:00 +01:00
Jose Maria Casanova Crespo bf1f49482d glsl: TCS outputs can not be transform feedback candidates on GLES
Avoids regression on:

KHR-GLES*.core.tessellation_shader.single.xfb_captures_data_from_correct_stage

that is uncovered by the following patch.

"glsl: fix recording of variables for XFB in TCS shaders"

v2: Rebased over glsl: fix recording of variables for XFB in TCS shaders
v3: Move this patch before "glsl: fix recording of variables for XFB in TCS
    shaders" to avoid temporal regressions. (Illia Mirkin)

Cc: 19.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-04 01:55:00 +01:00
Jose Maria Casanova Crespo cc7173b438 glsl: fix typos in comments "transfor" -> "transform"
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-04 01:55:00 +01:00
Gert Wollny 3214f20914 mesa: Expose EXT_texture_query_lod and add support for its use shaders
EXT_texture_query_lod provides the same functionality for GLES like
the ARB extension with the same name for GL.

v2: Set ES 3.0 as minimum GLES version as required by the extension

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-03-03 21:50:42 +01:00
Jordan Justen 7de056e1a9
scons: Generate float64_glsl.h for glsl_to_nir fp64 lowering
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-02 14:33:44 -08:00
Jordan Justen 31b35916dd
nir: Add int64/doubles options into nir_shader_compiler_options
This will allow the options to be visible under nir_shader->options,
which will allow the gallium state_tracker to use the driver preferred
settings during glsl_to_nir.

Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-02 14:33:41 -08:00
Ian Romanick bae0c36751 nir/algebraic: Optimize away an fsat of a b2f
The b2f can only produce 0.0 or 1.0, so the fsat does nothing.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-02 13:58:56 -08:00
Ian Romanick ecc9ffa778 nir/algebraic: Replace a-fract(a) with floor(a)
I noticed this while looking at a shader that was affected by Tim's
"more loop unrolling" series.

In review, Tim Arceri asked:
> Why the hurt on Gen6+ is this something that should be in the late
> optimisations pass?

As far as I can tell, it's just because our scheduler is terrible.  In
all the fragment shaders that I looked at (some hurt shaders were from
other stages), only one of the SIMD8 or SIMD16 version would be hurt.
In many of those case, the other SIMD width is improved (e.g.,
shaders/closed/steam/brutal-legend/3990.shader_test).

Often it looks like the scheduler decides to differently schedule a SEND
the occurs somewhere early in the shader.  Once that happens, everything
is different.

I looked at one vertex shader that was hurt (from Goat Simulator).  In
that case, both the floor and fract are used.  The optimization
eliminates the add, and it should allow better scheduling.  In the area
of the FRC and RNDD instructions, the scheduler does the right thing.
However, later in the shader a MAD and and ADD get scheduled
differently, and that makes it slightly worse.

In light of this, I tried adding some "is_used_once" mark-up, and that
did not fix all the cycles regressions.  It also did a lot more harm
than good on SKL (helped 82 vs. hurt 241).

All Gen6+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15437001 -> 15435259 (-0.01%)
instructions in affected programs: 213651 -> 211909 (-0.82%)
helped: 988
HURT: 0
helped stats (abs) min: 1 max: 27 x̄: 1.76 x̃: 1
helped stats (rel) min: 0.15% max: 11.54% x̄: 1.14% x̃: 0.59%
95% mean confidence interval for instructions value: -1.89 -1.63
95% mean confidence interval for instructions %-change: -1.23% -1.05%
Instructions are helped.

total cycles in shared programs: 383007378 -> 382997063 (<.01%)
cycles in affected programs: 1650825 -> 1640510 (-0.62%)
helped: 679
HURT: 302
helped stats (abs) min: 1 max: 348 x̄: 23.39 x̃: 14
helped stats (rel) min: 0.04% max: 28.77% x̄: 1.61% x̃: 0.98%
HURT stats (abs)   min: 1 max: 250 x̄: 18.43 x̃: 7
HURT stats (rel)   min: 0.04% max: 25.86% x̄: 1.41% x̃: 0.53%
95% mean confidence interval for cycles value: -13.05 -7.98
95% mean confidence interval for cycles %-change: -0.86% -0.50%
Cycles are helped.

Iron Lake and GM45 had similar results. (GM45 shown)
total instructions in shared programs: 5043616 -> 5043010 (-0.01%)
instructions in affected programs: 119691 -> 119085 (-0.51%)
helped: 432
HURT: 0
helped stats (abs) min: 1 max: 27 x̄: 1.40 x̃: 1
helped stats (rel) min: 0.10% max: 8.11% x̄: 0.66% x̃: 0.39%
95% mean confidence interval for instructions value: -1.58 -1.23
95% mean confidence interval for instructions %-change: -0.72% -0.59%
Instructions are helped.

total cycles in shared programs: 128139812 -> 128135762 (<.01%)
cycles in affected programs: 3829724 -> 3825674 (-0.11%)
helped: 602
HURT: 0
helped stats (abs) min: 2 max: 486 x̄: 6.73 x̃: 6
helped stats (rel) min: 0.02% max: 4.85% x̄: 0.19% x̃: 0.10%
95% mean confidence interval for cycles value: -8.40 -5.05
95% mean confidence interval for cycles %-change: -0.22% -0.16%
Cycles are helped.

Reviewed-by: Elie Tournier <tournier.elie@gmail.com>
2019-03-01 12:43:25 -08:00
Ian Romanick d40640efe8 nir/algebraic: Replace a bcsel of a b2f sources with a b2f(!(a || b))
I have not investigated the result of doing this during code
generation.  That should be possible, but it would be a bit more
effort.

All Gen6+ platforms had nearly identical results. (Skylake shown)
total cycles in shared programs: 370961508 -> 370961367 (<.01%)
cycles in affected programs: 5174 -> 5033 (-2.73%)
helped: 2
HURT: 0

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8206587 -> 8206589 (<.01%)
instructions in affected programs: 1325 -> 1327 (0.15%)
helped: 0
HURT: 2

total cycles in shared programs: 187657422 -> 187657428 (<.01%)
cycles in affected programs: 11566 -> 11572 (0.05%)
helped: 0
HURT: 2

This change has almost no effect right now.  However, removing this
patch (but leaving the patch "intel/fs: Generate if instructions with
inverted conditions") after adding a patch that removes !(a < b) -> (a
>= b) optimizations (like
https://patchwork.freedesktop.org/patch/264787/) has the following
results on Skylake:

Skylake
total instructions in shared programs: 15071804 -> 15071806 (<.01%)
instructions in affected programs: 640 -> 642 (0.31%)
helped: 0
HURT: 2

total cycles in shared programs: 369914348 -> 369916569 (<.01%)
cycles in affected programs: 27900 -> 30121 (7.96%)
helped: 4
HURT: 15
helped stats (abs) min: 2 max: 112 x̄: 30.00 x̃: 3
helped stats (rel) min: 0.28% max: 12.28% x̄: 3.34% x̃: 0.40%
HURT stats (abs)   min: 2 max: 758 x̄: 156.07 x̃: 81
HURT stats (rel)   min: 0.20% max: 74.30% x̄: 16.29% x̃: 16.91%
95% mean confidence interval for cycles value: 12.68 221.11
95% mean confidence interval for cycles %-change: 3.09% 21.23%
Cycles are HURT.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-01 12:42:14 -08:00
Ian Romanick eae19f5f19 nir/algebraic: Replace i2b used by bcsel or if-statement with comparison
All of the helped shaders are in Deus Ex.  I looked at a couple shaders,
and they have a pattern like:

    vec1 32 ssa_373 = i2b32 ssa_345.w
    vec1 32 ssa_374 = bcsel ssa_373, ssa_20, ssa_0
    ...
    vec1 32 ssa_377 = ine ssa_345.w, ssa_0
    if ssa_377 {
        ...
        vec1 32 ssa_416 = i2b32 ssa_385.w
        vec1 32 ssa_417 = bcsel ssa_416, ssa_386, ssa_374
        ...
    }

The massive help occurs because the i2b32 is removed, then other passes
determine that ssa_374 must be ssa_20 inside the if-statement allowing
the first bcsel to also be deleted.

v2: Rebase on 1-bit Boolean changes.

v3: Fix i2b32 vs ine problem in if-statement replacement.  Noticed by
Bas.

Skylake
total instructions in shared programs: 15241394 -> 15186287 (-0.36%)
instructions in affected programs: 890583 -> 835476 (-6.19%)
helped: 355
HURT: 0
helped stats (abs) min: 1 max: 497 x̄: 155.23 x̃: 149
helped stats (rel) min: 0.09% max: 16.49% x̄: 6.10% x̃: 6.59%
95% mean confidence interval for instructions value: -165.07 -145.39
95% mean confidence interval for instructions %-change: -6.42% -5.77%
Instructions are helped.

total cycles in shared programs: 373846583 -> 371023357 (-0.76%)
cycles in affected programs: 118972102 -> 116148876 (-2.37%)
helped: 343
HURT: 14
helped stats (abs) min: 45 max: 118284 x̄: 8332.32 x̃: 6089
helped stats (rel) min: 0.03% max: 38.19% x̄: 2.48% x̃: 1.77%
HURT stats (abs)   min: 120 max: 4126 x̄: 2482.79 x̃: 3019
HURT stats (rel)   min: 0.16% max: 17.37% x̄: 2.13% x̃: 1.11%
95% mean confidence interval for cycles value: -8723.28 -7093.12
95% mean confidence interval for cycles %-change: -2.57% -2.02%
Cycles are helped.

total spills in shared programs: 32401 -> 23465 (-27.58%)
spills in affected programs: 24457 -> 15521 (-36.54%)
helped: 343
HURT: 0

total fills in shared programs: 37866 -> 31765 (-16.11%)
fills in affected programs: 18889 -> 12788 (-32.30%)
helped: 343
HURT: 0

Broadwell and Haswell had similar results. (Haswell shown)
Haswell
total instructions in shared programs: 13764783 -> 13750679 (-0.10%)
instructions in affected programs: 1176256 -> 1162152 (-1.20%)
helped: 334
HURT: 21
helped stats (abs) min: 1 max: 358 x̄: 42.59 x̃: 47
helped stats (rel) min: 0.09% max: 11.81% x̄: 1.30% x̃: 1.37%
HURT stats (abs)   min: 1 max: 61 x̄: 5.76 x̃: 1
HURT stats (rel)   min: 0.03% max: 1.84% x̄: 0.17% x̃: 0.03%
95% mean confidence interval for instructions value: -43.99 -35.47
95% mean confidence interval for instructions %-change: -1.35% -1.08%
Instructions are helped.

total cycles in shared programs: 386511910 -> 385402528 (-0.29%)
cycles in affected programs: 143831110 -> 142721728 (-0.77%)
helped: 327
HURT: 39
helped stats (abs) min: 16 max: 25219 x̄: 3519.74 x̃: 3570
helped stats (rel) min: <.01% max: 10.26% x̄: 0.95% x̃: 0.96%
HURT stats (abs)   min: 16 max: 4881 x̄: 1065.95 x̃: 997
HURT stats (rel)   min: <.01% max: 16.67% x̄: 0.70% x̃: 0.24%
95% mean confidence interval for cycles value: -3375.59 -2686.60
95% mean confidence interval for cycles %-change: -0.92% -0.64%
Cycles are helped.

total spills in shared programs: 100480 -> 97846 (-2.62%)
spills in affected programs: 84702 -> 82068 (-3.11%)
helped: 316
HURT: 21

total fills in shared programs: 96877 -> 94369 (-2.59%)
fills in affected programs: 69167 -> 66659 (-3.63%)
helped: 316
HURT: 9

No changes on Ivy Bridge or earlier platforms.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-01 12:42:14 -08:00
Caio Marcelo de Oliveira Filho 1458aa1f78 nir/copy_prop_vars: handle indirect vector elements
Differently than the direct case, the indirect array derefs of vector
are handled like regular derefs, with the exception that we ignore any
vector entry that has SSA values when performing a load.  Such SSA
values don't help loading of the indirect unless we emit an if-ladder.

Copy_derefs are supported for indirects.

Also enable two tests that now pass.

v2: Remove unnecessary temporaries.  Be clearer when identifying the
    case where copy_entry doesn't help when we are dealing with an
    indirect array_deref (of a vector).  (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-28 23:55:31 -08:00
Caio Marcelo de Oliveira Filho 6c0de78cc2 nir/copy_prop_vars: prefer using entries from equal derefs
When looking up an entry to use, always prefer an equal match, as it
more likely to contain reusable SSA or derefs to propagate.

This will be necessary when adding entries with array derefs of
vectors, because we don't want the vector if the equal entry (an array
deref of that vector) is present.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-28 23:55:31 -08:00
Caio Marcelo de Oliveira Filho 61965afd00 nir/copy_prop_vars: add tests for indirect array deref
Both on an actual array and on a vector, and an extra test on a vector
mixing direct and indirect access.  The vector tests are disabled and
will be enabled by a later commit.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-28 23:55:31 -08:00
Caio Marcelo de Oliveira Filho 96c32d7776 nir/copy_prop_vars: handle load/store of vector elements
When direct array deref is used on a vector type (for loads and
stores), copy_prop_vars is now smart to propagate values it knows
about.

Given a 'vec4 v', storing to v[3] will update the copy entry for v and
it is equivalent to a write to v.w.  Loading from v[1] will try first
to see if there's a known value for v.y -- and drop the load in that
case.

The copy entries still always refer to the entire vectors, so the
operations happen on the parent deref (the 'vector') and the values
are fixed accordingly.

It might be the case now that certain entries have not only different
SSA defs in each element but also those come from different components
than they are set to, because stores to individual elements always
come from a SSA definition with a single component.

Tests related to these cases are now enabled.

v2: Instead of asserting on invalid indices, "load" an undef and
    remove the store.  (Jason)

v3: Merge code path for the cases of is_array_deref_of_vector into the
    regular code path.  Add a base_index parameter to
    value_set_from_value.  (code changes by Jason)

v4: Removed the get_entry_for_deref helper, now being used only once.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-28 23:50:05 -08:00
Caio Marcelo de Oliveira Filho 33dafdc024 nir/copy_prop_vars: use NIR_MAX_VEC_COMPONENTS
Also replace uses of 0xf with the appropriate full mask created from
the number of components.

Note that an increase of MAX might make us change how the data is
stored later on, but for now at least we make sure the pass is not
hardcoded.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-28 23:50:05 -08:00
Caio Marcelo de Oliveira Filho e84c841fb0 nir/copy_prop_vars: rename/refactor store_to_entry helper
The name reflected this function role back when the pass also did dead
write elimination.  So rename it to what it does now, which is setting
a value using another value; and narrow the argument list.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-28 23:50:05 -08:00
Juan A. Suarez Romero b43b55d461 nir/spirv: return after emitting a branch in block
When emitting a branch in a block, it does not make sense to continue
processing further instructions, as they will not be reachable.

This fixes a nasty case with a loop with a branch that both then-part
and else-part exits the loop:

%1 = OpLabel
     OpLoopMerge %2 %3 None
     OpBranchConditional %false %2 %2
%3 = OpLabel
     OpBranch %1
%2 = OpLabel
    [...]

We know that block %1 will branch always to block %2, which is the merge
block for the loop. And thus a break is emitted. If we keep continuing
processing further instructions, we will be processing the branch
conditional and thus emitting the proper NIR conditional, which leads to
instructions after the break.

This fixes dEQP-VK.graphicsfuzz.continue-and-merge.

CC: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-28 09:47:06 +01:00
Timothy Arceri 7536af670b glsl: fix shader cache for packed param list
Some types of params such as some builtins are always padded. We
need to keep track of this so we can restore the list correctly.

Here we also remove a couple of cache entries that are not actually
required as they get rebuilt by the _mesa_add_parameter() calls.

This patch fixes a bunch of arb_texture_multisample and
arb_sample_shading piglit tests for the radeonsi NIR backend.

Fixes: edded12376 ("mesa: rework ParameterList to allow packing")

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-02-28 11:47:37 +11:00