Commit Graph

2145 Commits

Author SHA1 Message Date
Mark Janes dc8df485e9 intel/compiler: reorder shader cache keys to minimize padding
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17749>
2022-07-29 20:45:25 +00:00
Mark Janes a4a4aefa03 intel/compiler: pad all data structures used by shader cache keys
When the compiler pads a data structure, the padded bytes will not be
initialized.  Shader keys are compared with memcmp and unitialized
bytes within the structure breaks this mechanism.

Explicitly pad the structures with members, so the compiler is forced
to initialize them.  Add a warning to indicate if a change to
alignment in any of the data structures requires additional padding.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17749>
2022-07-29 20:45:25 +00:00
Kenneth Graunke fc02ce5713 intel/eu: Mark header present in URB memory fences on XeHP
Fixes the following EU validation error:

   ERROR: Header must be present for all URB messages.

The message header is ignored for URB fence messages, so I doubt that
this actually matters in practice.  But we should probably mark it as
present, because you have to send something, and according to the
documentation, there is a message header, it's just ignored.

Fixes: e6a9501aa2 ("intel/fs: Add the URB fence message")
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17624>
2022-07-28 21:31:45 +00:00
Kenneth Graunke 986b49a56d intel/eu: Clarify spec citations for XeHP region restrictions
When this rule started causing issues, I looked it up in the
documentation, and found the rule for 64-bit destinations and
integer DWord multiplication, but there was no mention of floating
point destinations, as the text in brackets suggested.  The actual
restriction text had been updated, so this led to some confusion
where I thought the conditions had been changed in newer docs.

However, what's actually going on is that there are two separate
conditions, each listed in separate rows of the table.  One lists
64-bit destinations or integer DWord multiplication, and the other
mentions floating-point destinations.  In both cases, the actual
restrictions are identical, so we handle them together in the code.

Try to update the comment to avoid future confusion.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17624>
2022-07-28 21:31:45 +00:00
Kenneth Graunke 5c88488a64 intel/eu: Fix XeHP register region validation for hstride == 0
Recently, we started using <1;1,0> register regions for consecutive
channels, rather than the <8;8,1> we've traditionally used, as the
<1;1,0> encoding can be compacted on XeHP.  Since then, one of the
EU validator rules has been flagging tons of instructions as errors:

   mov(16)   g114<1>F   g112<1,1,0>UD   { align1 1H I@2 compacted };
   ERROR: Register Regioning patterns where register data bit locations are changed between source and destination are not supported except for broadcast of a scalar.

Our code for this restriction checked three things:

   #1: vstride != width * hstride ||
   #2: src_stride != dst_stride ||
   #3: subreg != dst_subreg

Destination regions are always linear (no replicated values, nor
any overlapping components), as they only have hstride.  Rule #1 is
requiring that the source region be linear as well.  Rules #2-3 are
straightforward: the subregister must match (for the first channel to
line up), and the source/destination strides must match (for any
subsequent channels to line up).

Unfortunately, rules #1-2 weren't working when horizontal stride was 0.
In that case, regions are linear if width == 1, and the stride between
consecutive channels is given by vertical stride instead.

So we adjust our src_stride calculation from

   src_stride = hstride * type_size;

to:

   src_stride = (hstride ? hstride : vstride) * type_size;

and adjust rule #1 to allow hstride == 0 as long as width == 1.

While here, we also update the text of the rule to match the latest
documentation, which apparently clarifies that it's the location of
the LSB of the channel which matters.

Fixes: 3f50dde8b3 ("intel/eu: Teach EU validator about FP/DP pipeline regioning restrictions.")
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17624>
2022-07-28 21:31:45 +00:00
Jason Ekstrand 0772242feb intel/eu: Don't throw validation errors on float MOV_INDIRECT
Fixes: 3f50dde8b3 ("intel/eu: Teach EU validator about FP/DP pipeline regioning restrictions.")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17624>
2022-07-28 21:31:45 +00:00
Kenneth Graunke 82ee30e558 intel/eu: Handle compaction when inserting validation errors
When the EU validator encountered an error, it would add an annotation
to the disassembly.  Unfortunately, the code to insert an error assumed
that the next instruction would start at (offset + sizeof(brw_inst)),
which is not true if the instruction with an error is compacted.

This could lead to cascading disassembly errors, where we started trying
to decode the next instruction at the wrong offset, and getting lots of
scary looking output:

   ERROR: Register Regioning patterns where [...]
   (-f0.1.any16h) illegal(*** invalid execution size value 6 )      { align1 $7.src atomic };
   (+f0.1.any16h) illegal.sat(*** invalid execution size value 6 )  { align1 $9.src AccWrEnable };
   illegal(*** invalid execution size value 6 )                     { align1 $11.src };
   (+f0.1) illegal.sat(*** invalid execution size value 6 )         { align1 F@2 AccWrEnable };
   (+f0.1) illegal.sat(*** invalid execution size value 6 )         { align1 F@2 AccWrEnable };
   (+f0.1) illegal.sat(*** invalid execution size value 6 )         { align1 $15.src AccWrEnable };
   illegal(*** invalid execution size value 6 )                     { align1 $15.src };
   (+f0.1) illegal.sat.g.f0.1(*** invalid execution size value 6 )  { align1 $13.src AccWrEnable };

Only the first instruction was actually wrong - the rest are just a
result of starting the disassembler at the wrong offset.  Trash ensues!

To fix this, just pass the instruction size in a few layers so we can
record the next offset properly.

Cc: mesa-stable
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17624>
2022-07-28 21:31:45 +00:00
Ian Romanick f7f232385f intel/fs: Use canonical form for "work around" tags
Trivial.  Also clean up some weird whitespace.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17605>
2022-07-26 17:25:19 +00:00
Ian Romanick 377246318a intel/fs: Eliminate "masked" and "per slot offset" URB messages
All of this information can be inferred from the sources.

v2: Fix "error: unused variable 'opcode'" detected by marge-bot.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17605>
2022-07-26 17:25:19 +00:00
Ian Romanick b21b901b46 intel/fs: Don't pass flags to lower_urb_read_logical_send or lower_urb_write_logical_send
...because the flags can be inferred from the sources.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17605>
2022-07-26 17:25:19 +00:00
Ian Romanick 1b17f8fc5a intel/fs: Make logical URB read instructions more like other logical instructions
No shader-db changes on any Intel platform

Fossil-db results:

Tiger Lake
Instructions in all programs: 156926440 -> 156926470 (+0.0%)
Instructions hurt: 15

Cycles in all programs: 7513099349 -> 7513099402 (+0.0%)
Cycles hurt: 15

Ice Lake and Skylake had similar results. (Ice Lake shown)
Cycles in all programs: 9099036492 -> 9099036489 (-0.0%)
Cycles helped: 1

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17605>
2022-07-26 17:25:19 +00:00
Ian Romanick 349a040f68 intel/fs: Make logical URB write instructions more like other logical instructions
The changes to fs_visitor::validate() helped track down a place where I
initially forgot to convert a message to the new sources layout.  This
had caused a different validation failure in
dEQP-GLES31.functional.tessellation.tesscoord.triangles_equal_spacing,
but this were not detected until after SENDs were lowered.

Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 19951145 -> 19951133 (<.01%)
instructions in affected programs: 2429 -> 2417 (-0.49%)
helped: 8 / HURT: 0

total cycles in shared programs: 858904152 -> 858862331 (<.01%)
cycles in affected programs: 5702652 -> 5660831 (-0.73%)
helped: 2138 / HURT: 1255

Broadwell
total cycles in shared programs: 904869459 -> 904835501 (<.01%)
cycles in affected programs: 7686744 -> 7652786 (-0.44%)
helped: 2861 / HURT: 2050

Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown)
Instructions in all programs: 141442369 -> 141442032 (-0.0%)
Instructions helped: 337

Cycles in all programs: 9099270231 -> 9099036492 (-0.0%)
Cycles helped: 40661
Cycles hurt: 28606

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17605>
2022-07-26 17:25:18 +00:00
Ian Romanick 95e50d198f intel/vec4: Set lower_usub_sat
Reviewed-by: Emma Anholt <emma@anholt.net>
Closes: #6900
Fixes: 90a8fb03 ("nir/lower_io: Fix array length of buffers larger than INT32_MAX.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17637>
2022-07-22 17:54:28 +00:00
Jason Ekstrand 87ab287436 vulkan: Call lower_clip_cull_distance_arrays in vk_spirv_to_nir
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17644>
2022-07-21 21:18:48 +00:00
Emma Anholt 94bd06256a intel/fs: Simplify brw_barycentric_mode() args.
Reduce a bit of mode lookup noise I was tracing through trying to resolve
the previous bug.

Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17381>
2022-07-19 01:25:47 +00:00
Lionel Landwerlin 2d1f021e16 intel/fs: Set NonPerspectiveBarycentricEnable when the interpolator needs it.
[anholt: changed to make all drivers do the right thing by moving the
payload barycentric check into the compiler]

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17381>
2022-07-19 01:25:47 +00:00
Jordan Justen 4246a1ff47 intel/compiler: Don't create vec4 reg-set for gen8+
After 60e1d0f028, we know that vec4 will never be used for gen >= 8.

Ref: 60e1d0f028 ("intel/compiler: Remove INTEL_SCALAR_... env variables")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17437>
2022-07-14 17:49:01 +00:00
Jason Ekstrand 3346f6918f intel/fs,anv: Rework handling of coarse and sample shading
Now that this information is accurately gathered by spirv_to_nir, we no
longer need the hack.  We just need to fix up the way we handle some of
the key bits.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14020>
2022-07-13 20:28:42 +00:00
Jason Ekstrand d0b154319d intel/fs: Simplify persample_dispatch
Thanks to the previous commit, we no longer need this check.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14020>
2022-07-13 20:28:42 +00:00
Jason Ekstrand fd17aaf430 intel/fs: Use nir_lower_single_sampled
This lets us drop demote_sample_qualifiers as well as a back-end check
for key->multisample_fbo.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14020>
2022-07-13 20:28:42 +00:00
Jason Ekstrand ca9f0f72db intel/fs: Use shader_info::fs::uses_sample_shading
NIR constructs this information for us as part of nir_gather_info these
days so we can simplify our logic a bit.  This will also let us be more
correct once we move uses_sample_shading scraping earlier.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14020>
2022-07-13 20:28:42 +00:00
Marcin Ślusarz 585d81e3ec intel/compiler: print shaders after nir_remove_unused_varyings
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17516>
2022-07-13 15:50:02 +00:00
Kenneth Graunke c08f51d0cf intel/compiler: Don't set SBID on EOT send messages
The send message with EOT is the last instruction so there's no need to
set any SWSB annotations for future instructions.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17202>
2022-07-09 05:26:25 +00:00
Jason Ekstrand 530de844ef intel,anv,iris,crocus: Drop subgroup size from the shader key
Use nir->info.subgroup_size instead.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17337>
2022-07-08 22:47:22 +00:00
Jason Ekstrand e1ee201722 shader_info: Move subgroup_size out of cs and make it an enum
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17337>
2022-07-08 22:47:22 +00:00
Ian Romanick bbcb881f46 intel/fs: Remove non-_LOGICAL URB messages
The _LOGICAL versions are lowered direct to SEND, so nothing can ever
generate these messages.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17379>
2022-07-08 19:45:34 +00:00
Ian Romanick bdc7668008 intel/fs: Lower URB messages to SEND
Before rebasing on top of Ken's split-SEND optimization (see !17018),
this commit just caused some scheduling changes in various tessellation
and geometry shaders.  These changes were caused by the addition of real
latency information for the URB messages.

With the addition of the split-SEND optimization, the changes
are... staggering.  All of the shaders helped for spills and fills are
vertex shaders from Batman Arkham Origins.  What surprises me is that
these shaders account for such a high percentage of the spills and fills
in fossil-db.  85%?!?

v2: Use FIXED_GRF instead of BRW_GENERAL_REGISTER_FILE in an assertion.
Suggested by Ken.

Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 20013625 -> 19954020 (-0.30%)
instructions in affected programs: 4007157 -> 3947552 (-1.49%)
helped: 31161
HURT: 0
helped stats (abs) min: 1 max: 400 x̄: 1.91 x̃: 2
helped stats (rel) min: 0.08% max: 59.70% x̄: 2.20% x̃: 1.83%
95% mean confidence interval for instructions value: -1.97 -1.86
95% mean confidence interval for instructions %-change: -2.22% -2.18%
Instructions are helped.

total cycles in shared programs: 859337569 -> 858636788 (-0.08%)
cycles in affected programs: 74168298 -> 73467517 (-0.94%)
helped: 13812
HURT: 16846
helped stats (abs) min: 1 max: 291078 x̄: 82.83 x̃: 4
helped stats (rel) min: <.01% max: 37.09% x̄: 3.47% x̃: 2.02%
HURT stats (abs)   min: 1 max: 1543 x̄: 26.31 x̃: 14
HURT stats (rel)   min: <.01% max: 77.97% x̄: 4.11% x̃: 2.58%
95% mean confidence interval for cycles value: -55.10 9.39
95% mean confidence interval for cycles %-change: 0.62% 0.77%
Inconclusive result (value mean confidence interval includes 0).

Broadwell
total cycles in shared programs: 904844939 -> 904832320 (<.01%)
cycles in affected programs: 525360 -> 512741 (-2.40%)
helped: 215
HURT: 4
helped stats (abs) min: 4 max: 1018 x̄: 60.16 x̃: 39
helped stats (rel) min: 0.14% max: 15.85% x̄: 2.16% x̃: 2.04%
HURT stats (abs)   min: 79 max: 79 x̄: 79.00 x̃: 79
HURT stats (rel)   min: 1.31% max: 1.57% x̄: 1.43% x̃: 1.43%
95% mean confidence interval for cycles value: -75.02 -40.22
95% mean confidence interval for cycles %-change: -2.37% -1.81%
Cycles are helped.

No shader-db changes on any older Intel platforms.

Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown)
Instructions in all programs: 142622800 -> 141461114 (-0.8%)
Instructions helped: 197186

Cycles in all programs: 9101223846 -> 9099440025 (-0.0%)
Cycles helped: 37963
Cycles hurt: 151233

Spills in all programs: 98829 -> 13695 (-86.1%)
Spills helped: 2159

Fills in all programs: 128142 -> 18400 (-85.6%)
Fills helped: 2159

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17379>
2022-07-08 19:45:34 +00:00
Ian Romanick a477587b4a intel/fs: Add _LOGICAL versions of URB messages
The lowering is currently fake.  It just changes the opcode from the
_LOGICAL version to the non-_LOGICAL version.

v2: Remove some rebase cruft.  's/gfx8_//;s/simd8_/' in
brw_instruction_name.  Both suggested by Ken.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17379>
2022-07-08 19:45:34 +00:00
Ian Romanick 07b9bfacc7 intel/compiler: Move logical-send lowering to a separate file
brw_fs.cpp was 10kloc.  Now it's only 7.5kloc.  Ugh.

v2: Rebase on 9680e0e4a2.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17379>
2022-07-08 19:45:34 +00:00
Ian Romanick c751ca769f intel/eu: Validate some aspects of URB messages
If these checks had been in place previously, some bugs
that... eh-hem... practically took down the Intel CI would have been
caught earlier. *blush*

v2: Update to account for split sends.

v3: Add some more Gfx version checks.  Remove the redundant "src0 is a
GRF" check.  Both suggested by Ken.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17379>
2022-07-08 19:45:34 +00:00
Ian Romanick b909ac350f intel/compiler: Rename vec4 state URB opcodes to have VEC4_ prefix
An argument could be made that all stage-specific opcodes for vec4
stages should be prefixed with VEC4_ like the stage-agnostic opcodes.
I'll leave those additional sed jobs for another day.

    egrep -lr '(VS|GS|TCS)_OPCODE_URB_WRITE' src |\
    while read f; do
        sed --in-place 's/\(VS\|GS\|TCS\)_OPCODE_URB_WRITE/VEC4_\1_OPCODE_URB_WRITE/g' $f
    done

    egrep -lr 'T.S_OPCODE[_A-Z]*URB_OFFSETS' src |\
    while read f; do
        sed --in-place 's/\(T.S_OPCODE[_A-Z]*URB_OFFSETS\)/VEC4_\1/g' $f
    done

Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17379>
2022-07-08 19:45:34 +00:00
Lionel Landwerlin 9680e0e4a2 intel/fs: ray query fix for global address
With stages dispatching with a mask, we can run into situations where
we don't have the global address in all lanes. The existing code
always assumed we had the addres in at least lane0.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: bb40e999d1 ("intel/nir: use a single intel intrinsic to deal with ray traversal")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17330>
2022-07-08 00:36:04 +00:00
Kenneth Graunke ec2e8bc33f intel/compiler: Avoid copy propagating large registers into EOT messages
EOT messages need to use g112-g127 for their sources.  With the new
opt_split_sends pass, we may be constructing an EOT message from two
different registers, and be able to copy propagate the original values
into those SENDs.

This can cause problems if we copy propagate from a large register
(say an RGBA value which is 4 GRFs in SIMD8 or 8 GRFs in SIMD16), in a
situation where the SEND only read a subset of that (say the alpha value
out of an RGBA texturing result).  g112-127 can only hold 16 registers
worth of data, and sometimes we can only use g112-126.  So, we can't
propagate if the GRFs in question are larger than 15 GRFs.

Fixes a shader validation failure in Alan Wake.  Thanks to Ian Romanick
for catching this!

shader-db on Icelake shows that only SIMD32 programs are affected, and
the results are pretty negligable:

   total instructions in shared programs: 19615228 -> 19615269 (<.01%)
   instructions in affected programs: 10702 -> 10743 (0.38%)
   helped: 1 / HURT: 43 / largest change: +/- 2 instructions

   total cycles in shared programs: 852001706 -> 852001566 (<.01%)
   cycles in affected programs: 767098 -> 766958 (-0.02%)
   helped: 68 / HURT: 64 / largest change: +/- 774 cycles

   GAINED: 2 / LOST: 0

Fixes: 589b03d02f ("intel/fs: Opportunistically split SEND message payloads")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6803
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17390>
2022-07-07 20:20:01 +00:00
Lionel Landwerlin f1dd487531 intel/nir: temporarly disable opt_uniform_atomics for RT/CL
Not had time to investiguate what is going is on but it's definitely a
contributor to failures.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16104>
2022-07-07 10:21:48 +00:00
Lionel Landwerlin 1b6c74c48d intel/fs: make sure memory writes have landed for thread dispatch
The thread dispatch SEND instructions will dispatch new threads
immediately even before the caller of the SEND instruction has reached
EOT. So we really need to make sure all the memory writes are visible
to other threads within the DSS before the SEND.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15755>
2022-07-07 09:48:20 +03:00
Kenneth Graunke 589b03d02f intel/fs: Opportunistically split SEND message payloads
While we've taken advantage of split-sends in select situations, there
are many other cases (such as sampler messages, framebuffer writes, and
URB writes) that have never received that treatment, and continued to
use monolithic send payloads.

This commit introduces a new optimization pass which detects SEND
messages with a single payload, finds an adjacent LOAD_PAYLOAD that
produces that payload, splits it two, and updates the SEND to use both
of the new smaller payloads.

In places where we manually used split SENDS, we rely on underlying
knowledge of the message to determine a natural split point.  For
example, header and data, or address and value.

In this pass, we instead infer a natural split point by looking at the
source registers.  Often times, consecutive LOAD_PAYLOAD sources may
already be grouped together in a contiguous block, such as a texture
coordinate.  Then, there is another bit of data, such as a LOD, that
may come from elsewhere.  We look for the point where the source list
switches VGRFs, and split it there.  (If there is a message header, we
choose to split there, as it will naturally come from elsewhere.)

This not only reduces the payload sizes, alleviating register pressure,
but it means that we may be able to eliminate some payload construction
altogether, if we have a contiguous block already and some extra data
being tacked on to one side or the other.

shader-db results for Icelake are:

   total instructions in shared programs: 19602513 -> 19369255 (-1.19%)
   instructions in affected programs: 6085404 -> 5852146 (-3.83%)
   helped: 23650 / HURT: 15
   helped stats (abs) min: 1 max: 1344 x̄: 9.87 x̃: 3
   helped stats (rel) min: 0.03% max: 35.71% x̄: 3.78% x̃: 2.15%
   HURT stats (abs)   min: 1 max: 44 x̄: 7.20 x̃: 2
   HURT stats (rel)   min: 1.04% max: 20.00% x̄: 4.13% x̃: 2.00%
   95% mean confidence interval for instructions value: -10.16 -9.55
   95% mean confidence interval for instructions %-change: -3.84% -3.72%
   Instructions are helped.

   total cycles in shared programs: 848180368 -> 842208063 (-0.70%)
   cycles in affected programs: 599931746 -> 593959441 (-1.00%)
   helped: 22114 / HURT: 13053
   helped stats (abs) min: 1 max: 482486 x̄: 580.94 x̃: 22
   helped stats (rel) min: <.01% max: 78.92% x̄: 4.76% x̃: 0.75%
   HURT stats (abs)   min: 1 max: 94022 x̄: 526.67 x̃: 22
   HURT stats (rel)   min: <.01% max: 188.99% x̄: 4.52% x̃: 0.61%
   95% mean confidence interval for cycles value: -222.87 -116.79
   95% mean confidence interval for cycles %-change: -1.44% -1.20%
   Cycles are helped.

   total spills in shared programs: 8387 -> 6569 (-21.68%)
   spills in affected programs: 5110 -> 3292 (-35.58%)
   helped: 359 / HURT: 3

   total fills in shared programs: 11833 -> 8218 (-30.55%)
   fills in affected programs: 8635 -> 5020 (-41.86%)
   helped: 358 / HURT: 3

   LOST:   1 SIMD16 shader, 659 SIMD32 shaders
   GAINED: 65 SIMD16 shaders, 959 SIMD32 shaders

   Total CPU time (seconds): 1505.48 -> 1474.08 (-2.09%)

Examining these results: the few shaders where spills/fills increased
were already spilling significantly, and were only slightly hurt.  The
applications affected were also helped in countless other shaders, and
other shaders stopped spilling altogether or had 50% reductions.  Many
SIMD16 shaders were gained, and overall we gain more SIMD32, though many
close to the register pressure line go back and forth.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17018>
2022-07-01 02:05:45 +00:00
Kenneth Graunke a8b93e628a intel/compiler: Handle split-sends in EOT high-register pinning case
SEND messages with EOT need to use g112-g127 for their sources so that
the hardware is able to launch new threads while old ones are finishing
without worrying about register overlap when pushing payloads.  For the
newer split-send messages, this applies to both source registers.

Our special case for this in the register allocator was only considering
the first source.  This wasn't a problem because we hadn't ever tried to
use split-sends with EOT before.  However, my new optimization pass is
going to introduce some shortly, so we'll need to handle them properly.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17018>
2022-07-01 02:05:45 +00:00
Kenneth Graunke dd76196cea intel/compiler: Convert brw_eu.cpp back to brw_eu.c
Now that we've removed the thread_local lookup tables using
pointer-to-member C++ features, this can go back to being a standard
C file, like it was in the past.  We just need to annotate a couple
of things with "struct".

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17309>
2022-06-30 23:46:35 +00:00
Kenneth Graunke ea72ec98bf intel/compiler: Remove use of thread_local for opcode tables
We had been using thread_local index -> opcode_desc tables to avoid
plumbing through a storage location throughout all the code.  But now
we have done so with the new brw_isa_info structure.  So we can just
store the tables there, and initialize it with the compiler.

This fixes crashes in gtk4-demo on iris, and should help with some
programs on zink as well.  Something was going wrong with the
thread_local variables not being set up correctly.  While we might be
able to work around that issue, there's really no advantage to storing
these lookup tables in TLS (beyond it being simpler to do originally).
So let's simply stop doing so.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6728
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6229
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17309>
2022-06-30 23:46:35 +00:00
Kenneth Graunke 72e9843991 intel/compiler: Introduce a new brw_isa_info structure
This structure will contain the opcode mapping tables in the next
commit.  For now, this is the mechanical change to plumb it into all
the necessary places, and it continues simply holding devinfo.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17309>
2022-06-30 23:46:35 +00:00
Kenneth Graunke 342471e93d intel/compiler: Move opcode_desc handling to a separate header
This patch creates a new header file, brw_isa_info.h, which will
contains all the functions related to opcode encoding on various
generations.  Opcode numbers may have different meanings on different
hardware, so we remap them between an enum we can easily work with
and the hardware encoding.

We move the brw_inst setters and getters to brw_inst.h.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17309>
2022-06-30 23:46:35 +00:00
Kenneth Graunke fdae90aa85 intel/compiler: Split 3DPRIM_* defines out to a separate header.
These clash with genxml and will become a problem shortly.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17309>
2022-06-30 23:46:35 +00:00
Kenneth Graunke 9f8784232a intel/compiler: Fix brw_gfx_ver_enum.h to be a proper header file
This header file didn't include normal guards against being included
multiple times.  It also defined a function in a header file without
marking it static inline.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17309>
2022-06-30 23:46:35 +00:00
Kenneth Graunke a141a351de intel/compiler: Stop including src/mesa/main/config.h
src/mesa/main includes are for Mesa's OpenGL implementation, and the
compiler is used in Vulkan drivers and other tools.  We really only
needed one #define, which is that we offer 32 samplers.  It probably
makes more sense to have our own defined limit for that rather than
importing a project-wide value which theoretically could be adjusted,
so swap MAX_SAMPLERS for a new BRW_MAX_SAMPLERS and call it a day.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17309>
2022-06-30 23:46:35 +00:00
Konstantin Seurer 85da294bfe intel: Use nir_test_mask instead of i2b(iand)
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17242>
2022-06-30 18:00:32 +00:00
Lionel Landwerlin 9d7d1c0637 intel/clc: enable fp16 & subgroups for GRL
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17253>
2022-06-27 15:31:49 +00:00
Marcin Ślusarz 42b551fe7f intel/compiler: adjust task payload offsets as late as possible
Otherwise passes which expect offsets to be in bytes (like
brw_nir_lower_mem_access_bit_sizes, called from brw_postprocess_nir)
may produce incorrect results.

Fixes 64-bit load/stores in task/mesh shaders.

Fixes: c36ae42e4c ("intel/compiler: Use nir_var_mem_task_payload")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16196>
2022-06-27 14:14:41 +00:00
Marcin Ślusarz f4386b81e6 intel: fix typos found by codespell
Acked-by: David Heidelberg <david.heidelberg@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17191>
2022-06-27 10:20:55 +00:00
Marcin Ślusarz f871aa10a1 intel/compiler: assert that base is 0 for [load|store]_shared intrins
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17143>
2022-06-22 10:32:13 +00:00
Marcin Ślusarz 008163f382 intel/compiler: vectorize task payload loads/stores
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17000>
2022-06-20 17:38:20 +00:00