Commit Graph

3759 Commits

Author SHA1 Message Date
Juan A. Suarez Romero b8f6685bb5 nir: use call_once() to init debug variable
For data-race safety, let's use this function to ensure NIR debug is
initialized only once.

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14057>
2021-12-14 08:01:17 +00:00
Juan A. Suarez Romero 18c039b2e1 tgsi-to-nir: initialize NIR_DEBUG envvar
This envvar is initialized when creating a NIR shader, but it needs to
be used before. So initialize it here.

v2 (Juan):
 - Use static variable for first initialization.

Fixes: f77ccdfb4a ("nir: add NIR_DEBUG envvar")
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14057>
2021-12-14 08:01:17 +00:00
Jordan Justen 211e0606c7 nir/lower_tex: Add filter for tex offset lowering
Rework:
 * Add callback_data (s-b Jason)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14142>
2021-12-13 16:56:23 -08:00
Samuel Pitoiset be53b3d1bf nir/lower_tex: add lower_lod_zero_width
On AMD, the hardware will return 0 for the raw LOD if the sum of the
absolute values of derivatives is 0 but Vulkan expects the value to
be in the [-inf, -22.0f] range.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14147>
2021-12-13 10:00:07 +00:00
Marcin Ślusarz 87f03b1662 nir: limit lower_clip_cull_distance_arrays input to traditional stages
Compute, task, mesh & raytracing stages don't support
ClipDistance/CullDistance as input.

This change is not needed for correctness. Just something I stumbled on.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14149>
2021-12-13 08:32:23 +00:00
Marek Olšák 2785141c16 nir: add nir_has_divergent_loop function
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13966>
2021-12-11 20:07:35 +00:00
Marek Olšák 26b522eae5 nir: serialize divergent fields
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13966>
2021-12-11 20:07:35 +00:00
Marek Olšák 6eb3fe2d4f nir: disable a NIR test due to undebuggable & locally unreproducible CI failures
debian-vulkan but not any other CI pipeline consistently fails with:
    FileNotFoundError: [Errno 2] No such file or directory: 'nir_tests.xml'

I have to assume that either debian-vulkan is broken, or the NIR test
infrastructure is broken. That's not all. I got the same failure when
I wanted to add a new test, which means the CI is preventing us from adding
new NIR tests, which is a very serious problem with the CI or NIR tests.
The python error doesn't imply that it's a test failure, so something else
is broken. If you don't want such commits to happen again, print better
error messages.

See also the discussion in the MR.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13966>
2021-12-11 20:07:35 +00:00
Marek Olšák 2ab310b78b nir: handle more intrinsics in divergence analysis
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13966>
2021-12-11 20:07:35 +00:00
Emma Anholt d199d65c3a nir/nir_opt_move,sink: Include load_ubo_vec4 as a load_ubo instr.
We weren't doing much motion in nir-to-tgsi because we considered all our
lowered load-ubos as unmovable.

softpipe shader-db:

total temps in shared programs: 563942 -> 563136 (-0.14%)
temps in affected programs: 9833 -> 9027 (-8.20%)

r300 shader-db:

instructions in affected programs: 22858 -> 23575 (3.14%)
temps in affected programs: 2039 -> 1813 (-11.08%)

(NIR had given r300 -19% instrs for +40% temps, so this feels like a
worthwhile trade back).

Reivewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14138>
2021-12-11 02:12:27 +00:00
Emma Anholt de33205f88 nir/algebraic: Move all the individual transforms to a common table.
Cuts 28% of the remaining relocations in libvulkan_intel.so, shrinks
binary size by 290kb.

Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13987>
2021-12-07 07:09:00 +00:00
Emma Anholt a29b54f014 nir/algebraic: Mark the automaton's filter tables as const.
Moves it to .rodata instead of .data.

Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13987>
2021-12-07 07:09:00 +00:00
Emma Anholt 45a8d11b6e nir/algebraic: Pack various bitfields in the nir_search_value_union.
This gets our union's size down to 22 bytes (now smaller than any of the
union's types were before we made the union!).  Cuts another 48kb off of
the drivers.

Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13987>
2021-12-07 07:09:00 +00:00
Emma Anholt 53f49b7066 nir/algebraic: Move relocations for variable conds to a table.
This helps concentrate the dirty pages from the relocations, reduces how
many relocations there are, and reduces the size of each variable assuming
variables mostly don't have conditions or the conditions are mostly
reused).  Reduces libvulkan_intel.so size by 49kb.

Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13987>
2021-12-07 07:09:00 +00:00
Emma Anholt 8485a78977 nir/algebraic: Move relocations for expression conds to a table.
This helps concentrate the dirty pages from the relocations, reduces how
many relocations there are, and reduces the size of each expression
(assuming expressions mostly don't have conditions or the conditions are
mostly reused).  Reduces libvulkan_intel.so size by 8.7kb.

Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13987>
2021-12-07 07:09:00 +00:00
Emma Anholt 7635379dc7 nir/algebraic: Remove array-of-cond code
You can't have an array of them after removing many-comm-expr, there's no
space in the struct.

Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13987>
2021-12-07 07:09:00 +00:00
Emma Anholt 5d82c61a30 nir/algebraic: Replace relocations for nir_search values with a table.
Even with packing all 3 types into a 40-byte union (nir_search_constant
being 24 bytes and nir_search_expression having formerly been 32), and
having a single array of them, this cuts 1.7MB from each of
libvulkan_intel.so and libgallium_dri.so.

Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13987>
2021-12-07 07:09:00 +00:00
Emma Anholt e7d8717375 nir/algebraic: Drop the check for cache == None.
The cache is always set.

Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13987>
2021-12-07 07:08:59 +00:00
Emma Anholt a263474d3b nir/algebraic: Move some generated-code algebraic opt args into a struct.
I'm going to be adding some more tables to reduce relocations in the
generated code, so move the current tables to a struct for arg-passing
sanity.

Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13987>
2021-12-07 07:08:59 +00:00
Ian Romanick b88202b0e4 nir/constant_folding: Optimize txb with bias of constant zero to tex
v2: Fail gracefully when bias_idx < 0.  See comment in the code for the
rationale.  See also issue #5722.

All Haswell and newer Intel GPUs had similar results. (Ice Lake shown)
total instructions in shared programs: 19757733 -> 19753431 (-0.02%)
instructions in affected programs: 277248 -> 272946 (-1.55%)
helped: 1644
HURT: 1
helped stats (abs) min: 1 max: 16 x̄: 2.62 x̃: 2
helped stats (rel) min: 0.05% max: 11.11% x̄: 2.11% x̃: 1.61%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.35% max: 0.35% x̄: 0.35% x̃: 0.35%
95% mean confidence interval for instructions value: -2.72 -2.51
95% mean confidence interval for instructions %-change: -2.19% -2.03%
Instructions are helped.

total cycles in shared programs: 938517439 -> 938384079 (-0.01%)
cycles in affected programs: 19548849 -> 19415489 (-0.68%)
helped: 1358
HURT: 269
helped stats (abs) min: 1 max: 2328 x̄: 133.01 x̃: 16
helped stats (rel) min: <.01% max: 41.12% x̄: 1.40% x̃: 0.48%
HURT stats (abs)   min: 1 max: 1302 x̄: 175.70 x̃: 30
HURT stats (rel)   min: <.01% max: 69.03% x̄: 6.24% x̃: 1.04%
95% mean confidence interval for cycles value: -99.14 -64.79
95% mean confidence interval for cycles %-change: -0.47% 0.19%
Inconclusive result (%-change mean confidence interval includes 0).

LOST:   21
GAINED: 32

All Ivy Bridge and older Intel GPUs had similar results. (Ivy Bridge shown)
total instructions in shared programs: 15302017 -> 15301485 (<.01%)
instructions in affected programs: 22565 -> 22033 (-2.36%)
helped: 168
HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 3.17 x̃: 3
helped stats (rel) min: 0.04% max: 4.39% x̄: 3.05% x̃: 3.27%
95% mean confidence interval for instructions value: -3.45 -2.89
95% mean confidence interval for instructions %-change: -3.19% -2.91%
Instructions are helped.

total cycles in shared programs: 550119761 -> 549989147 (-0.02%)
cycles in affected programs: 12834251 -> 12703637 (-1.02%)
helped: 164
HURT: 0
helped stats (abs) min: 20 max: 4547 x̄: 796.43 x̃: 294
helped stats (rel) min: 0.23% max: 53.84% x̄: 2.05% x̃: 0.37%
95% mean confidence interval for cycles value: -942.62 -650.24
95% mean confidence interval for cycles %-change: -3.17% -0.94%
Cycles are helped.

fossil-db results:

Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown)
Instructions in all programs: 142073649 -> 141307526 (-0.5%)
SENDs in all programs: 6876848 -> 6876778 (-0.0%)
Loops in all programs: 38283 -> 38283 (+0.0%)
Cycles in all programs: 8410049681 -> 8402902960 (-0.1%)
Spills in all programs: 190623 -> 190599 (-0.0%)
Fills in all programs: 297780 -> 297756 (-0.0%)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14025>
2021-12-06 19:50:42 +00:00
Lionel Landwerlin 0cbcc15afe nir: add a ray query optimization pass
Just remove queries that are never used or proceeded with. The latter
case leading to undefined values.

v2: Don't use nir_shader_instructions_pass() to find variables (Caio)
    Simplify replacement (Caio)

v3: Don't track all the queries intrinsic effects (Caio)
    Rename things to represent only read queries (Caio)
    Use set instead of hash_table (Caio)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13718>
2021-12-04 20:46:35 +00:00
Lionel Landwerlin 5a9cdab170 nir: track variables representing ray queries
v2: Fix missing ray_query variable check (Caio)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13718>
2021-12-04 20:46:35 +00:00
Lionel Landwerlin 0d6f050b46 nir: add intrinsics for ray queries
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13718>
2021-12-04 20:46:35 +00:00
Lionel Landwerlin 0800ec2c77 nir: add a new access flag to allow access in helper invocations
v2: Add nir_print support

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13718>
2021-12-04 20:46:35 +00:00
Lionel Landwerlin 54489b3c09 nir/print: printout ACCESS_STREAM_CACHE_POLICY
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13718>
2021-12-04 20:46:35 +00:00
Lionel Landwerlin f98984ad13 nir/lower_io: include the variable access in the lowered intrinsic
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13718>
2021-12-04 20:46:35 +00:00
Marcin Ślusarz b717872e08 intel/compiler: Get mesh_global_addr from the Inline Parameter for Task/Mesh
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13661>
2021-12-04 00:41:46 +00:00
Timur Kristóf f28adc711f nir: Print task and mesh shader I/O variable names.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14007>
2021-12-03 21:34:45 +00:00
Timur Kristóf 7e66da89f8 nir: Fix sorting per-primitive outputs.
Fixes: 59860d4873
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14006>
2021-12-03 17:06:47 +00:00
Rhys Perry a2d8c5b26d nir/algebraic: optimize a*#b & -4
fossil-db (Sienna Cichlid):
Totals from 611 (0.47% of 128647) affected shaders:
CodeSize: 3096680 -> 3090976 (-0.18%)
Instrs: 570494 -> 569249 (-0.22%)
Latency: 5765865 -> 5759619 (-0.11%)
InvThroughput: 969840 -> 967608 (-0.23%)
VClause: 9690 -> 9688 (-0.02%)
Copies: 42884 -> 42894 (+0.02%); split: -0.01%, +0.03%
PreVGPRs: 28290 -> 28288 (-0.01%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13752>
2021-12-03 13:41:07 +00:00
Rhys Perry 2368c36427 nir/opt_offsets: remove need to loop try_extract_const_addition
fossil-db (Sienna Cichlid):
Totals from 1 (0.00% of 134572) affected shaders:
no stat changes

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14009>
2021-12-03 11:51:49 +00:00
Rhys Perry 5c0fe11072 nir/opt_offsets: fix try_extract_const_addition recursion
This initially looks like a miscompilation bug, but I don't think it's
actually possible for it to create incorrect code.

fossil-db (Sienna Cichlid):
Totals from 32 (0.02% of 134572) affected shaders:
VGPRs: 1336 -> 1320 (-1.20%)
CodeSize: 90552 -> 89468 (-1.20%)
Instrs: 17007 -> 16852 (-0.91%); split: -0.92%, +0.01%
Latency: 429040 -> 428136 (-0.21%); split: -0.21%, +0.00%
InvThroughput: 84966 -> 84572 (-0.46%); split: -0.47%, +0.00%
Copies: 1458 -> 1468 (+0.69%); split: -0.07%, +0.75%
Branches: 382 -> 384 (+0.52%)
PreSGPRs: 970 -> 968 (-0.21%)
PreVGPRs: 1029 -> 1011 (-1.75%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14009>
2021-12-03 11:51:49 +00:00
Juan A. Suarez Romero f77ccdfb4a nir: add NIR_DEBUG envvar
Move all the NIR related debug environmental variables in a single
NIR_DEBUG one.

Use NIR_DEBUG=help to print all the available options.

v2:
 - Use a macro to simplify (Marcin, Jason)
 - Remove wrong changes (Marcin)

v3 (Marcin):
 - Remove rendundant NIR mentioning in option descriptions.
 - Unwrap option descriptions.
 - Ensure the constant is unsigned.
 - Use extern array to remove switch.

v4:
 - Add missing kernel shader (Jason).
 - Add unlikely() (Marcin).

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13840>
2021-12-03 11:15:29 +00:00
Emma Anholt 06fe04b4d7 nir: Make nir_build_alu() variants per 1-4 arg count.
This saves a bunch of generated code to pack up the extra NULLs to get to
4 args, and saves executing the conditions in nir_build_alu() to then skip
those NULLs.

Saves another 27kb on disk.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13916>
2021-12-01 22:12:19 +00:00
Emma Anholt e770ec1182 nir: Uninline a bunch of nir.h functions.
I aimed for "things that look like big switch statements, or cases where
the compiler is unlikely to be able to constant-propagate an argument into
something useful."

Saves another 80kb on disk.  No perf difference on iris shader-db, n=23.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13916>
2021-12-01 22:12:19 +00:00
Lionel Landwerlin 8e568d3f00 nir/opt_deref: don't try to cast empty structures
Found while running valgrind :

==3583454== Invalid read of size 4
==3583454==    at 0xF48336: glsl_get_struct_field_offset (nir_types.cpp:84)
==3583454==    by 0xC7CD0D: opt_replace_struct_wrapper_cast (nir_deref.c:1068)
==3583454==    by 0xC7CDD9: opt_deref_cast (nir_deref.c:1087)
==3583454==    by 0xC7DD8E: nir_opt_deref_impl (nir_deref.c:1369)
==3583454==    by 0xC7DF4E: nir_opt_deref (nir_deref.c:1428)
==3583454==    by 0xA63F3C: brw_kernel_from_spirv (brw_kernel.c:325)
==3583454==    by 0xA3BC2C: main (intel_clc.c:481)
==3583454==  Address 0xe4f7e88 is 24 bytes after a block of size 48 in arena "client"

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13952>
2021-12-01 08:24:39 +00:00
Mykhailo Skorokhodov 391569e911 nir: Fix read depth for predecessors
In some non-trivial cases (the amber script file in the merge
request description) phi instruction has more than 32 elements
in predecessors tree and that isn't recursion, just large tree.
In that case, phis not fully converted into a register or mov,
but successfully removed.

The fix removes the counter and adds container of visited blocks.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3690
Cc: mesa-stable
Signed-off-by: Mykhailo Skorokhodov <mykhailo.skorokhodov@globallogic.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13710>
2021-11-30 00:12:48 +00:00
Rhys Perry 32a8b391e3 nir/tests: add DCE test for loops following a jump
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10284>
2021-11-29 22:22:24 +00:00
Rhys Perry cc5dd15417 nir/cf: fix insertion of loops/ifs after jumps
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10284>
2021-11-29 22:22:24 +00:00
Rhys Perry 2fe13aa2ad nir/dce: fix DCE of loops with a halt or return instruction in the pre-header
If there is a halt or return instruction right before a loop with a single
continue, we would have taken the fast path intended for loops without
continues.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 71a985d80b ("nir/dce: perform DCE for unlooped instructions in a single pass")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10284>
2021-11-29 22:22:24 +00:00
Ilia Mirkin b7f423006a nir/lower_clip: support clipdist array + no vars
This runs after the "to io" lowering on freedreno. Support this case.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13917>
2021-11-28 04:44:56 +00:00
Ilia Mirkin 7efb1c4b29 nir/lower_clip: increment num_inputs/outputs by appropriate amount
The inputs/outputs are meant to be in vec4 units.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13917>
2021-11-28 04:44:56 +00:00
Ilia Mirkin 3bf47700e2 nir/lower_clip: location offset goes into offset, not base
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13917>
2021-11-28 04:44:56 +00:00
Ilia Mirkin a8930e6302 nir/lower_clip: replace bogus comment about gl_ClipDistance reading in GL
gl_ClipDistance most definitely can be read in fragment shaders since
GLSL 1.30. This is also accessible in ES with EXT_clip_cull_distance.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13917>
2021-11-28 04:44:56 +00:00
Marek Olšák e54264c84f nir: add shader_info::source_sha1, its initialization and printing
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13869>
2021-11-26 11:58:27 +00:00
Rhys Perry 34510ce3cc nir/lower_subgroups: fix left shift of -1
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5365
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12901>
2021-11-24 16:45:05 +00:00
Rhys Perry 811a7a2d31 nir/lower_tex: don't calculate texture_mask for texture_index>=32
With Vulkan, texture_index can be 32 or larger, which creates a shift
exponent larger than 31 (undefined behaviour). Since we don't use
texture_mask with Vulkan, just initialize it to 0.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5365
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12901>
2021-11-24 16:45:04 +00:00
Rhys Perry b425100781 spirv: run nir_copy_prop before nir_rematerialize_derefs_in_use_blocks_impl
spirv_to_nir sometimes wraps derefs in vec2 or mov instructions as part of
its texture handling. These get in the way of
nir_rematerialize_derefs_in_use_blocks_impl. Running copy propagation
should get rid of the extra move instructions and get us back to intact
deref chains for everything except variable pointer use-cases.

fossil-db (Sienna Cichlid):
Totals from 6 (0.00% of 134572) affected shaders:
CodeSize: 92656 -> 93088 (+0.47%)
Instrs: 17060 -> 17138 (+0.46%)
Latency: 224408 -> 227539 (+1.40%)
InvThroughput: 37402 -> 37924 (+1.40%)
VClause: 408 -> 402 (-1.47%)
Copies: 1065 -> 1107 (+3.94%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5668
Fixes: 14a12b771d ("spirv: Rework our handling of images and samplers")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13924>
2021-11-24 15:43:51 +00:00
Danylo Piliaiev 99388f0c27 freedreno/ir3: handle global atomics
Only for a6xx since we don't know the instructions for global
atomics on previous gens. Per Qualcomm's docs in OpenCL atomics
are only supported since a5xx together with Generic memory space.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8717>
2021-11-23 18:26:37 +00:00
Emma Anholt 7603187aec nir: Un-inline more of nir_builder.h.
Cuts another 470KB of libnir.a in my release build.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13889>
2021-11-22 20:40:47 +00:00
Emma Anholt d9bfcf5f5b nir: Un-inline nir_builder_alu_instr_finish_and_insert()
This function is big and I don't think it will won't get meaningfully
constant-propagated during inlining without LTO.  Move it to a .c file so
we just have one copy, saving 2.8MB from libnir.a on an amd64 release
build.

      text       data        bss      total filename
before:
  18953406    7768312     687260   27408978 build-release/driver-symlinks/iris_dri.so
   9734366    5542453     481692   15758511 build-release/lib/libvulkan_intel.so
  28687772   13310765    1168952   43167489 (TOTALS)

after:
  15478350    7767864     687260   23933474 build-release/driver-symlinks/iris_dri.so
   6810366    5541685     481692   12833743 build-release/lib/libvulkan_intel.so
  22288716   13309549    1168952   36767217 (TOTALS)

No statistically significant performance difference on iris shader-db, n=8.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13889>
2021-11-22 20:40:47 +00:00
Ilia Mirkin 3b5b4b5d45 nir: apply interpolated input intrinsics setting when lowering clipdist
For drivers that use this in fragment shaders, load_input is going to
produce incorrect results (flat-shaded values).

Fixes clipping tests on a4xx.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13900>
2021-11-22 20:11:19 +00:00
Ilia Mirkin df934873e1 nir: always keep the clip distance array size updated
Drivers expect to know the number of clip distances irrespective of
whether compact arrays are used or not.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13900>
2021-11-22 20:11:19 +00:00
Connor Abbott 508f917d8c util/dag: Make edge data a uintptr_t
Nobody was actually using it as a pointer, and I'm going to introduce a
shared function which relies on it not being a pointer so let's fix this
once and for all.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13722>
2021-11-17 13:41:47 +00:00
Samuel Pitoiset 011ea32585 nir: fix constant expression of ibitfield_extract
This fixes dEQP-VK.graphicsfuzz.cov-condition-bitfield-extract-integer.

For example, nir_ibitfield_extract(3, 1, 2) should return 1.

Cc: 21.3 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13791>
2021-11-16 17:32:21 +00:00
Timur Kristóf 59860d4873 nir: Group per-primitive outputs at the end for driver location assign.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13466>
2021-11-16 07:46:55 +00:00
Timur Kristóf f23f7ef316 nir: Don't compact per-vertex and per-primitive outputs together.
Prevent nir_compact_varyings from putting per-vertex and per-primitive
output components in the same slot.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13466>
2021-11-16 07:46:55 +00:00
Timur Kristóf e1e461d11c nir: Lower cull and clip distance arrays for mesh shaders.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13466>
2021-11-16 07:46:55 +00:00
Timur Kristóf 6a502a0a2c nir: Add new option to lower invocation ID from invocation index.
Add this as an option to nir_lower_compute_system_values_options
instead of just relying on the shader's options.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13466>
2021-11-16 07:46:55 +00:00
Timur Kristóf 7562e34463 nir, spirv: Don't mark NV_mesh_shader primitive indices as per-primitive.
They are not per-primitive in NV_mesh_shader, but a flat array.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13466>
2021-11-16 07:46:55 +00:00
Timur Kristóf d79d9a7a06 nir: Fix nir_lower_io with per primitive outputs.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13466>
2021-11-16 07:46:55 +00:00
Timur Kristóf 9cf4124be0 nir: Print Mesh Shader specific info.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13466>
2021-11-16 07:46:55 +00:00
Timur Kristóf 5aa39253cb nir: Rename nir_get_io_vertex_index_src and include per-primitive I/O.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13466>
2021-11-16 07:46:55 +00:00
Ilia Mirkin 185826a400 nir: remove double-validation of src component counts
The nir_tex_instr_src_size helper already sorts this out correctly, no
need to do it twice, and validate_src takes care of it.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13781>
2021-11-16 01:23:41 +00:00
Daniel Schürmann 1e4c6e059e nir/fold_16bit_sampler_conversions: skip sparse residency tex instructions
The residency return value mismatches between NIR and Radeon.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13592>
2021-11-15 18:28:20 +00:00
Rhys Perry 719b48f85d nir/lower_system_values: replace local_invocation_id components with zero
fossil-db (Sienna Cichlid):
Totals from 360 (0.28% of 128647) affected shaders:
VGPRs: 7912 -> 7272 (-8.09%); split: -8.59%, +0.51%
CodeSize: 542456 -> 544688 (+0.41%); split: -0.32%, +0.73%
MaxWaves: 10866 -> 10952 (+0.79%)
Instrs: 95973 -> 96010 (+0.04%); split: -0.34%, +0.38%
Latency: 4366023 -> 4344664 (-0.49%); split: -0.90%, +0.41%
InvThroughput: 19656659 -> 18297185 (-6.92%); split: -6.92%, +0.00%
VClause: 3242 -> 3116 (-3.89%); split: -4.04%, +0.15%
SClause: 3422 -> 3504 (+2.40%); split: -0.20%, +2.60%
Copies: 8854 -> 9376 (+5.90%); split: -0.89%, +6.79%
Branches: 2329 -> 2326 (-0.13%); split: -0.39%, +0.26%
PreSGPRs: 7620 -> 7841 (+2.90%); split: -0.43%, +3.33%
PreVGPRs: 5765 -> 5504 (-4.53%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel-schuermann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13757>
2021-11-12 18:59:51 +00:00
Alyssa Rosenzweig e257344a82 nir/lower_pntc_ytransform: Support PointCoordIsSysval
Pattern match the point coord sysval and support lowering it as well.
This is required to handle flipped framebuffers on Bifrost. However,
what this pass normalizes to is the opposite of the hardware mode we
used on Bifrost before, so we need to swap modes at the same time to
prevent regressions.

Fixes Piglit glsl-fs-pointcoord and glsl-fs-pointcoord_gles2

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13073>
2021-11-12 12:34:14 +00:00
Marek Olšák 33b4eb149e nir: add new SSA instruction scheduler grouping loads into indirection groups
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13604>
2021-11-08 21:20:11 +00:00
Filip Gawin f32dcb6fe1 nir: assert that variables in optimize_atomic are initialized
If you gonna view context of function parse_atomic_op,
then you gonna know that index for array (data_src)
can be unitialized. Imho this approach is cleaner
than doing stuff inside parse_atomic_op.

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12995>
2021-11-08 15:10:07 +00:00
Rhys Perry 12294026d5 nir/algebraic: optimize Cyberpunk 2077's open-coded bitfieldReverse()
fossil-db (Sienna Cichlid):
Totals from 9 (0.01% of 128647) affected shaders:
CodeSize: 29900 -> 28640 (-4.21%)
Instrs: 5677 -> 5443 (-4.12%)
Latency: 96561 -> 95025 (-1.59%)
Copies: 571 -> 544 (-4.73%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13673>
2021-11-05 09:31:04 +00:00
Mike Blumenkrantz 16f838576c nir/lower_io_to_scalar: add support for bo and shared io
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13485>
2021-10-27 16:46:01 +00:00
Alyssa Rosenzweig d8b1afdc85 nir/lower_blend: Use correct clamp for SNORM
nir_lower_blend was written against the OpenGL ES 3.2 specification,
which does not support blending SNORM render targets. The ES spec
says that non-floating point buffers get clamped to [0, 1] before
blending. The story is not so simple: SNORM buffers are blendable in
OpenGL and must clamped to [-1, 1] rather than [0, 1]. Handle this case.

NIR does have the fsat_signed_mali instruction to clamp to [-1, 1], but
it is only implemented in Panfrost, and this pass is in common code.
Open code it instead. Panfrost optimizes the open coded version, so this
is good enough.

Fixes SNORM subtests of Piglit arb_texture_view-rendering-formats.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13499>
2021-10-26 19:16:36 +00:00
Danylo Piliaiev b7c7abded7 nir/serialize: Make more space for intrinsic_op allowing 1024 ops
We are close to the limit of 512 intrinsics, make more space to
be able to support up to 1024 intrinsics.

Take one bit from packed_const_indices, they shouldn't suffer in
a common case.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13456>
2021-10-25 16:17:09 +00:00
Danylo Piliaiev 1eee1fda11 nir/lower_amul: do not lower 64bit amul to imul24
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13300>
2021-10-21 18:59:57 +00:00
Caio Marcelo de Oliveira Filho 662fbc0120 nir: Use a single binary for gtests
Less artifacts and less time running linker.  The
load_store_vectorizer test is still split since we need to update
gitlab-ci scripts to skip certain tests in certain builds. Added a
TODO with the concrete suggestion.

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13414>
2021-10-20 18:26:31 +00:00
Jason Ekstrand b62b2fa4b9 compiler/types: Add a wrap_in_arrays helper
This has been copied+pasted 3 times now.

Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13389>
2021-10-16 05:49:34 +00:00
Jason Ekstrand 5818d47ae6 spirv: Use texture types for sampled images
Instead of using gsamplerND types for sampled images, use the new
gtextureND types for sampled images and reserve gsamplerND for combined
image+samplers.  Combined image+sampler bindings still get a gsamplerND
type.

Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13389>
2021-10-16 05:49:34 +00:00
Jason Ekstrand b8a0bf2343 nir/deref: Also optimize samplerND -> textureND casts
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13389>
2021-10-16 05:49:34 +00:00
Jason Ekstrand 2ab5546a96 nir: Allow texture types
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13389>
2021-10-16 05:49:34 +00:00
Jason Ekstrand 3ace6b968b compiler/types: Add a texture type
This is separate from images and samplers.  It's a texture (not a
storage image) without a sampler.  We also add C-visible helpers to
convert between sampler and image types.

Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13389>
2021-10-16 05:49:34 +00:00
Jason Ekstrand d343aef942 nir/serialize: Pack deref modes better
With nir_var_image, we've now run out of bits in our packed blob for
deref instructions.  We could revert to an unpacked blob or we could be
a bit more clever about how we encode deref modes and pack them into 5
bits.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13386>
2021-10-16 03:47:10 +00:00
Jason Ekstrand 9272a952c9 nir: Re-arrange the variable modes
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13386>
2021-10-16 03:47:10 +00:00
Jason Ekstrand 956199e870 nir: s/nir_var_mem_image/nir_var_image/g
We typically use nir_var_mem_* for stuff that has an explicit byte-based
memory layout.  Images are opaque.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13386>
2021-10-16 03:47:10 +00:00
Dylan Baker e73096bd6d meson: use gtest protocol for gtest based tests when possible
With the `gtest` protocol meson will add some extra arguments to the
test to generate better junit results, which may be useful. This
protocol is only available in meson 0.55.0+, so keep using the default
`exitcode` protocol for meson older than that.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8484>
2021-10-16 03:22:24 +00:00
Jason Ekstrand 58f605e4d4 nir: Drop our attempt at typed-based image mode validation
This is broken for bindless images declared as local variables.  It
turns out nir_variable::data::bindless is only used for uniforms and we
already assume anything in nir_var_function_temp or similar is bindless.
We could try to make a tricky assert but now that we have everything
else passing but now that we've got everyone converted the extra
validation probably isn't necessary.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13384>
2021-10-15 22:35:59 +00:00
Jason Ekstrand 4c5a88d735 nir: Validate image variable modes
We can also significantly simplify the foreach_image_variable helper.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4743>
2021-10-15 14:58:56 +00:00
Jason Ekstrand 6818811fc4 nir/lower_readonly_images_to_tex: Also rewrite variable modes
Storage images will start using nir_var_mem_image but sampled images
still use nir_var_uniform.  If we're going to rewrite types, we need to
rewrite the modes as well.  Otherwise, nir_validate will get grumpy and
drivers might get confused.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4743>
2021-10-15 14:58:56 +00:00
Jason Ekstrand 2a53c33fbe nir: Add a nir_foreach_image_variable() iterator
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4743>
2021-10-15 14:58:55 +00:00
Caio Marcelo de Oliveira Filho de3705edb0 nir: Add nir_var_mem_image
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4743>
2021-10-15 14:58:55 +00:00
Caio Marcelo de Oliveira Filho 872750bb96 nir/schedule: Handle nir_intrisic_scoped_barrier
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4743>
2021-10-15 14:58:55 +00:00
Mike Blumenkrantz f769f34680 nir/print: print bindless info as applicable
this is useful to know

Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13204>
2021-10-14 15:11:38 +00:00
Ian Romanick ae99ea6f4d nir/loop_unroll: Always unroll loops that iterate at most once
Two carchase compute shaders (shader-db) and two Fallout 4 fragment
shaders (fossil-db) were helped.  Based on the NIR of the shaders, all
four had structures like

    for (i = 0; i < 1; i++) {
        ...

	for (...) {
            ...
	}
    }

All HSW+ platforms had similar results. (Ice Lake shown)
total loops in shared programs: 6033 -> 6031 (-0.03%)
loops in affected programs: 4 -> 2 (-50.00%)
helped: 2
HURT: 0

All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 143692018 -> 143692006 (-0.0%)
SENDs in all programs: 6947154 -> 6947154 (+0.0%)
Loops in all programs: 38285 -> 38283 (-0.0%)
Cycles in all programs: 8434822225 -> 8434476815 (-0.0%)
Spills in all programs: 191665 -> 191665 (+0.0%)
Fills in all programs: 298822 -> 298822 (+0.0%)

In the presense of loop unrolling like this, the change in cycles is not
accurate.

v2: Rearrange the logic in the if-condition to read a little better.
Suggested by Tim.

Closes: #5089
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13323>
2021-10-13 20:11:13 -07:00
Qiang Yu 50c0451424 nir/linker: rename replace_constant_input to replace_varying_input_by_constant_load
To align with replace_varying_input_by_uniform_load and better
describe what it does.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12613>
2021-10-13 04:45:15 +00:00
Qiang Yu 2604625043 nir/linker: support uniform when optimizing varying
Varying assigned from uniform won't change after interpolation,
so move uniform load to fragment shader to eliminate the varying.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12613>
2021-10-13 04:45:15 +00:00
Filip Gawin 28a6e45a0f nir: avoiding reading unitialized memory when using nir_dest_copy
Deeper in chain of calls, function "src_has_indirect" is used (which
reads "is_ssa" and "reg.indirect").

Fixes: d1eae6f36b ("nir: Properly clean up nir_src/dest indirects")

Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13317>
2021-10-13 02:21:20 +00:00
Connor Abbott b516208a55 nir/lower_ubo_vec4: Fix align_mul=8 special case
In order for the load to never straddle the load can't extend past 8
bytes, not 16. For example a vec2 load with align_mul = 8 and
align_offset = 4 can straddle.

Fixes assertion failures when we stop pushing UBOs in the preamble on
a6xx.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13142>
2021-10-12 11:30:52 +00:00
Jason Ekstrand 878d8d96c7 nir/lower_discard_or_demote: Fix metadata
Passes generally shouldn't use nir_metadata_all unless they don't change
the program in any significant way.  Some of these passes insert new
instructions so they should definitely not be preserving most of it.

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13261>
2021-10-08 23:24:49 +00:00
Chia-I Wu 8cce6281e6 util/vector: make util_vector_init harder to misuse
Make u_vector_init a wrapper to u_vector_init_pot.  Let both take
(element_count, element_size) as parameters.

Motivated by eed0fc4caf ("vulkan/wsi/wayland: fix an invalid
u_vector_init call")

v2: rename u_vector_init_pot to u_vector_init_pow2

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Simon Ser <contact@emersion.fr>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13201>
2021-10-08 00:15:11 +00:00
Boris Brezillon 56251f924d nir: Add a nir_sysvals_to_varyings() helper
Allow backends to turn some sysvals into input varyings so the frontend
(in our case spirv_to_nir()) doesn't have to bother selecting which
one is expected.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13017>
2021-10-07 19:45:35 +00:00
Jason Ekstrand b71bdc3404 nir/algebraic: Add some opts for comparisons of comparisons
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13167>
2021-10-07 18:21:11 +00:00
Jason Ekstrand 7abf3955ca nir/algebraic: Add some boolean optimizations
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13167>
2021-10-07 18:21:11 +00:00
Jason Ekstrand c8b2be0b95 nir/algebraic: Lower fisfinite
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13167>
2021-10-07 18:21:11 +00:00
Rhys Perry f3723822a4 nir/lower_tex: add lower_to_fragment_fetch_amd
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12214>
2021-10-07 15:36:39 +00:00
Rhys Perry 225fe37c14 nir: add _amd suffix to fragment_mask_fetch and fragment_fetch texops
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12214>
2021-10-07 15:36:39 +00:00
Marcin Ślusarz 3a18963b08 nir/print: pad 64-bit constants with zeroes
... just like other-size constants are.

Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13223>
2021-10-07 10:49:15 +00:00
Emma Anholt 7dde279db5 nir-to-tgsi: Avoid emitting TXL just for lod 0 on non-vertex shaders.
Prompted by comparing virgl fails and finding that it has issues with
immediate args to TXL/TXB, at least.

Acked-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12800>
2021-10-06 03:44:17 +00:00
Ian Romanick cb28361642 nir/algebraic: Small optimizations for SpvOpFOrdNotEqual and SpvOpFUnordEqual
No shader-db changes on any Intel platform.

Fossil-db results:

All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 144380118 -> 143692823 (-0.5%)
SENDs in all programs: 6920822 -> 6920822 (+0.0%)
Loops in all programs: 38299 -> 38299 (+0.0%)
Cycles in all programs: 8434782176 -> 8423078994 (-0.1%)
Spills in all programs: 206830 -> 204469 (-1.1%)
Fills in all programs: 318737 -> 313660 (-1.6%)

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12320>
2021-10-06 01:53:47 +00:00
Alyssa Rosenzweig 3e8f540753 nir: Add Mali-specific derivative opcodes
Add derivative opcodes fddx_must_abs_mali/fddy_must_abs_mali satisfying:

   fabs(fdd*_must_abs_mali(v)) = fabs(fdd*(v))

The sign of their result is undefined.

On Bifrost and Valhall, these unsigned derivatives can be implemented
more efficiently than the correctly-signed counterparts, since the sign
fixup requires extra ALU instructions. On backends where this is the
case, it is useful to optimize fabs(fdd*(v)) to
fabs(fdd*_must_abs_mali(v)). This pattern comes up with the GLSL builtin
`fwidth`.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12332>
2021-10-06 00:40:57 +00:00
Lionel Landwerlin d0a3a11258 nir/lower_io: preserve all metadata when no progress
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13168>
2021-10-05 11:23:23 +00:00
Marcin Ślusarz e26328582a nir: preserve all metadata when nir_opt_vectorize doesn't make progress
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13189>
2021-10-05 10:02:54 +00:00
Marcin Ślusarz 54df09c8d4 nir: preserve all metadata when nir_propagate_invariant doesn't make progress
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13189>
2021-10-05 10:02:54 +00:00
Marcin Ślusarz 804c56f1a2 nir: preserve all metadata when nir_lower_int_to_float doesn't make progress
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13189>
2021-10-05 10:02:54 +00:00
Boris Brezillon 7cd402c9c8 nir/lower_blend: Shrink blended result if needed
Make sure the new and old sources have the same number of components,
otherwise the NIR validation pass complains.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13060>
2021-09-30 16:54:42 +02:00
Boris Brezillon 3e07b8d4f8 nir/lower_blend: Make sure we're not passed scaled formats
SCALED formats are interpreted as floats, but not in the usual [0, 1]
or [-1, 1] range, meaning that the blend lowering logic can't directly
apply to those. Assert that the format being passed is not a scaled
format.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13060>
2021-09-30 16:54:42 +02:00
Boris Brezillon 15b4cab4d5 nir/lower_blend: Don't lower RTs whose format is set to NONE
The caller doesn't necessarily want to lower blend operations on all
render targets since some of them might be supported natively (panvk
will be in that case). Let's just skip RTs that have a format set to
PIPE_FORMAT_NONE to allow that.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13060>
2021-09-30 16:54:42 +02:00
Boris Brezillon 637cd5ac00 nir/lower_blend: Pad src to a 4-component vector
nir_ssa_for_src() is not supposed to pad the src vector if
dst->num_components > src->num_components. Let's pad things explicitly
with nir_pad_vector().

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13060>
2021-09-30 16:54:42 +02:00
Boris Brezillon 641bed3103 nir: Make sure src->num_components < dst->num_components in nir_ssa_for_src()
The NIR validation complains if the swizzle accesses a component that's
not present in the source.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13060>
2021-09-30 16:54:42 +02:00
Lionel Landwerlin daa8a81d99 nir: fix opt_memcpy src/dst mixup
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: f6667cb0ce ("nir: Add a memcpy optimization pass")
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13079>
2021-09-28 16:36:08 +00:00
Rhys Perry e43007af56 nir/opt_if: add opt_if_rewrite_uniform_uses
Turns:
if (a == (b=readfirstlane(a)))
   use(a)
into:
if (a == (b=readfirstlane(a)))
   use(b)

Improves divergence analysis and lets us scalarize use(a). Improves
Cyberpunk 2077 performance.

fossil-db (Sienna Cichlid, Cyberpunk 2077):
Totals from 57 (10.56% of 540) affected shaders:
VGPRs: 4904 -> 4040 (-17.62%)
CodeSize: 624360 -> 626828 (+0.40%); split: -0.06%, +0.46%
MaxWaves: 656 -> 824 (+25.61%)
Instrs: 119770 -> 119447 (-0.27%); split: -0.49%, +0.22%
Latency: 1950256 -> 1633110 (-16.26%); split: -16.26%, +0.00%
InvThroughput: 364852 -> 292089 (-19.94%)
VClause: 1512 -> 1008 (-33.33%)
SClause: 2693 -> 3196 (+18.68%)
Copies: 10050 -> 9955 (-0.95%); split: -3.34%, +2.40%
Branches: 3476 -> 3547 (+2.04%)
PreSGPRs: 4003 -> 5076 (+26.80%)
PreVGPRs: 4709 -> 3810 (-19.09%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12472>
2021-09-24 18:41:18 +00:00
Rhys Perry 69f9a96af1 nir: add nir_src_components_read()
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12472>
2021-09-24 18:41:18 +00:00
Caio Marcelo de Oliveira Filho 240e60ba76 nir/lower_io_to_vector: Allow Task/Mesh to load from outputs
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12951>
2021-09-24 14:35:15 +00:00
Bas Nieuwenhuizen 0d8bd8518d nir: Support ray launch size in divergence analysis.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12592>
2021-09-21 01:53:39 +00:00
Bas Nieuwenhuizen 56b06c09b4 nir: Add AMD rt intrinsics.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12592>
2021-09-21 01:53:39 +00:00
Bas Nieuwenhuizen b6be96a2bd radv: Modify load_sbt_amd intrinsic to get the descriptor.
That way we can get the address to the entry, which is needed for
some nir builtins because extra data in the entry can be used as
shader input.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12592>
2021-09-21 01:53:39 +00:00
Timur Kristóf 872d21820f nir: Exclude non-generic patch variables from get_variable_io_mask.
These are I/O variables which are not going to be removed anyway.
However, get_variable_io_mask handles their location incorrectly.

Found using the GCC undefined behavior sanitizer.
Fixes the following error:

runtime error:
shift exponent 4294967258 is too large
for 64-bit type 'long unsigned int'

Closes: #5319
Fixes: cf5f8f55c3
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12719>
2021-09-20 18:08:16 +00:00
Ian Romanick d7ba52cce9 nir/edgeflags: Add a flag to indicate the edge flag input is needed
Most modern hardware needs the edge flag added as a hidden vertex input
and needs code added to the vertex shader to copy the input to an
output.  Intel hardware is a little different.  Gfx4 and Gfx5 hardware
works in the previously described mannter.  Gfx6+ hardware needs the
edge flag as a specific vertex shader input, and that input is magically
processed by fixed-function hardware without need for extra shader code.

This flag signals only that the vertex shader input is needed.  It would
be nice if we could decouple adding the vertex shader input from
generating the copy-to-output code, but that has proven to be
challenging.  Not having that code causes other passes to want to
eliminate that shader input.

v2: Convert conditional to assertion.  This pass is only called for
vertex shaders.  Suggested by Ken.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12858>
2021-09-17 16:36:08 -07:00
Rhys Perry a1af902531 nir/algebraic: distribute fmul(fadd(a, b), c) when b and c are constants
This allows for more MAD/FMA instructions to be created.

fossil-db (Sienna Cichlid):
Totals from 50134 (33.46% of 149839) affected shaders:
VGPRs: 2436536 -> 2436000 (-0.02%); split: -0.05%, +0.03%
SpillSGPRs: 13136 -> 13135 (-0.01%); split: -0.02%, +0.02%
CodeSize: 206621424 -> 206278292 (-0.17%); split: -0.23%, +0.07%
MaxWaves: 1116804 -> 1117448 (+0.06%); split: +0.07%, -0.01%
Instrs: 38977460 -> 38862886 (-0.29%); split: -0.33%, +0.04%
Latency: 832425389 -> 827432260 (-0.60%); split: -0.63%, +0.03%
InvThroughput: 184193457 -> 183563350 (-0.34%); split: -0.37%, +0.03%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7458>
2021-09-17 17:28:26 +00:00
Jason Ekstrand 6c7d23e6ca nir: Stop sweeping indirects
They're no longer ralloc'd.

Fixes: 879a569884 "nir: Switch from ralloc to malloc for NIR instructions."
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12884>
2021-09-16 11:28:36 +00:00
Jason Ekstrand d1eae6f36b nir: Properly clean up nir_src/dest indirects
Now that they're no longer ralloc'd, we have to be much more careful
about indirects.  We have to make sure every time a source or
destination is overwritten, its indirect (if any) is freed.  We also
have to choose a memory ownership convention for the rewrite functions.
Assuming that they will be called with the source from some other
instruction, we choose to always make a copy of the indirect (if any).
It's the responsibility of the caller to ensure its copy of the indirect
is freed.

Unfortunately, all this extra logic is going to make
nir_instr_rewrite/move_src/dest more expensive because they now have
all the logic of nir_src/dest_copy instead of a simple struct
assignment.  Fortunately, the vast majority of rewrite calls are done by
nir_ssa_def_rewrite_uses which is an SSA-only fast-path.

Fixes: 879a569884 "nir: Switch from ralloc to malloc for NIR instructions."
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12884>
2021-09-16 11:28:36 +00:00
Emma Anholt aed4c0b5a9 nir: Drop the unused instr arg for src/dest copy functions.
Now that we don't use ralloc, we don't need this arg to get at the right
ralloc ctx.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>
2021-09-14 17:53:06 +00:00
Emma Anholt 879a569884 nir: Switch from ralloc to malloc for NIR instructions.
By replacing the 48-byte ralloc header with our exec_node gc_node (16
bytes), runtime of shader-db on my system across this series drops
-4.21738% +/- 1.47757% (n=5).

Inspired by discussion on #5034.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>
2021-09-14 17:53:06 +00:00
Emma Anholt feee5e6974 nir/tests: Fix transmuting an SSA dest to be non-SSA
With the de-ralloc changes, having the register dest not have its .reg
properly initialized caused crashes.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>
2021-09-14 17:53:06 +00:00
Emma Anholt 1edff520e2 nir/lower_phis_to_scalar: Use nir_instr_free() to free instrs.
Preparation for de-rallocing instrs.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>
2021-09-14 17:53:06 +00:00
Emma Anholt d1a2870f78 nir: Add all allocated instructions to a GC list.
Right now we're using ralloc to GC our NIR instructions, but ralloc has
significant overhead for its recursive nature so it would be nice to use a
simpler mechanism for GCing instructions.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>
2021-09-14 17:53:06 +00:00
Emma Anholt 22788d68eb nir: Consistently pass the instr to nir_src_copy().
The arg says it's supposed to be the instr, not the shader.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>
2021-09-14 17:53:05 +00:00
Emma Anholt 5e37cfb7fe nir: Consistently pass the shader to the shader arg of instr creation.
We were using the ralloc parent in some places, which should work out to
be the shader I think, but to de-ralloc the instrs we should just pass the
existing shader pointer in.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>
2021-09-14 17:53:05 +00:00
Emma Anholt 7a4bbe60c1 nir/from_ssa: Use nir_instr_free() to free instrs instead of ralloc.
This code was being tricky with passing a mem_ctx instead of the shader,
then freeing the mem_ctx when the pass was done and all the parallel
copies had been removed from the shader.  Use the right type for instr
creation and do a bit of manual list management to prepare the way for
non-ralloc NIR instrs.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>
2021-09-14 17:53:05 +00:00
Emma Anholt b99efb8af0 nir: Pull the instr list free function out to a helper.
With the de-rallocing, we're going to have some more places that free a
list of instrs.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>
2021-09-14 17:53:05 +00:00
Emma Anholt 36d9bdca0b nir: Add a nir_instr_free() to replace ralloc_free(instr).
This will gain another step shortly.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>
2021-09-14 17:53:05 +00:00
Ian Romanick 7956a701d8 nir/lower_gs_intrinsics: Make nir_lower_gs_intrinsics be idempotent
Calling this lower pass twice in a row would cause spurious
set_vertex_and_primitive_count(0, undef) intrinsics after the proper
set_vertex_and_primitive_count intrinsic.  This pretty much turns any
geometry shader into garbage.

Fix this by treating nir_intrinsic_emit_vertex_with_counter and
nir_intrinsic_end_primitive_with_counter just like the non-_with_counter
versions.  If no blocks would need set_vertex_and_primitive_count
intrinsics added, exit the pass before doing any work.  This prevents
the need for DCE to do extra clean up later.

Since this pass is potentially called multiple times via multiple
invocations of a finalize_nir callback, it is (hypothetically?) possible
that control flow could be changed to add new blocks that need this
intrinsic.  The check implemented in this commit should be robust
against that possibility.

v2: Add a_block_needs_set_vertex_and_primitive_count.  Suggested by
Timur.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12802>
2021-09-14 09:13:07 -07:00
Ian Romanick edf357b233 nir/lower_gs_intrinsics: Return progress if append_set_vertex_and_primitive_count makes progress
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Fixes: 542d40d698 ("nir: Add new GS intrinsics that maintain a count of emitted vertices.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12802>
2021-09-14 09:12:47 -07:00
Bas Nieuwenhuizen b05cd10b8e nir: Avoid visiting instructions multiple times in nir_instr_free_and_dce.
Sadly need to poke a bit in the src internals to avoid using yet another
heap allocated datastructure.

Fixes: 5251548572 ("nir: Add a nir_instr_remove that recursively removes dead code.")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5323
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12726>
2021-09-09 21:35:03 +00:00
Rhys Perry c1f724b2b9 nir: fix serialization of loop/if control
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes: e76ae39ae2 ("nir: add support for user defined select control")
Fixes: b56451f82c ("nir: add support for user defined loop control")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12778>
2021-09-09 10:32:30 +00:00
Qiang Yu 7054c1b7fd nir/linker: pack varyings with different interpolation qualifier
Driver like radeonsi load varying in a scalar manner, so prefer to pack
varying with different interpolation qualifier into same slot to save
space.

But driver like panfrost/bifrost can load varying in vector manner,
so prefer to pack varying with same interpolation qualifier.

Driver can add interpolation qualifiers which are able to be
packed into same varying slot to pack_varying_options nir option.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12537>
2021-09-09 06:00:58 +00:00
Qiang Yu 5a24aed1ac nir/lower_io_to_vector: check centroid & sample when merge variable
These qualifiers should be respected for different varying load code
generation.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12537>
2021-09-09 06:00:58 +00:00
Rob Clark b8b475ad4e nir/lower_amul: Fix usage of nir_foreach_src()
nir_foreach_src() bails after cb returns false for any src.  Which isn't
the behavior we were looking for.  Move progress flag to state struct
instead, so we don't skip visiting some sources.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12732>
2021-09-06 15:58:05 +00:00
Rob Clark 5800fde1bb nir/lower_amul: Handle load/store_global
These need more than 24b.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12732>
2021-09-06 15:58:05 +00:00
Enrico Galli 9461fe5cf1 nir: Add CAN_REORDER to load_ubo_dxil
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12707>
2021-09-03 16:21:03 +00:00
Rhys Perry 41ecef7855 nir: add sdot_2x16 and udot_2x16 opcodes
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12617>
2021-09-03 13:21:27 +00:00
Rhys Perry ae00f5af61 nir: separate lower_add_sat
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12617>
2021-09-03 13:21:27 +00:00
Timur Kristóf 33630090a2 nir: Add comment to explain the sad_u8x4 opcode.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12649>
2021-09-01 08:42:03 +00:00
Emma Anholt 33182c555f nir/nir_lower_uniforms_to_ubo: Set the explicit stride of the UBO 0 uniform.
Normal UBOs have explicit strides on them, make our lowered one behave the
same.

Reviewed-by: Adam Jackson <ajax@redhat.com>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12175>
2021-08-31 20:12:16 +00:00
Emma Anholt 01759d3fb2 nir: Set .driver_location for GLSL UBO/SSBOs when we lower to block indices.
Without this, there's no way to match the UBO nir_variable declarations to
the load_ubo intrinsics referencing their data.

Reviewed-by: Adam Jackson <ajax@redhat.com>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12175>
2021-08-31 20:12:16 +00:00
Timur Kristóf 548b383310 nir: Fix local_invocation_index upper bound for non-compute-like stages.
The lowered LS and NGG stages use local_invocation_index and they
can benefit from the unsigned upper bound because they can emit a
less expensive integer multiplication instruction.
This was working in the past, but accidentally borked by a refactor.

Fossil DB changes on Sienna Cichlid:

Totals from 956 (0.74% of 128647) affected shaders:
CodeSize: 2354172 -> 2344712 (-0.40%)
Instrs: 434359 -> 434327 (-0.01%)
Latency: 1883949 -> 1876814 (-0.38%)
InvThroughput: 762638 -> 757405 (-0.69%)

Fossil DB changes on Sienna Cichlid (with NGGC enabled):

Totals from 57873 (44.99% of 128647) affected shaders:
CodeSize: 155844192 -> 155607064 (-0.15%)
Instrs: 29799184 -> 29799152 (-0.00%)
Latency: 130959764 -> 130814224 (-0.11%); split: -0.11%, +0.00%
InvThroughput: 21100300 -> 20928635 (-0.81%); split: -0.81%, +0.00%

Fixes: 8af6766062
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12558>
2021-08-30 14:05:33 +00:00
Timur Kristóf a25fd1787a nir: Add unsigned upper bound for extract opcodes.
This helps with some cases of extract, such as:
- Emitting more optimal integer multiplications
- Better address calculation
- Possibly others

Fossil DB results on Sienna Cichlid:

Totals from 4064 (3.16% of 128647) affected shaders:
VGPRs: 262040 -> 262032 (-0.00%)
CodeSize: 28856648 -> 28811892 (-0.16%); split: -0.18%, +0.02%
Instrs: 5370279 -> 5367827 (-0.05%); split: -0.08%, +0.04%
Latency: 74230112 -> 74016671 (-0.29%); split: -0.29%, +0.01%
InvThroughput: 12082532 -> 12036365 (-0.38%); split: -0.39%, +0.01%
VClause: 108506 -> 108721 (+0.20%); split: -0.03%, +0.22%
SClause: 217731 -> 216602 (-0.52%); split: -0.67%, +0.15%
Copies: 265689 -> 270811 (+1.93%); split: -0.26%, +2.19%
PreSGPRs: 201982 -> 204907 (+1.45%); split: -0.01%, +1.46%
PreVGPRs: 236099 -> 236079 (-0.01%)

Fossil DB results on Sienna Cichlid with NGGC enabled:

Totals from 60375 (46.93% of 128647) affected shaders:
VGPRs: 2212576 -> 2212568 (-0.00%)
CodeSize: 180870420 -> 179684816 (-0.66%); split: -0.66%, +0.00%
Instrs: 34386715 -> 34213682 (-0.50%); split: -0.51%, +0.01%
Latency: 199676290 -> 198987998 (-0.34%); split: -0.35%, +0.00%
InvThroughput: 32288299 -> 31736433 (-1.71%); split: -1.71%, +0.00%
VClause: 621521 -> 621743 (+0.04%); split: -0.00%, +0.04%
SClause: 900447 -> 899392 (-0.12%); split: -0.16%, +0.04%
Copies: 3439529 -> 3445305 (+0.17%); split: -0.02%, +0.19%
PreSGPRs: 2216297 -> 2219220 (+0.13%); split: -0.00%, +0.13%
PreVGPRs: 1842887 -> 1842867 (-0.00%)

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12558>
2021-08-30 14:05:33 +00:00
Caio Marcelo de Oliveira Filho 10a03e30cf nir: Allow Task/Mesh to lower compute system values
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10600>
2021-08-28 03:56:43 +00:00
Caio Marcelo de Oliveira Filho 4f52681a2d nir: Don't lower Task/Mesh I/O to temporaries
These won't work since a workgroup can span more than one thread, and
the temporaries are not shared memory.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10600>
2021-08-28 03:56:43 +00:00
Caio Marcelo de Oliveira Filho 27697d5eb8 nir/divergence_analysis: Handle Task/Mesh shaders
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10600>
2021-08-28 03:56:42 +00:00
Caio Marcelo de Oliveira Filho bf5f6add01 nir/lower_io: Identify Mesh output as arrayed
Mesh shader outputs are either:

- non-array builtins
- array builtins that are either per-primitive or per-vertex
- user-defined outputs that must be either per-primitive or per-vertex

So we can identify any array output as "arrayed" for the purposes of
I/O lowering.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10600>
2021-08-28 03:56:42 +00:00
Caio Marcelo de Oliveira Filho cd394017c8 nir: Add per-primitive I/O intrinsics
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10600>
2021-08-28 03:56:42 +00:00
Caio Marcelo de Oliveira Filho f95daad3a2 nir: Add a way to identify per-primitive variables
Per-primitive is similar to per-vertex attributes, but applies to all
fragments of the primitive without any interpolation involved.

Because they are regular input and outputs, keep track in shader_info
of which I/O is per-primitive so we can distinguish them after deref
lowering.  These fields can be used combined with the regular
`inputs_read`, `outputs_written` and `outputs_read`.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10600>
2021-08-28 03:56:42 +00:00
Caio Marcelo de Oliveira Filho 927584fa67 nir: Update documentation for location to mention Task/Mesh
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10600>
2021-08-28 03:56:42 +00:00
Filip Gawin 46f3582c6f nir: fix ifind_msb_rev by using appropriate type
As you can see comparion "x < 0" doesn't make
sense if x is unsigned.

Fixes: a5747f8a ("nir: add opcodes for *find_msb_rev and lowering ")

Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12548>
2021-08-26 18:35:31 +00:00
Filip Gawin 9083e9a483 nir: fix shadowed variable in nir_lower_bit_size.c
Fixes: 6d79298992 ("nir/lower_bit_size: fix lowering of {imul,umul}_high")

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12527>
2021-08-26 18:04:22 +00:00
Lionel Landwerlin a13e79843e nir: prevent peephole from generating invalid NIR
We can't append instructions following a return/halt instruction
because the control flow helpers will modify the successor of the
block containing the return/halt. And the NIR validator enforces that
the return/halt must have the end of the function as successor.

This tends to happen following lower_shader_calls lowering which
inserts halts. This probably doesn't prevent the optimization, it'll
just happen in one of the return shaders after the halt has been
removed.

v2: Move prev block ending check earlier in the function (Daniel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12506>
2021-08-25 11:38:21 +00:00
Samuel Pitoiset cff106c4b6 nir/opt_algebraic: optimize fmax(-fmin(b, a), b) -> fmax(fabs(b), -a)
and fmin(-fmax(b, a)) to fmin(-fabs(b), -a).

fossils-db (Sienna Cichlid):
Totals from 34 (0.02% of 150170) affected shaders:
CodeSize: 388540 -> 387748 (-0.20%)
Instrs: 74621 -> 74423 (-0.27%)
Latency: 1039407 -> 1039011 (-0.04%)
InvThroughput: 208364 -> 208150 (-0.10%)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12519>
2021-08-25 07:18:24 +02:00
Ian Romanick a6db40605e nir/algebraic: Add some extract optimizations
These help quite a bit when vectored versions of SpvOpSDotKHR and
friends are emitted as packed versions and then lowered.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>
2021-08-24 19:58:57 +00:00
Ian Romanick 839495efc6 nir/algebraic: Add lowering for dot_4x8 instructions
v2: Fix copy-and-paste bugs in lowering patterns.

v3: Add has_sudot_4x8 flag.  Requested by Rhys.

v4: Since the names of the opcodes changed from dp4 to dot_4x8, also
change the names of the lowering helpers.  Suggested by Jason.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>
2021-08-24 19:58:57 +00:00
Ian Romanick 806cd2341c nir/algebraic: Basic patterns for dot_4x8
v2: Add and modify patterns to let constant folding do better.

v3: Remove '(is_not_zero)' from the patterns that try to combine
addends.  I honestly don't know why I had it there in the first place,
and nothing in my deep git logs could help clue me in.  Noticed by
Alyssa.  Remover patterns that detect open-coded udot_4x8.  Suggested by
Alyssa and Jason.  Add missing sudot_4x8 patterns.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>
2021-08-24 19:58:57 +00:00
Ian Romanick 6c18a3b497 nir/opcodes: Add integer dot-product opcodes
Six opcodes are added: sdot_4x8_iadd, udot_4x8_uadd, sudot_4x8_iadd,
sdot_4x8_iadd_sat, udot_4x8_uadd_sate, and sudot_4x8_iadd_sat.  These
represent the combinations of integer dot-product and add that operate
on packed source vectors.  That is, the four 8-bit values for each
vector is stored in a single 32-bit integer.

Some hardware may prefer to operate on unpacked byte vectors.  When such
hardware comes to Mesa, we'll have to figure out how to name things.

v2: Add nir_op_iudp4a and nir_op_iudp4a_sat instructions.  These opcodes
are not 2-source commutative.

v3: Rename all opcodes to be more like some existing 4x8 opcodes.
Suggested by Timur.  Change type of packed vector sources to uint32,
change types of constant folding variables to have explicit size, and
delete some extra casts.  All suggested by Jason.

v4: Fix typo previously noticed by Alyssa but missed in v2.

v5: Add has_sudot_4x8 flag.  Requested by Rhys.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>
2021-08-24 19:58:57 +00:00
Ian Romanick 7d8bf7c167 nir/lower_bit_size: Support add_sat and sub_sat
Without this, lowered saturating ALU instructions would only clamp to
the range of the new type instead of the range of the old type.

v2: Use nir_iclamp.  Suggested by Jason. Use new
u_{int,uint}N_{min,max}() helpers.

Fixes: 090e282407 ("nir: Add a saturated unsigned integer add opcode")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>
2021-08-24 19:58:57 +00:00
Rhys Perry 3d228b6926 nir/gcm: pin some instructions which require uniform sources
fossil-db (Sienna Cichlid, GCM enabled):
Totals from 6192 (4.12% of 150170) affected shaders:
VGPRs: 548392 -> 542040 (-1.16%)
SpillSGPRs: 3702 -> 3990 (+7.78%); split: -0.54%, +8.32%
CodeSize: 62418488 -> 62481516 (+0.10%); split: -0.07%, +0.17%
MaxWaves: 70582 -> 71718 (+1.61%)
Instrs: 11768497 -> 11795079 (+0.23%); split: -0.07%, +0.30%
Latency: 445891848 -> 523561297 (+17.42%); split: -0.07%, +17.49%
InvThroughput: 115675481 -> 121494913 (+5.03%); split: -0.09%, +5.12%
VClause: 164914 -> 164934 (+0.01%); split: -0.05%, +0.06%
SClause: 405991 -> 395302 (-2.63%); split: -2.64%, +0.00%
Copies: 907216 -> 926429 (+2.12%); split: -1.11%, +3.23%
Branches: 456373 -> 457478 (+0.24%); split: -0.13%, +0.38%
PreSGPRs: 648030 -> 642953 (-0.78%); split: -0.88%, +0.10%
PreVGPRs: 522425 -> 516355 (-1.16%); split: -1.16%, +0.00%

Seems to affect Detroit: Become Human and Cyberpunk 2077. The Cyberpunk
2077 changes look like a fixed bug. At least some of the Detroit: Become
Human changes could probably be removed with better divergence analysis.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12444>
2021-08-24 16:52:31 +00:00
Rhys Perry 884ac52eaa nir: consider push constant loads as always dynamically uniform
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12444>
2021-08-24 16:52:31 +00:00
Daniel Schürmann 2cf164feb9 nir/opt_algebraic: optimize flrp(fadd, fadd, x) only if fadd are used_once
Totals from 201 (0.13% of 150170) affected shaders: (GFX10.3)
VGPRs: 13880 -> 13856 (-0.17%)
CodeSize: 1517328 -> 1518124 (+0.05%); split: -0.04%, +0.10%
MaxWaves: 3184 -> 3192 (+0.25%)
Instrs: 285487 -> 285569 (+0.03%); split: -0.06%, +0.08%
Latency: 7774066 -> 7780877 (+0.09%); split: -0.10%, +0.19%
InvThroughput: 1936341 -> 1935287 (-0.05%); split: -0.07%, +0.02%
SClause: 11446 -> 11448 (+0.02%); split: -0.01%, +0.03%
Copies: 17500 -> 17506 (+0.03%); split: -0.51%, +0.55%
Branches: 8174 -> 8180 (+0.07%); split: -0.13%, +0.21%
PreVGPRs: 12507 -> 12427 (-0.64%)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12061>
2021-08-24 16:10:30 +00:00
Daniel Schürmann 89a842b2b6 nir/loop_analyze: consider instruction cost of nir_op_flrp
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12061>
2021-08-24 16:10:30 +00:00
Rhys Perry aeb1b4c30c nir/lower_io: use nir_vector_insert_imm()
This creates a single nir_op_vecn instead of a nir_op_vecn and several
copies.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12469>
2021-08-24 10:35:19 +00:00
Samuel Pitoiset f4b858e746 Revert "nir/opt_algebraic: optimize fmax(-fmin(b, a), b) -> fmax(b, -a)"
This is wrong for negative values.

This reverts commit 07cd30ca29.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12515>
2021-08-24 08:58:38 +00:00
Samuel Pitoiset 07cd30ca29 nir/opt_algebraic: optimize fmax(-fmin(b, a), b) -> fmax(b, -a)
Found with Cyberpunk 2077.

fossils-db (GFX10.3):
Totals from 128 (2.34% of 5465) affected shaders:
CodeSize: 769720 -> 767656 (-0.27%); split: -0.27%, +0.00%
Instrs: 145748 -> 145229 (-0.36%)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11604>
2021-08-23 17:53:38 +00:00
Daniel Schürmann 59f2c85845 nir: return false for loops in contains_other_jump()
Allows to unwrap more loops.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12473>
2021-08-19 13:51:17 +00:00
Qiang Yu e6790d4a31 nir/inline_uniforms: support loop
Be able to inline uniforms in loop for unrolling it.
Nested loop/if is also supported.

Some example:

    for (i = 0; i < count; i++)
	...

uniform "count" will be inlined. But note this does not
make sure the loop will be unrolled (ie. count = 1000).

    for (i = 0; i < count; i++)
        for (j = init; j < 10; j++)
            if (type == 2)
                ...

uniform "count", "init" and "type" will be inlined.

It is intentional to not be too aggressive to add uniforms
to avoid false positive case while be able to support most
common usage.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11950>
2021-08-19 02:17:35 +00:00
Qiang Yu 3c93ebbae5 nir/loop_analyze: skip unsupported induction variable early
Instead of fail in trip count calculation, just don't mark such
kind of variable as induction from the beginning.

Don't bother inline uniform to deal with such kind of variable
either.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11950>
2021-08-19 02:17:35 +00:00
Qiang Yu 0b9639c35d nir/loop_analyze: record induction variables for each loop
For being used by uniform inline lowering pass.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11950>
2021-08-19 02:17:35 +00:00
Qiang Yu c86ec09d11 nir/loop_analyze: move nir_is_supported_terminator_condition() to header
To be shared with uniform inline.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11950>
2021-08-19 02:17:35 +00:00
Qiang Yu a406fff78a nir/inline_uniforms: support vector uniform
Collect per vector component dependency and lower vector uniform
load to scalar if any component need to be inlined.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11950>
2021-08-19 02:17:35 +00:00
Qiang Yu 9d796b21ac nir/inline_uniforms: add uniforms in condition atomically
Unless all uniforms in the condition can be inlined we can
lower the if/loop. So we rollback added uniforms when one
of uniforms in a if condition fail to be added.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11950>
2021-08-19 02:17:35 +00:00
Ian Romanick f0a8a9816a nir: intel/compiler: Add and use nir_op_pack_32_4x8_split
A lot of CTS tests write a u8vec4 or an i8vec4 to an SSBO.  This results
in a lot of shifts and MOVs.  When that pattern can be recognized, the
individual 8-bit components can be packed much more efficiently.

v2: Rebase on b4369de27f ("nir/lower_packing: use
shader_instructions_pass")

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9025>
2021-08-18 22:03:37 +00:00
Ian Romanick 89f639c0ca nir/algebraic: Remove spurious conversions from inside logic ops
Not only does this eliminate a bunch of unnecessary type converting
MOVs, but it can also enable some SWAR.  The
dEQP-VK.spirv_assembly.type.vec3.i8.bitwise_xor_frag test does
something about like:

    c = a.x ^ b.x;
    d = a.y ^ b.y;
    e = a.z ^ b.z;

After this change, it looks more like:

    uint t = i8vec3AsUint(a) ^ i8vec3AsUint(b);
    c = extract_u8(t, 0);
    d = extract_u8(t, 1);
    e = extract_u8(t, 2);

On Ice Lake, this results in:

SIMD8 shader: 41 instructions. 1 loops. 3804 cycles. 0:0 spills:fills, 5 sends
SIMD8 shader: 31 instructions. 1 loops. 2844 cycles. 0:0 spills:fills, 5 sends

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9025>
2021-08-18 22:03:37 +00:00
Ian Romanick a147717a93 nir/algebraic: Optimize some extract forms resulting from 8-bit lowering
This eliminates some spurious, size-converting moves.  For example, on
Ice Lake this helps dEQP-VK.spirv_assembly.type.vec3.i8.bitwise_xor_frag:

SIMD8 shader: 56 instructions. 1 loops. 4444 cycles. 0:0 spills:fills, 5 sends
SIMD8 shader: 52 instructions. 1 loops. 4164 cycles. 0:0 spills:fills, 5 sends

v2: Condition two of the patterns on !options->lower_extract_byte.
Suggested by Lionel.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9025>
2021-08-18 22:03:37 +00:00
Mike Blumenkrantz 649251ad4e nir/lower_vectorize_tess_levels: set num_components for vectorized loads
this otherwise explodes when rewriting e.g., a single array component load to a vec4

Fixes: f5adf27fb9 ("nir,radv: add and use nir_vectorize_tess_levels()")

fixes zmike/mesa#94

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12419>
2021-08-18 12:18:15 +00:00
Timothy Arceri edfcc4f022 nir: fix GCM when GVN enabled
Enabling GVN uncovered a bug where we would crash if the pass
thinking about pushing something into a loop.

Fixes: 6538b3e566 ("nir: add heuristic for instructions in loops with GCM")

Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12242>
2021-08-17 03:15:49 +00:00
Rhys Perry cfc4433015 nir,glsl_to_nir: use nir_fdot()
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8056>
2021-08-16 17:19:45 +00:00
Rhys Perry 28acc4120f nir: lower fdot to ffma if lower_ffma=false
fossil-db (GFX10.3):
Totals from 57689 (39.44% of 146267) affected shaders:
VGPRs: 2873712 -> 2873432 (-0.01%); split: -0.01%, +0.00%
CodeSize: 227661100 -> 227583572 (-0.03%); split: -0.08%, +0.04%
MaxWaves: 1289562 -> 1289598 (+0.00%); split: +0.01%, -0.00%
Instrs: 43115433 -> 43083308 (-0.07%); split: -0.12%, +0.05%
Latency: 869947191 -> 870279826 (+0.04%); split: -0.06%, +0.10%
InvThroughput: 199425811 -> 199434448 (+0.00%); split: -0.04%, +0.05%

fossil-db (GFX10):
Totals from 2 (0.00% of 146267) affected shaders:
Latency: 8123 -> 8107 (-0.20%)

fossil-db (GFX9):
Totals from 2 (0.00% of 146401) affected shaders:
(no stat changes)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8056>
2021-08-16 17:19:45 +00:00
Rhys Perry 174a4f36f9 nir: create ffma from builders more often
We will not be able to combine instructions into ffma later if they are
exact, so create them from the start. They can be lowered later if they
are unwanted.

fossil-db (GFX10.3):
Totals from 16589 (11.34% of 146267) affected shaders:
VGPRs: 938872 -> 938704 (-0.02%)
SpillSGPRs: 11334 -> 10785 (-4.84%)
CodeSize: 96551964 -> 96498040 (-0.06%); split: -0.08%, +0.02%
MaxWaves: 338760 -> 338772 (+0.00%)
Instrs: 18356857 -> 18350486 (-0.03%); split: -0.06%, +0.02%
Latency: 561563310 -> 561414360 (-0.03%); split: -0.08%, +0.05%
InvThroughput: 145629673 -> 145594740 (-0.02%); split: -0.04%, +0.01%

fossil-db (GFX10):
Totals from 16252 (11.11% of 146267) affected shaders:
VGPRs: 893820 -> 893744 (-0.01%)
SpillSGPRs: 11334 -> 10785 (-4.84%)
CodeSize: 95890244 -> 95839124 (-0.05%); split: -0.08%, +0.02%
MaxWaves: 367704 -> 367734 (+0.01%)
Instrs: 18199741 -> 18194437 (-0.03%); split: -0.06%, +0.03%
Latency: 560912971 -> 560854179 (-0.01%); split: -0.07%, +0.06%
InvThroughput: 142899814 -> 142877939 (-0.02%); split: -0.03%, +0.02%

fossil-db (GFX9):
Totals from 16287 (11.12% of 146401) affected shaders:
SGPRs: 1312784 -> 1312768 (-0.00%); split: -0.05%, +0.05%
VGPRs: 931440 -> 931444 (+0.00%); split: -0.00%, +0.00%
SpillSGPRs: 14623 -> 14597 (-0.18%)
CodeSize: 94428788 -> 94344404 (-0.09%); split: -0.10%, +0.01%
MaxWaves: 90105 -> 90109 (+0.00%)
Instrs: 18486905 -> 18473434 (-0.07%); split: -0.08%, +0.01%
Latency: 720947295 -> 720818323 (-0.02%); split: -0.07%, +0.05%
InvThroughput: 365240104 -> 365224659 (-0.00%); split: -0.02%, +0.01%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8056>
2021-08-16 17:19:45 +00:00
Rhys Perry ed70b256ce nir: add ffma creation helpers
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8056>
2021-08-16 17:19:45 +00:00
Rhys Perry 4ec4d862c2 nir/algebraic: add is_used_once to dot product reassociation optimization
This improves register usage.

fossil-db (Sienna Cichlid, on top of !9805):
Totals from 4317 (2.88% of 149839) affected shaders:
VGPRs: 352592 -> 351704 (-0.25%); split: -1.48%, +1.23%
SpillSGPRs: 182 -> 248 (+36.26%)
CodeSize: 31601192 -> 31587624 (-0.04%); split: -0.09%, +0.04%
MaxWaves: 56964 -> 57298 (+0.59%); split: +2.48%, -1.90%
Instrs: 5973557 -> 5974122 (+0.01%); split: -0.05%, +0.06%
Latency: 72088175 -> 72253033 (+0.23%); split: -0.36%, +0.59%
InvThroughput: 14978160 -> 14798919 (-1.20%); split: -1.29%, +0.09%
VClause: 100994 -> 98645 (-2.33%); split: -3.05%, +0.73%
SClause: 278206 -> 276820 (-0.50%); split: -0.54%, +0.04%
Copies: 200264 -> 199556 (-0.35%); split: -1.17%, +0.82%
Branches: 86410 -> 85930 (-0.56%); split: -0.56%, +0.01%
PreSGPRs: 207355 -> 207759 (+0.19%); split: -0.00%, +0.20%
PreVGPRs: 314646 -> 310911 (-1.19%); split: -1.35%, +0.17%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8056>
2021-08-16 17:19:45 +00:00
Rhys Perry f95a16be72 nir/algebraic: reassociate add chains for more MAD/FMA-friendly code
fossil-db (GFX10.3):
Totals from 25866 (17.68% of 146267) affected shaders:
VGPRs: 1625456 -> 1644936 (+1.20%); split: -0.05%, +1.24%
SpillSGPRs: 11729 -> 11725 (-0.03%); split: -0.07%, +0.03%
CodeSize: 161604460 -> 161458052 (-0.09%); split: -0.11%, +0.02%
MaxWaves: 454842 -> 452160 (-0.59%); split: +0.04%, -0.63%
Instrs: 30652596 -> 30456446 (-0.64%); split: -0.65%, +0.01%
Latency: 723098749 -> 722084247 (-0.14%); split: -0.21%, +0.07%
InvThroughput: 166023468 -> 165506875 (-0.31%); split: -0.36%, +0.05%

fossil-db (GFX10):
Totals from 25866 (17.68% of 146267) affected shaders:
VGPRs: 1593576 -> 1611976 (+1.15%); split: -0.09%, +1.25%
SpillSGPRs: 11729 -> 11725 (-0.03%); split: -0.07%, +0.03%
CodeSize: 162294468 -> 162154456 (-0.09%); split: -0.11%, +0.02%
MaxWaves: 477448 -> 474166 (-0.69%); split: +0.10%, -0.79%
Instrs: 30820164 -> 30625805 (-0.63%); split: -0.65%, +0.02%
Latency: 723190249 -> 722273445 (-0.13%); split: -0.20%, +0.08%
InvThroughput: 163114872 -> 162582966 (-0.33%); split: -0.37%, +0.04%

fossil-db (GFX9):
Totals from 25866 (17.67% of 146401) affected shaders:
SGPRs: 2167808 -> 2169920 (+0.10%); split: -0.09%, +0.19%
VGPRs: 1649404 -> 1667592 (+1.10%); split: -0.43%, +1.53%
CodeSize: 161273556 -> 161281996 (+0.01%); split: -0.07%, +0.08%
MaxWaves: 114910 -> 113519 (-1.21%); split: +0.10%, -1.31%
Instrs: 31557180 -> 31403708 (-0.49%); split: -0.50%, +0.02%
Latency: 899594793 -> 898786283 (-0.09%); split: -0.19%, +0.10%
InvThroughput: 412265691 -> 411551698 (-0.17%); split: -0.28%, +0.11%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8056>
2021-08-16 17:19:45 +00:00
Rhys Perry 110bcb4919 nir/algebraic: add various ffma optimizations
fossil-db (GFX10.3):
Totals from 7532 (5.15% of 146267) affected shaders:
VGPRs: 414696 -> 414304 (-0.09%); split: -0.18%, +0.08%
CodeSize: 33393444 -> 33375908 (-0.05%); split: -0.13%, +0.08%
MaxWaves: 149854 -> 150094 (+0.16%); split: +0.27%, -0.11%
Instrs: 6279823 -> 6271364 (-0.13%); split: -0.18%, +0.05%
Latency: 60308898 -> 60296025 (-0.02%); split: -0.13%, +0.11%
InvThroughput: 13770542 -> 13745192 (-0.18%); split: -0.24%, +0.06%

fossil-db (GFX10):
Totals from 7532 (5.15% of 146267) affected shaders:
VGPRs: 406664 -> 405564 (-0.27%); split: -0.39%, +0.12%
CodeSize: 33544656 -> 33527568 (-0.05%); split: -0.13%, +0.08%
MaxWaves: 158584 -> 158858 (+0.17%); split: +0.30%, -0.13%
Instrs: 6316242 -> 6307913 (-0.13%); split: -0.18%, +0.05%
Latency: 60243290 -> 60232844 (-0.02%); split: -0.13%, +0.11%
InvThroughput: 13643345 -> 13620171 (-0.17%); split: -0.24%, +0.07%

fossil-db (GFX9):
Totals from 7543 (5.15% of 146401) affected shaders:
SGPRs: 546384 -> 547472 (+0.20%); split: -0.08%, +0.28%
VGPRs: 412636 -> 411896 (-0.18%); split: -0.27%, +0.09%
CodeSize: 33216196 -> 33210564 (-0.02%); split: -0.12%, +0.11%
MaxWaves: 38771 -> 38789 (+0.05%); split: +0.17%, -0.12%
Instrs: 6419878 -> 6414891 (-0.08%); split: -0.18%, +0.11%
Latency: 70972327 -> 70922754 (-0.07%); split: -0.15%, +0.08%
InvThroughput: 33949039 -> 33909258 (-0.12%); split: -0.20%, +0.08%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8056>
2021-08-16 17:19:45 +00:00
Rhys Perry 82d0600ba2 nir: swap fadd operands in nir_atan()
This shouldn't do anything but will make testing a later patch easier.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8056>
2021-08-16 17:19:44 +00:00
Eric Engestrom 4d9acfa533 python: drop explicit output_encoding='utf-8' in mako templates
Python 3 handles unicode strings by default, so we can drop all that.

Suggested-by: Dylan Baker <dylan@pnwbakers.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3674>
2021-08-14 21:44:32 +00:00
Eric Engestrom 93cb3aca03 Revert "python: Explicitly add the 'L' suffix on Python 3"
This reverts commit ad363913e6.

This code was added to be able to compare the output file while porting
the script from python2 to python3, but this has long been finished and
the extra complexity is not needed anymore.

Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3674>
2021-08-14 21:44:32 +00:00
Eric Engestrom f1eae2f8bb python: drop python2 support
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Acked-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3674>
2021-08-14 21:44:32 +00:00
Caio Marcelo de Oliveira Filho 0092edfec0 nir/dead_cf: Do not remove loops with loads that can't be reordered
If a loop is followed by a barrier, the ordering between a load inside
the loop and other memory operations after the barrier may have to be
preserved depending on the type of memory involved.  This is relevant
when the memory is writeable by other invocations.  In such case, it
is not valid to completely eliminate the loop.

This commit doesn't attempt to precisely catch the barrier case, as
analysis could become too complex.  It simply assumes it can't drop
the loops that contain certain types of loads unless those are known
to be safe to reorder (via the access flag).

Fixes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4475
Acked-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9938>
2021-08-14 01:48:03 +00:00
Bas Nieuwenhuizen aa8179e33f nir/inline_functions: Handle halting functions.
Without this stitch_blocks complains about ending in a jump with a
non-empty block after the inserted body.

I hit this with CTS raytracing tests where we tried to inline a
function that basically ended up being something like

{
   ignore_ray_intersection
   halt
}

I kept the nop path when possible as that does not leave a mess
for the optimization loop to optimize.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12163>
2021-08-13 21:18:13 +00:00
Bas Nieuwenhuizen fa6cd6e00d nir/lower_scratch: Ensure we don't lower vars with unsupported usage.
Need to avoid lowering temps when they are used by other instructions,
like the rt instructions (some of the shader call parameters get
converted to temp variables and we will lower them later with
the explicit io lowering pass as we need to guarantee they will
end up in scratch).

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12162>
2021-08-13 20:56:30 +00:00
Rhys Perry 04bd2a1245 nir: remove src/compiler/nir/nir_control_flow
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12357>
2021-08-13 17:51:42 +01:00
Emma Anholt 673cc9323a nir: Move phi src setup to a helper.
Cleans up the ralloc/list push code all over the tree.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11772>
2021-08-13 16:11:57 +00:00
Vinson Lee 8d679f4f4e nir: Initialize evaluate_cube_face_index_amd dst.x.
Fix defect reported by Coverity Scan.

Uninitialized scalar variable (UNINIT)
uninit_use: Using uninitialized value dst.x.

Fixes: a1a2a8dfda ("nir: add AMD_gcn_shader extended instructions")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12290>
2021-08-12 23:13:52 -07:00
Lionel Landwerlin 01b0935d31 nir/lower_shader_calls: remove empty phis
This is confusing opt_cse.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 8dfb240b1f ("nir: Add raytracing shader call lowering pass.")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11953>
2021-08-11 15:10:07 +03:00
Marcin Ślusarz e1b325f587 nir/builder: invalidate metadata per function
Fixes: a62098fff2 ("nir: Add a helper for general instruction-modifying passes.")
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12324>
2021-08-11 11:23:30 +00:00
Pierre-Eric Pelloux-Prayer 7684d57a05 nir: add a pass to optimize "gl_FragDepth = gl_FragCoord.z" away
gl_FragDepth default value is gl_FragCoord.z so if a shader does:

   gl_FragDepth = gl_FragCoord.z

we can drop this assignment.

v2: use nir_ssa_scalar_resolved and don't do this is gl_FragDepth
    is wrote multiple times (Jason)
v3: - move to its own pass (Jason)
    - handle var = NULL (Rhys)
v4: refactoring (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10697>
2021-08-11 11:00:11 +02:00
Ian Romanick 84d2e53789 Revert "nir/algebraic: Convert some f2u to f2i"
Per https://gitlab.freedesktop.org/mesa/mesa/-/issues/5178#note_1019666,
the assumption fundamental to this optimization is false.  Section
2.4.1 (Float to Integer) of Ivy Bridge PRMs describes the situation.
The wording of the section is somewhat confusing (because it doesn't
clearly delineate between signed and unsigned integers), but the last
two rows of the table make it clear that F->UD conversion clamps
negative float values to 0.

All other hardware mentioned in that thread seems to behave the same
way.

The real problem is that, with hardware that behaves in this ways,
converting f2u(2147483648.0) to f2i(2147483648.0) changes the bit pattern
that would be produced from 0x80000000 to 0x7fffffff.

This reverts commit ad05920258.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12297>
2021-08-10 22:16:13 +00:00
Ian Romanick 3ba66ebbc8 nir/opcodes: Use u_intN_(min|max)
uadd_sat was updated using sed, so I didn't even notice the surrounding
opcodes.  Oops.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12297>
2021-08-10 22:16:13 +00:00
Alyssa Rosenzweig 9b57a81815 nir/lower_mediump: Fix metadata in all passes
Fixes: fb29cef8dd ("nir: add many passes that lower and optimize 16-bit input/outputs and samplers")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11732>
2021-08-10 20:55:33 +00:00
Alyssa Rosenzweig 03c18f7efc nir/lower_mediump_io: Don't remap base unless needed
Otherwise drivers that don't use 16-bit slots for varyings will get
confused and have their driver_locations scribbled over. This has caused
multiple problems for both Panfrost and Asahi this week. Given the only
other user of the pass for varyings is radeonsi, which needs both
together, I think this is the least controversial fix.

Fixes: fb29cef8dd ("nir: add many passes that lower and optimize 16-bit input/outputs and samplers")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11732>
2021-08-10 20:55:33 +00:00
Mike Blumenkrantz ec66c58138 nir: add imm_vec3 to round these out
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12253>
2021-08-09 14:45:30 +00:00
Rhys Perry d764de6460 nir/tests: add tests for umod/imod/irem optimizations
Both nir_opt_algebraic and nir_opt_idiv_const have optimizations for
umod/imod/irem by constants.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12039>
2021-08-09 11:00:39 +00:00
Rhys Perry e008eb1224 nir: fix signed overflow for iadd constant folding
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12039>
2021-08-09 11:00:39 +00:00
Rhys Perry b627b9fcec nir/idiv_const: optimize imod/irem
fossil-db changes (Sienna Cichlid):
Totals from 223 (0.15% of 150170) affected shaders:
CodeSize: 384564 -> 370824 (-3.57%)
Instrs: 74518 -> 71961 (-3.43%)
Latency: 351620 -> 344640 (-1.99%)
InvThroughput: 80122 -> 74846 (-6.58%)
VClause: 919 -> 920 (+0.11%)
SClause: 2879 -> 2877 (-0.07%); split: -0.10%, +0.03%
Copies: 3099 -> 3103 (+0.13%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12039>
2021-08-09 11:00:39 +00:00
Rhys Perry 96168301f9 nir/idiv_const: improve idiv(n, INT_MIN)
This lowering is smaller and -INT64_MIN is probably UB (signed overflow).

No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12039>
2021-08-09 11:00:39 +00:00
Rhys Perry 4e2b94331b nir/algebraic: improve irem by power-of-two optimization
Requires one less instruction.

No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12039>
2021-08-09 11:00:39 +00:00
Rhys Perry 2bb49e4587 nir/search: don't consider INT_MIN a negative power-of-two
ineg(INT_MIN)/iabs(INT_MIN) won't work as expected.

No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12039>
2021-08-09 11:00:39 +00:00
Rhys Perry b009467b81 nir/algebraic: add optimizations for imul(a, INT_MIN)
is_pos_power_of_two would catch this, but nir_op_imul has signed sources,
so is_neg_power_of_two catches it instead, which creates a useless
nir_op_ineg.

fossil-db (Sienna Cichlid):
Totals from 1014 (0.68% of 150170) affected shaders:
CodeSize: 3592296 -> 3592288 (-0.00%); split: -0.00%, +0.00%
Instrs: 671211 -> 670426 (-0.12%)
Latency: 5268917 -> 5268479 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 2187349 -> 2187343 (-0.00%); split: -0.00%, +0.00%
VClause: 8634 -> 8636 (+0.02%)
Copies: 97585 -> 97604 (+0.02%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12039>
2021-08-09 11:00:39 +00:00
Rhys Perry 65cd5a0f22 nir/algebraic: don't optimize umod/imod/irem if lower_bitops=true
Match the udiv/idiv/imul by power-of-two optimizations.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12039>
2021-08-09 11:00:39 +00:00
Rhys Perry ec4b425f59 nir/algebraic: fix imod by negative power-of-two
If "a" is a multiple of "b", then the result would have been "b" instead
of 0.

No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes: 0ef5f3552f ("nir: add strength reduction pattern for imod/irem with pow2 divisor.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12039>
2021-08-09 11:00:39 +00:00
Dave Airlie ad92c2b253 nir: add fisnormal lowering
just lower the 32-bit version for now.

Thanks to alyssa for this suggested lowering.

Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12207>
2021-08-06 14:27:48 +10:00
Dave Airlie 330e28155f nir: add 32-bit bool of fisfinite
Add the bool lowering as well.

Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12207>
2021-08-06 12:06:21 +10:00
Connor Abbott 8115cde3ba tu, freedreno/a6xx, ir3: Rewrite tess PrimID handling
The previous handling conflated RelPatchID and PrimID, which would
result in incorrect gl_PrimitiveID when doing draw splitting and didn't
work with PrimID passthrough which fills the VPC slot with the "correct"
PrimID value from the tess factor BO which we left 0. Replace PrimID in
the tess lowering pass with a new RelPatchID sysval, and relace PrimID
with RelPatchID in the VS input code in turnip/freedreno at the same
time so that there is no net change in the tess lowering code. However,
now we have to add new mechanisms for getting the user-level PrimID:

- In the TCS it comes from the VS, just like gl_PrimitiveIDIn in the GS.
  This means we have to add another register to our VS->TCS ABI. I
  decided to put PrimID in r0.z, after the TCS header and RelPatchID,
  because it might not be read in the TCS.
- If any stage after the TCS uses PrimID, the TCS stores it in the first
  dword of the tess factor BO, and it is read by the fixed-function
  tessellator and accessed in the TES via the newly-uncovered DSPRIMID
  field. If we have tess and GS, the TES passes this value through to
  the GS in the same way as the VS does. PrimID passthrough for reading
  it in the FS when there's tess but no GS also "just works" once we
  start storing it in the TCS. In particular this fixes
  dEQP-VK.pipeline.misc.primitive_id_from_tess which tests exactly that.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12166>
2021-08-05 16:35:41 +00:00
Jason Ekstrand 0ddac113f8 nir: Removing uses of SSA defs destroys SSA liveness
The liveness information will be a superset of real liveness so it's
unlikely something will explode if it tries to use it.  However, it is
out-of-date and should be re-run if someone really wants it.

Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12186>
2021-08-03 21:36:53 +00:00
Ian Romanick 72259a870f util: Add and use functions to calculate min and max int for a size
Many places need to know the maximum or minimum possible value for a
given size integer... so everyone just open-codes their favorite
version.  There is some potential to hit either undefined or
implementation-defined behavior, so having one version that Just Works
seems beneficial.

v2: Fix copy-and-pasted bug (INT64_MAX instead of INT64_MIN) in
u_intmin.  Noticed by CI.  Lol.  Rename functions
`s/u_(uint|int)(min|max)/u_\1N_\2/g`.  Suggested by Jason.  Add some
unit tests that would have caught the copy-and-paste bug before wasting
CI time.  Change the implementation of u_intN_min to use the same
pattern as stdint.h.  This avoids the integer division.  Noticed by
Jason.

v3: Add changes to convert_clear_color
(src/gallium/drivers/iris/iris_clear.c).  Suggested by Nanley.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12177>
2021-08-03 12:55:02 -07:00
Timothy Arceri 6538b3e566 nir: add heuristic for instructions in loops with GCM
Moving instructions out of large loops tends to cause excessive
spilling. This appears to be a good limit.

In future it might make sense to make this a NIR options so
other drivers can set their own limits.

Tiger Lake
total instructions in shared programs: 20930180 -> 20926952 (-0.02%)
instructions in affected programs: 280768 -> 277540 (-1.15%)
helped: 734
HURT: 192
helped stats (abs) min: 1 max: 61 x̄: 5.16 x̃: 4
helped stats (rel) min: 0.04% max: 10.64% x̄: 3.23% x̃: 3.14%
HURT stats (abs)   min: 1 max: 52 x̄: 2.90 x̃: 1
HURT stats (rel)   min: 0.03% max: 9.76% x̄: 1.13% x̃: 0.61%
95% mean confidence interval for instructions value: -3.89 -3.08
95% mean confidence interval for instructions %-change: -2.49% -2.16%
Instructions are helped.

total cycles in shared programs: 841825217 -> 838817552 (-0.36%)
cycles in affected programs: 122088078 -> 119080413 (-2.46%)
helped: 941
HURT: 100
helped stats (abs) min: 1 max: 160080 x̄: 3274.31 x̃: 2660
helped stats (rel) min: <.01% max: 41.64% x̄: 5.50% x̃: 4.80%
HURT stats (abs)   min: 1 max: 41856 x̄: 734.62 x̃: 26
HURT stats (rel)   min: <.01% max: 7.29% x̄: 0.44% x̃: 0.27%
95% mean confidence interval for cycles value: -3236.56 -2541.85
95% mean confidence interval for cycles %-change: -5.26% -4.60%
Cycles are helped.

total sends in shared programs: 977905 -> 977782 (-0.01%)
sends in affected programs: 2279 -> 2156 (-5.40%)
helped: 119
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 1.03 x̃: 1
helped stats (rel) min: 0.60% max: 14.29% x̄: 6.93% x̃: 6.67%
95% mean confidence interval for sends value: -1.09 -0.98
95% mean confidence interval for sends %-change: -7.42% -6.45%
Sends are helped.

LOST:   2
GAINED: 0

Ice Lake
total instructions in shared programs: 19865361 -> 19861747 (-0.02%)
instructions in affected programs: 185789 -> 182175 (-1.95%)
helped: 593
HURT: 47
helped stats (abs) min: 1 max: 27 x̄: 6.17 x̃: 4
helped stats (rel) min: 0.19% max: 8.65% x̄: 4.53% x̃: 4.60%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.03% max: 0.23% x̄: 0.11% x̃: 0.04%
95% mean confidence interval for instructions value: -5.93 -5.37
95% mean confidence interval for instructions %-change: -4.32% -4.06%
Instructions are helped.

total loops in shared programs: 6120 -> 6117 (-0.05%)
loops in affected programs: 6 -> 3 (-50.00%)
helped: 3
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 50.00% max: 50.00% x̄: 50.00% x̃: 50.00%

total cycles in shared programs: 961777176 -> 959404350 (-0.25%)
cycles in affected programs: 172224180 -> 169851354 (-1.38%)
helped: 936
HURT: 80
helped stats (abs) min: 1 max: 9566 x̄: 2621.08 x̃: 2550
helped stats (rel) min: <.01% max: 41.77% x̄: 4.22% x̃: 3.84%
HURT stats (abs)   min: 1 max: 59146 x̄: 1006.34 x̃: 24
HURT stats (rel)   min: <.01% max: 3.78% x̄: 0.44% x̃: 0.25%
95% mean confidence interval for cycles value: -2513.72 -2157.20
95% mean confidence interval for cycles %-change: -4.13% -3.57%
Cycles are helped.

total sends in shared programs: 1019995 -> 1019872 (-0.01%)
sends in affected programs: 2283 -> 2160 (-5.39%)
helped: 119
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 1.03 x̃: 1
helped stats (rel) min: 0.60% max: 14.29% x̄: 6.91% x̃: 6.67%
95% mean confidence interval for sends value: -1.09 -0.98
95% mean confidence interval for sends %-change: -7.39% -6.42%
Sends are helped.

LOST:   4
GAINED: 0

Skylake
total instructions in shared programs: 17994337 -> 17993846 (<.01%)
instructions in affected programs: 146294 -> 145803 (-0.34%)
helped: 190
HURT: 47
helped stats (abs) min: 1 max: 12 x̄: 2.83 x̃: 3
helped stats (rel) min: 0.14% max: 4.29% x̄: 1.08% x̃: 0.90%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.03% max: 0.22% x̄: 0.11% x̃: 0.04%
95% mean confidence interval for instructions value: -2.30 -1.84
95% mean confidence interval for instructions %-change: -0.95% -0.74%
Instructions are helped.

total loops in shared programs: 6029 -> 6023 (-0.10%)
loops in affected programs: 12 -> 6 (-50.00%)
helped: 6
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 50.00% max: 50.00% x̄: 50.00% x̃: 50.00%
95% mean confidence interval for loops value: -1.00 -1.00
95% mean confidence interval for loops %-change: -50.00% -50.00%
Loops are helped.

total cycles in shared programs: 939062940 -> 938023548 (-0.11%)
cycles in affected programs: 169671482 -> 168632090 (-0.61%)
helped: 980
HURT: 134
helped stats (abs) min: 1 max: 25000 x̄: 1075.57 x̃: 1052
helped stats (rel) min: <.01% max: 42.75% x̄: 2.51% x̃: 1.32%
HURT stats (abs)   min: 1 max: 837 x̄: 109.45 x̃: 20
HURT stats (rel)   min: <.01% max: 5.71% x̄: 0.73% x̃: 0.21%
95% mean confidence interval for cycles value: -1005.89 -860.17
95% mean confidence interval for cycles %-change: -2.39% -1.84%
Cycles are helped.

total sends in shared programs: 1026848 -> 1026724 (-0.01%)
sends in affected programs: 2302 -> 2178 (-5.39%)
helped: 120
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 1.03 x̃: 1
helped stats (rel) min: 0.60% max: 14.29% x̄: 6.91% x̃: 6.67%
95% mean confidence interval for sends value: -1.09 -0.98
95% mean confidence interval for sends %-change: -7.40% -6.43%
Sends are helped.

LOST:   1
GAINED: 1

Broadwell
total instructions in shared programs: 17605621 -> 17605154 (<.01%)
instructions in affected programs: 145691 -> 145224 (-0.32%)
helped: 184
HURT: 48
helped stats (abs) min: 1 max: 12 x̄: 2.83 x̃: 3
helped stats (rel) min: 0.13% max: 4.29% x̄: 1.09% x̃: 0.93%
HURT stats (abs)   min: 1 max: 7 x̄: 1.12 x̃: 1
HURT stats (rel)   min: 0.03% max: 0.48% x̄: 0.12% x̃: 0.04%
95% mean confidence interval for instructions value: -2.26 -1.77
95% mean confidence interval for instructions %-change: -0.95% -0.73%
Instructions are helped.

total loops in shared programs: 5968 -> 5963 (-0.08%)
loops in affected programs: 10 -> 5 (-50.00%)
helped: 5
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 50.00% max: 50.00% x̄: 50.00% x̃: 50.00%
95% mean confidence interval for loops value: -1.00 -1.00
95% mean confidence interval for loops %-change: -50.00% -50.00%
Loops are helped.

total cycles in shared programs: 1000679489 -> 998592756 (-0.21%)
cycles in affected programs: 173421234 -> 171334501 (-1.20%)
helped: 993
HURT: 153
helped stats (abs) min: 1 max: 766608 x̄: 2118.49 x̃: 1080
helped stats (rel) min: <.01% max: 54.61% x̄: 2.61% x̃: 1.73%
HURT stats (abs)   min: 1 max: 2200 x̄: 110.61 x̃: 11
HURT stats (rel)   min: <.01% max: 5.68% x̄: 0.63% x̃: 0.06%
95% mean confidence interval for cycles value: -3191.23 -450.54
95% mean confidence interval for cycles %-change: -2.47% -1.89%
Cycles are helped.

total sends in shared programs: 996341 -> 996222 (-0.01%)
sends in affected programs: 2151 -> 2032 (-5.53%)
helped: 115
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 1.03 x̃: 1
helped stats (rel) min: 0.60% max: 14.29% x̄: 7.07% x̃: 6.67%
95% mean confidence interval for sends value: -1.09 -0.98
95% mean confidence interval for sends %-change: -7.55% -6.58%
Sends are helped.

Haswell
total instructions in shared programs: 16038375 -> 16038121 (<.01%)
instructions in affected programs: 216797 -> 216543 (-0.12%)
helped: 185
HURT: 217
helped stats (abs) min: 1 max: 12 x̄: 2.84 x̃: 3
helped stats (rel) min: 0.13% max: 4.23% x̄: 1.30% x̃: 1.20%
HURT stats (abs)   min: 1 max: 6 x̄: 1.25 x̃: 1
HURT stats (rel)   min: 0.03% max: 5.66% x̄: 0.61% x̃: 0.40%
95% mean confidence interval for instructions value: -0.85 -0.41
95% mean confidence interval for instructions %-change: -0.40% -0.14%
Instructions are helped.

total loops in shared programs: 5947 -> 5942 (-0.08%)
loops in affected programs: 10 -> 5 (-50.00%)
helped: 5
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 50.00% max: 50.00% x̄: 50.00% x̃: 50.00%
95% mean confidence interval for loops value: -1.00 -1.00
95% mean confidence interval for loops %-change: -50.00% -50.00%
Loops are helped.

total cycles in shared programs: 967655093 -> 965746713 (-0.20%)
cycles in affected programs: 197288924 -> 195380544 (-0.97%)
helped: 950
HURT: 195
helped stats (abs) min: 1 max: 782820 x̄: 2274.79 x̃: 1260
helped stats (rel) min: <.01% max: 54.26% x̄: 3.02% x̃: 1.71%
HURT stats (abs)   min: 1 max: 15790 x̄: 1295.73 x̃: 21
HURT stats (rel)   min: <.01% max: 119.85% x̄: 7.76% x̃: 0.11%
95% mean confidence interval for cycles value: -3014.22 -319.19
95% mean confidence interval for cycles %-change: -1.83% -0.55%
Cycles are helped.

total sends in shared programs: 934894 -> 934765 (-0.01%)
sends in affected programs: 2192 -> 2063 (-5.89%)
helped: 115
HURT: 2
helped stats (abs) min: 1 max: 4 x̄: 1.14 x̃: 1
helped stats (rel) min: 0.60% max: 28.57% x̄: 7.68% x̃: 6.67%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 16.67% max: 16.67% x̄: 16.67% x̃: 16.67%
95% mean confidence interval for sends value: -1.23 -0.98
95% mean confidence interval for sends %-change: -8.28% -6.24%
Sends are helped.

LOST:   1
GAINED: 18

Ivy Bridge
total instructions in shared programs: 15269357 -> 15269398 (<.01%)
instructions in affected programs: 190484 -> 190525 (0.02%)
helped: 77
HURT: 206
helped stats (abs) min: 1 max: 6 x̄: 2.47 x̃: 3
helped stats (rel) min: 0.14% max: 5.31% x̄: 1.46% x̃: 1.65%
HURT stats (abs)   min: 1 max: 3 x̄: 1.12 x̃: 1
HURT stats (rel)   min: 0.03% max: 2.38% x̄: 0.42% x̃: 0.40%
95% mean confidence interval for instructions value: -0.06 0.35
95% mean confidence interval for instructions %-change: -0.21% 0.03%
Inconclusive result (value mean confidence interval includes 0).

total loops in shared programs: 4001 -> 3996 (-0.12%)
loops in affected programs: 10 -> 5 (-50.00%)
helped: 5
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 50.00% max: 50.00% x̄: 50.00% x̃: 50.00%
95% mean confidence interval for loops value: -1.00 -1.00
95% mean confidence interval for loops %-change: -50.00% -50.00%
Loops are helped.

total cycles in shared programs: 562045564 -> 561063543 (-0.17%)
cycles in affected programs: 200924872 -> 199942851 (-0.49%)
helped: 748
HURT: 160
helped stats (abs) min: 2 max: 14926 x̄: 1692.94 x̃: 1620
helped stats (rel) min: <.01% max: 53.29% x̄: 3.17% x̃: 1.87%
HURT stats (abs)   min: 2 max: 15726 x̄: 1776.86 x̃: 36
HURT stats (rel)   min: <.01% max: 114.43% x̄: 10.66% x̃: 0.21%
95% mean confidence interval for cycles value: -1237.33 -925.71
95% mean confidence interval for cycles %-change: -1.54% 0.08%
Inconclusive result (%-change mean confidence interval includes 0).

total sends in shared programs: 893348 -> 893330 (<.01%)
sends in affected programs: 187 -> 169 (-9.63%)
helped: 14
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.29 x̃: 1
helped stats (rel) min: 4.08% max: 22.22% x̄: 11.70% x̃: 10.10%
95% mean confidence interval for sends value: -1.56 -1.02
95% mean confidence interval for sends %-change: -14.92% -8.48%
Sends are helped.

LOST:   1
GAINED: 19

Sandy Bridge
total instructions in shared programs: 11785227 -> 11785774 (<.01%)
instructions in affected programs: 78403 -> 78950 (0.70%)
helped: 65
HURT: 505
helped stats (abs) min: 1 max: 4 x̄: 2.22 x̃: 3
helped stats (rel) min: 0.14% max: 4.17% x̄: 1.19% x̃: 1.38%
HURT stats (abs)   min: 1 max: 5 x̄: 1.37 x̃: 1
HURT stats (rel)   min: 0.24% max: 3.33% x̄: 1.57% x̃: 1.72%
95% mean confidence interval for instructions value: 0.85 1.07
95% mean confidence interval for instructions %-change: 1.16% 1.36%
Instructions are HURT.

total loops in shared programs: 2441 -> 2437 (-0.16%)
loops in affected programs: 8 -> 4 (-50.00%)
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 50.00% max: 50.00% x̄: 50.00% x̃: 50.00%
95% mean confidence interval for loops value: -1.00 -1.00
95% mean confidence interval for loops %-change: -50.00% -50.00%
Loops are helped.

total cycles in shared programs: 497178796 -> 496669298 (-0.10%)
cycles in affected programs: 51483322 -> 50973824 (-0.99%)
helped: 476
HURT: 137
helped stats (abs) min: 2 max: 7502 x̄: 1079.36 x̃: 1260
helped stats (rel) min: <.01% max: 42.50% x̄: 2.31% x̃: 0.86%
HURT stats (abs)   min: 2 max: 754 x̄: 31.23 x̃: 18
HURT stats (rel)   min: <.01% max: 3.01% x̄: 0.09% x̃: 0.02%
95% mean confidence interval for cycles value: -901.99 -760.32
95% mean confidence interval for cycles %-change: -2.20% -1.36%
Cycles are helped.

total sends in shared programs: 642919 -> 642915 (<.01%)
sends in affected programs: 32 -> 28 (-12.50%)
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 11.11% max: 14.29% x̄: 12.70% x̃: 12.70%
95% mean confidence interval for sends value: -1.00 -1.00
95% mean confidence interval for sends %-change: -15.61% -9.78%
Sends are helped.

Iron Lake
total instructions in shared programs: 8180061 -> 8180248 (<.01%)
instructions in affected programs: 65004 -> 65191 (0.29%)
helped: 59
HURT: 253
helped stats (abs) min: 1 max: 4 x̄: 2.24 x̃: 3
helped stats (rel) min: 0.16% max: 2.23% x̄: 1.04% x̃: 1.29%
HURT stats (abs)   min: 1 max: 5 x̄: 1.26 x̃: 1
HURT stats (rel)   min: 0.21% max: 3.85% x̄: 0.93% x̃: 0.60%
95% mean confidence interval for instructions value: 0.43 0.77
95% mean confidence interval for instructions %-change: 0.45% 0.68%
Instructions are HURT.

total loops in shared programs: 863 -> 861 (-0.23%)
loops in affected programs: 4 -> 2 (-50.00%)
helped: 2
HURT: 0

total cycles in shared programs: 239357490 -> 238907668 (-0.19%)
cycles in affected programs: 17314006 -> 16864184 (-2.60%)
helped: 176
HURT: 34
helped stats (abs) min: 4 max: 13400 x̄: 2558.05 x̃: 2920
helped stats (rel) min: 0.01% max: 35.58% x̄: 3.76% x̃: 2.69%
HURT stats (abs)   min: 2 max: 14 x̄: 11.59 x̃: 14
HURT stats (rel)   min: <.01% max: 0.06% x̄: 0.03% x̃: 0.03%
95% mean confidence interval for cycles value: -2440.68 -1843.34
95% mean confidence interval for cycles %-change: -3.78% -2.51%
Cycles are helped.

GM45
total instructions in shared programs: 4985293 -> 4985401 (<.01%)
instructions in affected programs: 58807 -> 58915 (0.18%)
helped: 57
HURT: 202
helped stats (abs) min: 1 max: 4 x̄: 2.26 x̃: 3
helped stats (rel) min: 0.15% max: 2.23% x̄: 1.06% x̃: 1.29%
HURT stats (abs)   min: 1 max: 5 x̄: 1.17 x̃: 1
HURT stats (rel)   min: 0.21% max: 3.85% x̄: 0.76% x̃: 0.48%
95% mean confidence interval for instructions value: 0.22 0.61
95% mean confidence interval for instructions %-change: 0.24% 0.48%
Instructions are HURT.

total loops in shared programs: 639 -> 638 (-0.16%)
loops in affected programs: 2 -> 1 (-50.00%)
helped: 1
HURT: 0

total cycles in shared programs: 153794236 -> 153546274 (-0.16%)
cycles in affected programs: 9947778 -> 9699816 (-2.49%)
helped: 110
HURT: 31
helped stats (abs) min: 4 max: 13400 x̄: 2257.51 x̃: 1796
helped stats (rel) min: 0.01% max: 35.58% x̄: 4.33% x̃: 2.45%
HURT stats (abs)   min: 2 max: 14 x̄: 11.74 x̃: 14
HURT stats (rel)   min: <.01% max: 0.06% x̄: 0.03% x̃: 0.03%
95% mean confidence interval for cycles value: -2113.77 -1403.42
95% mean confidence interval for cycles %-change: -4.27% -2.47%
Cycles are helped.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2899

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12064>
2021-08-03 10:54:50 +00:00
Timothy Arceri a7f2e683de nir: move nir_block_ends_in_break() to nir.h
Will be used in a following commit.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12064>
2021-08-03 10:54:50 +00:00
Timothy Arceri a9ed4538ab nir: add indirect loop unrolling to compiler options
This is where it should be rather than having to pass it into the
optimisation pass every time.

It also allows us to call the loop analysis pass without having to
duplicate these options which we will do later in this series.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12064>
2021-08-03 10:54:50 +00:00
Timur Kristóf da9f4b2e67 nir, aco: Remove vertex and primitive count overwrite intrinsic.
It's no longer needed.

No Fossil DB changes.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11908>
2021-08-02 11:38:25 +00:00
Timur Kristóf 1bbea90f50 aco, nir, ac: Simplify sequence of getting initial NGG VS edge flags.
Instead of v_bfe + v_lshl_or for each vertex, get all 3 edge flags
at once of every vertex. This takes fewer VALU instructions than
previously.

Fossil DB results on Sienna Cichlid (with NGGC on):

Totals from 56917 (44.24% of 128647) affected shaders:
CodeSize: 161028288 -> 158751628 (-1.41%)
Instrs: 30917985 -> 30519571 (-1.29%)
Latency: 130617204 -> 129975532 (-0.49%); split: -0.50%, +0.01%
InvThroughput: 21280238 -> 20927401 (-1.66%)
Copies: 3011120 -> 3011125 (+0.00%); split: -0.00%, +0.00%

No Fossil DB changed with NGGC off.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11908>
2021-08-02 11:38:25 +00:00
Emma Anholt 9ffd00bcf1 nir_to_tgsi: Pack our tex coords into vec4 nir_tex_src_backend[12].
For TGSI, we need the coordinate, comparator, bias, and LOD all together
in the first two vec4 args, and by doing it in the backend we were
generating extra MOVs.

softpipe shader-db results:
total instructions in shared programs: 2985416 -> 2953625 (-1.06%)
instructions in affected programs: 499937 -> 468146 (-6.36%)
total temps in shared programs: 544769 -> 565869 (3.87%)
temps in affected programs: 105469 -> 126569 (20.01%)

i915g shader-db:
total instructions in shared programs: 371625 -> 369594 (-0.55%)
instructions in affected programs: 24903 -> 22872 (-8.16%)
total tex_indirect in shared programs: 11381 -> 11365 (-0.14%)
tex_indirect in affected programs: 43 -> 27 (-37.21%)
LOST:   7
GAINED: 16

The temps increase is the pre-existing issue that we never release temps
for NIR regs, which doesn't matter much for softpipe (just memory/cache
footprint) but does for i915g as seen by shaders that no longer compile
(though overall we seem to win).

Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11912>
2021-07-29 09:05:05 -07:00
Enrico Galli 16ef26ffcb nir_lower_readonly_images_to_tex: Fix typeo on image arrays
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12119>
2021-07-29 01:44:45 +00:00
Lionel Landwerlin 7e3bad0f8e nir/lower_shader_calls: adding missing stack offset alignment
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 8dfb240b1f ("nir: Add raytracing shader call lowering pass.")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12112>
2021-07-28 23:04:21 +00:00
Daniel Schürmann bc500da67d nir/shrink_vectors: shrink vecN properly
This patch allows to shrink vecN instructions where
one or more components at any position are unused.

Stat changes for softpipe:
total instructions in shared programs: 2986101 -> 2985416 (-0.02%)
instructions in affected programs: 51216 -> 50531 (-1.34%)

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11411>
2021-07-26 09:24:37 +00:00
Daniel Schürmann 36fe7398c0 nir/shrink_vectors: shrink ALU properly
ALU instructions of which not all components are read,
can be shrunk to the number of read components.
Previously, this would only remove trailing components.

This patch enables to remove components from any position.

Stat changes for softpipe:
total instructions in shared programs: 3001291 -> 2984698 (-0.55%)
instructions in affected programs: 225585 -> 208992 (-7.36%)
total loops in shared programs: 1389 -> 1358 (-2.23%)
loops in affected programs: 36 -> 5 (-86.11%)

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11411>
2021-07-26 09:24:37 +00:00
Daniel Schürmann 8317fe314c nir/opt_shrink_vectors: reverse iteration order
This pass should be backwards in order to reach the fixed point
in linear time.

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11411>
2021-07-26 09:24:37 +00:00
Daniel Schürmann d27417b597 nir: consider write_mask in nir_ssa_def_components_read()
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11411>
2021-07-26 09:24:37 +00:00
Daniel Schürmann 73905c4d01 nir/opt_shrink_vectors: don't shrink vectors used by intrinsics
Store intrinsics shrink the sources by creating a new vecN.
Other intrinsics cannot shrink their sources.

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11411>
2021-07-26 09:24:37 +00:00
Daniel Schürmann ece99eb69f nir/lower_alu_to_scalar: don't skip gaps in write_mask
Otherwise, this may lead to segmentation faults.

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11411>
2021-07-26 09:24:37 +00:00
Jason Ekstrand 1431f6c765 nir: Validate newly documented texture restrictions
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Acked-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11775>
2021-07-23 15:53:57 +00:00
Mike Blumenkrantz 499cc7a9ec nir/validate: refactor validate_assert to have a return value
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11775>
2021-07-23 15:53:57 +00:00
Jason Ekstrand 74ec2b12be nir/lower_tex: Rework invalid implicit LOD lowering
Only fragment and some compute shaders support implicit derivatives.
They're totally meaningless without helper invocations and some
understanding of the dispatch pattern.  We've got code to lower
nir_texop_tex in these shader stages to use an explicit derivative of 0
but it was pretty badly broken:

 1. It only handled nir_texop_tex, not nir_texop_txb or nir_texop_lod.

 2. It didn't take min_lod into account

 3. It was conflated with adding a missing LOD parameter to opcodes
    which expect one such as nir_texop_txf.  While not really a bug,
    this does make it way harder to reason about the code.

 4. Unless you set a flag (which most drivers don't), it left the
    opcode nir_texop_tex instead of nir_texop_txl which it should have
    been.

This reworks it to go through roughly the same path as other LOD
lowering only with a constant lod of 0 instead of calling out to
nir_texop_lod.  We also get rid of the lower_tex_without_implicit_lod
flag because most drivers set it and those that don't are probably
subtly broken.  If someone really wants to get nir_texop_tex in their
vertex shaders, they can write a new patch to add the flag back in.

Fixes: e382890e25 "nir: set default lod to texture opcodes that..."
Fixes: d5ac5d6e83 "nir: Add option to lower tex to txl when..."
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11775>
2021-07-23 15:53:57 +00:00
Jason Ekstrand fa717a202c docs,nir: Document NIR texture instructions
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11775>
2021-07-23 15:53:57 +00:00
Jason Ekstrand 4465ca296d nir: Suffix all the MCS texture stuff _intel
It's intel-specific, used to get at MSAA compression information.

Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11775>
2021-07-23 15:53:57 +00:00
Jason Ekstrand 60b5faf572 nir/lower_tex: Add a lower_txs_cube_array option
Several bits of hardware require the division by 6 to happen in the
shader.  May as well have common lowering for it.

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12005>
2021-07-22 14:22:35 -05:00
Jason Ekstrand c6102dda0a nir/lower_image: Handle index and bindless image_size
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12005>
2021-07-22 14:22:35 -05:00
Jordan Justen 6898549d56 nir: Add nir_lower_image() to lower cube image sizes
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9466>
2021-07-21 11:02:15 -07:00
Jason Ekstrand b0fba89cf6 nir/lower_subgroups: Handle down-casts in uint_to_ballot_type
This is required for Zink where the API ballot type is a uint64_t and
the "hardware" ballot type is uvec4.

Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11989>
2021-07-21 16:41:56 +00:00
Timothy Arceri 5cc36887ab nir/gcm: be less destructive with instruction order
This changes the pass to extract pinned instructions and not just unpinned
instructions when rescheduling instructions. This stops pinned instructions
from being bunched together when instructions are reinserted into the blocks
which can result in regressions with regards to cycles and instruction
counts on i965 and register use/Max Waves on AMD hardware.

In order to do this we also throw away the post-order depth-first
search linearization algorithm used to re-insert the instructions, which
itself causes possible regressions when instructions are reinserted into
a less than ideal new order (of which the bunched together pinned
instructions is one example). Instead we simply insert instructions in the
reverse order they were extracted. This will simply place instructions
that were scheduled earlier onto the end of their new block and
instructions that were scheduled later to the start of their new block.
With this everything should remain in order without the need to run
over uses.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/597>
2021-07-21 14:24:00 +00:00
Ian Romanick 436668874a nir/gcm: Clear out pass_flags before starting
With this pass enabled in Intel drivers, running shader-db on
shaders/unity/38.shader_test resulted in

Program received signal SIGSEGV, Segmentation fault.
gcm_schedule_early_src (src=0x555555d45348, void_state=0x7fffffffba40) at ../../SOURCE/master/src/compiler/nir/nir_opt_gcm.c:297
297	   if (info->early_block->index < src_info->early_block->index)
(gdb) print src_info->early_block
$1 = (nir_block *) 0x0

I tracked this down to an early exit from gcm_schedule_early_instr on
the parent instruction because instr->pass_flags was 0x1c.  That
should be an impossible value for this pass, so I inferred that
pass_flags must have dirt left from some previous pass.

Fixes: 8dfe6f672f ("nir/GCM: Use pass_flags instead of bitsets for tracking visited/pinned")

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/597>
2021-07-21 14:24:00 +00:00
Mike Blumenkrantz 3ab74d0ffa nir: add nir_imm_ivec3 builder
the other ones exist, so why not this one too

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11983>
2021-07-21 13:57:14 +00:00
Jason Ekstrand 393ee837fb nir: Add a format field to _deref image intrinsics
The rules here are the same as for texture instructions.  The bits on
the intrinsic are the ground truth and are allowed to vary from the
deref a bit as-needed.  If the intrinsic says PIPE_FORMAT_NONE, then we
can look at the variable, if visible, to get format information.  This
means that we need to be careful when we rewrite intrinsics based on the
deref to only override the format from the _deref intrinsic from the
image variable unless the intrinsic is PIPE_FORMAT_NONE.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11849>
2021-07-20 23:18:22 +00:00
Jason Ekstrand 0b57272af8 nir: Set src_components = -1 for image intrinsic deref sources
Semantically, -1 means "Unknown; don't validate" but it's really only
used for derefs because they often need to be flexible.  We don't really
need that flexibility for image intrinsics but this makes it more
consistent.  More immediately useful is that this gives us the ability
to tell _deref forms of these intrinsics apart from the lowered ones.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11849>
2021-07-20 23:18:22 +00:00
Jason Ekstrand c0afb60258 nir: Set IMAGE_DIM and IMAGE_ARRAY on deref intrinsics
The rules here are the same as for texture instructions.  The bits on
the intrinsic are the ground truth and are allowed to vary from the
deref a bit as-needed.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11849>
2021-07-20 23:18:22 +00:00
Mike Blumenkrantz 50f9519ea5 nir/lower_point_size_mov: zero nir_state_slot::swizzle in new variable
this is otherwise uninitialized during nir_serialize calls

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11932>
2021-07-20 16:34:51 +00:00
Sagar Ghuge 06ab737686 nir: Add optimizations for iadd3
This patch also adds has_iadd3 bit to give more control if backend
supports ternary add instruction or not.

v2:
- Add patterns in late optimization (Connor Abbott)

Suggested-by: Alyssa/Jason

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11596>
2021-07-16 15:59:56 +00:00
Sagar Ghuge e8dff256c0 nir: Add new opcode for ternary addition
v2:
- Make it 2src commutative (Connor Abbott)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11596>
2021-07-16 15:59:55 +00:00
Jason Ekstrand 0ee322acdb nir: Better document the Boissinot algorithm in nir_from_ssa()
Reviewed-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8815>
2021-07-16 06:19:25 +00:00
Emma Anholt bb35195b73 nir: Validate after deserialization.
It's a particularly relevant place for NIR bugs to occur, and if you make
a mistake in this code it gets caught in your debug build in something
like mesa/st's call to nir_split_var_copies() during finalization, which is
rather misleading.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11860>
2021-07-15 18:43:42 +00:00
Timur Kristóf 48e638ab29 nir: Add AMD specific intrinsics for NGG shader based culling.
The new intrinsics fall into the following categories:

1. New viewport intrinsics:
For missing components that we need.
RADV will emit new SGPR arguments which will contain the
viewport information for culling shaders. These are used to
compute the screen space coordinates for small primitive culling.

2. load_cull_xxx:
Load the culling settings in runtime.
These will be a new SGPR argument in RADV.

3. overwrite_xxx:
These are needed because system values such as vertex and
instance ID are not writeable, but we need to change them
after repacking shader invocations of VS and TES.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10525>
2021-07-13 23:56:33 +00:00
Jason Ekstrand 2111551485 Convert a few files to UTF-8
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11788>
2021-07-12 23:45:34 +00:00
Jason Ekstrand a195ef123e nir/lower_subgroups: Pad ballot values before bitcasting
Otherwise, if we cast from a uint32_t to a uint64_t, the bitcast will
fail before we pad.  This happens on Intel.

Fixes: e4e79de2a4 "nir/subgroups: Support > 1 ballot components"
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5045
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11786>
2021-07-09 14:21:26 +00:00
Jason Ekstrand 624e799cc3 nir: Drop nir_ssa_def::name and nir_register::name
We say that they're for debug only but we don't really have a good
policy around when to set them and when not to.  In particular,
nir_lower_system_values and nir_lower_vars_to_ssa which are the chief
producers of SSA values which might reasonably have a name do not bother
to set one.  We have some names set from things like BLORP and RADV's
meta shaders but AFAICT, they're setting a name more because it's there
than because they actually care.

Also, most things other than nir_clone and nir_serialize don't bother to
try and preserve them.  You can see in the diffstat of this commit
exactly what passes attempt to preserve names.  Notably missing from the
list is opt_algebraic which is the single largest source of SSA def
churn and it happily throws names away.

These observations lead me to question whether or not names are actually
useful at all or if they're just taking up space (8B per instruction)
and wasting CPU cycles (to ralloc_strdup on the off chance we do have
one).  I don't think I can think of a single time in recent history
where I've been debugging a shader issue and a SSA value name has been
there and been useful.  If anything, the few times they are there, they
just throw me off because they mess up the indentation in nir_print.

iris shader-db on my system gets runtime -2.07734% +/- 1.26933% (n=5)

Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5439>
2021-07-08 17:34:41 +00:00
Connor Abbott 68b8b9e9e1 tu, ir3: Plumb through support for CS subgroup size/id
The way that the blob obtains the subgroup id on compute shaders is by
just and'ing gl_LocalInvocationIndex with 63, since it advertizes a
subgroupSize of 64. In order to support VK_EXT_subgroup_size_control and
expose a subgroupSize of 128, we'll have to do something a little more
flexible. Sometimes we have to fall back to a subgroup size of 64 due to
various constraints, and in that case we have to fake a subgroup size of
128 while actually using 64 under the hood, by just pretending that the
upper 64 invocations are all disabled. However when computing the
subgroup id we need to use the "real" subgroup size. For this purpose we
plumb through a driver param which exposes the real subgroup size. If
the user forces a particular subgroup size then we lower
load_subgroup_size in nir_lower_subgroups, otherwise we let it through,
and we assume when translating to ir3 that load_subgroup_size means
"give me the *actual* subgroup size that you decided in RA" and give you
the driver param.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>
2021-07-08 16:02:41 +00:00
Connor Abbott cc514bfa0e nir: Add read_invocation_cond_ir3 intrinsic
On qualcomm, we have shared registers similar to SGPR's on AMD. However,
there is no readlane or readfirstlane primitive. shared registers can
only be written to when just one lane is active. This means that we have
to lower readInvocation(val, id) to something like:

if (gl_SubgroupInvocation == id) {
    scalar_reg = val;
}

return scalar_reg;

However it's a bit difficult to actually get the value of
gl_SubgroupInvocation in the backend, because for compute it requires
some calculations and we don't have any CSE support in the backend. This
intrinsic lets us turn it into
"readInvocationCond(val, id == gl_SubgroupInvocation)" in NIR at which
point the backend code generation is a lot easier.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>
2021-07-08 16:02:41 +00:00
Connor Abbott e4e79de2a4 nir/subgroups: Support > 1 ballot components
Qualcomm has a mode with a subgroup size of 128, so just emitting larger
integer operations and then lowering them later isn't an option. This
makes the pass able to handle the lowering itself, so that we don't have
to go down to 64-thread wavefronts when ballots are used.

(The GLSL and legacy SPIR-V extensions only support a maximum of 64
threads, but I guess we'll cross that bridge when we come to it...)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>
2021-07-08 16:02:41 +00:00
Connor Abbott 90819b9b0e nir/subgroups: Replace lower_vote_eq_to_ballot with lower_vote_eq
Lower it to a vote instead of a ballot. This was only used for AMD, and
in that case they're pretty much the same. However Qualcomm has a vote
builtin, which we want to use instead of ballots.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>
2021-07-08 16:02:41 +00:00
Mike Blumenkrantz b67a4ba4ad nir/format_convert: add ssa version of uint packing
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10619>
2021-07-07 13:41:37 +00:00
Mike Blumenkrantz c948251d2b nir/format_convert: nir_shift -> nir_shift_imm
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10619>
2021-07-07 13:41:37 +00:00
Emma Anholt 4118264643 nir: Free the instructions in a DCE instr removal.
No significant change in shader-db time (n=11), but should be a little win
for memory usage by the compiler.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11628>
2021-07-06 11:24:48 -07:00
Emma Anholt 5618445d45 nir: Use remove_and_dce for nir_shader_lower_instructions().
Reduces the work that other shader passes have to do to look at dead code,
and possibly extra rounds around the optimization loop if dce wasn't the
last pass in it.

shader-db runtime -1.12919% +/- 0.264337% (n=49) on SKL.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11628>
2021-07-06 11:24:45 -07:00
Emma Anholt 5251548572 nir: Add a nir_instr_remove that recursively removes dead code.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11628>
2021-07-06 11:24:43 -07:00
Thomas H.P. Andersen ffea622604 nir/ifind_msb_rev: fix input check
ifind_msb_rev was introduced in a5747f8ab3.

ifind_msb_rev guards against src0 being both 0 or -1 at the same time.
That is always true. This patch changes it to check for those values
individually.

Spotted from a compile warning.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

Fixes: a5747f8ab3 (\"nir: add opcodes for *find_msb_rev and lowering\")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11630>
2021-07-04 12:17:58 +00:00
Jesse Natalie f8f2c3d835 nir_lower_readonly_images: Clear variable data when changing the type
For images, variable data includes the format. For samplers, variable
data is used for OpenCL inline samplers. When converting a variable
from one to the other, zero out the data so we don't accidentally
interpret a converted image as an inline sampler.

Fixes: fa677c86 ("nir_lower_readonly_images_to_tex: Support non-CL semantics")
Acked-by: Enrico Galli <enrico.galli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11674>
2021-07-02 04:24:22 +00:00
Alyssa Rosenzweig 3da23a9c7e nir: Fix constant folding for irhadd/urhadd
This should be a subtract, not an add. The comment's proof is correct,
but the (wrong) expression we actually use isn't what it's in the
comment! Correct the discrepancy.

The lowering in nir_opt_algebraic was correctly typed.

Fixes: 272e927d0e ("nir/spirv: initial handling of OpenCL.std extension opcodes")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11671>
2021-07-02 00:21:22 +00:00
Rob Clark c7b935962b nir: Add pass to lower phi precision
In addition to register pressure benefits from getting more fp16/int16,
this avoids i2imp's from standing in the way of loop unrolling.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11545>
2021-06-29 23:27:28 +00:00
Thomas H.P. Andersen b4369de27f nir/lower_packing: use shader_instructions_pass
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11615>
2021-06-29 22:08:29 +00:00
Thomas H.P. Andersen ed530ac6c2 nir: return progress from nir_lower_packing
Compiling with clang warns about an unused variable in
nir_lower_packing.

Tracking progress was added to nir_lower_packing in
adb157ddfd but the function
will ignore the progress from impl calls and always return
false.

This patch changes it to return the progress. It fixes the
warning and should enable validation calls in NIR_PASS when
progress is made.

Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes: adb157ddfd "nir: Return progress from nir_lower_64bit_pack()"
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11615>
2021-06-29 22:08:29 +00:00
Eleni Maria Stea 49e8b77fd9 intel: struct bitset is renamed to brw_bitset
Static struct bitset was renamed to brw_bitset as a struct bitset
is defined in sys/_bitset.h included by pthread_np.h on FreeBSD that
is indirectly included by src/intel/compiler/brw_nir_lower_shader_calls.c

Signed-off-by: Eleni Maria Stea <elene.mst@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11203>
2021-06-28 21:12:24 +03:00
Emma Anholt 0afab39af9 nir: Add a helper for chasing movs with nir_ssa_scalar().
Sometimes you might want to find a constant source without going through
all the copy prop and constant folding to make your source be a
load_const.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11613>
2021-06-28 16:26:24 +00:00
Rhys Perry 502b06c4f5 nir/opt_load_store_vectorize: fix check_for_robustness() with deref access
We could do better if we knew the nir_address_format to obtain
addition_bits, but the only affected driver (Turnip) probably won't
benefit because it doesn't vectorize across vec4.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 2e7bceb220 ("nir/load_store_vectorizer: fix check_for_robustness() with indirect loads")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4922
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11382>
2021-06-28 15:15:42 +00:00
Caio Marcelo de Oliveira Filho 3a9289eaed nir: Add test to check edge case in Split ALU optimization
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11476>
2021-06-25 22:41:32 +00:00
Caio Marcelo de Oliveira Filho b951929795 nir/opt_if: Don't split ALU for single block infinite loops
Some infinite loop cases were already covered by other
restrictions (e.g. if the loop had a body), but the case with a single
block in the loop body wasn't yet.

This prevents an infinite loop when optimizing the shader in
dEQP-VK.reconvergence.subgroup_uniform_control_flow_ballot.compute.nesting2.3.2
and various others reconvergence tests.

Fixes: 0881e90c09 ("nir: Split ALU instructions in loops that read phis")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11476>
2021-06-25 22:41:32 +00:00
Enrico Galli 8a5333c105 nir: Add modes filter to nir_sort_variables
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10989>
2021-06-24 20:05:13 +00:00
Jason Ekstrand 81cb20bd17 nir: Add a function for sorting variables
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10989>
2021-06-24 20:05:13 +00:00
Lionel Landwerlin 7ed0aaced7 nir: use a more fitting index for btd_stack_push_intel
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8637>
2021-06-22 21:09:25 +00:00
Lionel Landwerlin 423c47de99 nir: drop the btd_resume_intel intrinsic
This is now 100% equivalent to the new rt_resume intrinsic.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8637>
2021-06-22 21:09:25 +00:00
Bas Nieuwenhuizen 8dfb240b1f nir: Add raytracing shader call lowering pass.
Really copying Jason's pass.

Changes:
- Instead of all the intel lowering introduce rt_{execute_callable,trace_ray,resume}
- Add the ability to use scratch intrinsics directly.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10339>
2021-06-21 21:23:51 +00:00
Bas Nieuwenhuizen 02c5dc8035 nir: Add lowered vendor independent raytracing intrinsics.
For use in a generic nir_lower_shader_calls.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10339>
2021-06-21 21:23:51 +00:00
Jason Ekstrand 73188c6954 nir,docs: Add docs for NIR ALU instructions
About half or more of the text here is actually from Connor Abbot.  I've
edited it a bit to bring it up-to-date and make a few things more clear.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11438>
2021-06-21 16:46:59 +00:00
Jason Ekstrand f00b5a30f5 nir: Require vectorized ALU ops to be all-or-nothing
Long ago, the semantics of bcsel were such that it took a single boolean
value and selected between whole vectors.  These days, it takes a vector
boolean with the assumption that if you want the old behavior you can
just use a .xxxx swizzle.  There currently are no opcodes which use a
output_size of 0 but have a scalar or fixed-vector input.  Let's
disallow it for now to force us to think through the semantics again if
this ever comes up as something someone actually wants.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11438>
2021-06-21 16:46:59 +00:00
Rhys Perry ea68d4a676 nir/propagate_invariant: add invariant_prim option
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11035>
2021-06-21 15:13:05 +00:00
Jason Ekstrand 2e08bae9b3 nir,vc4: Suffix a bunch of unorm 4x8 opcodes _vc4
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11463>
2021-06-21 09:04:08 -05:00
Jason Ekstrand 0afbfee8da nir,panfrost: Suffix fsat_signed and fclamp_pos with _mali
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11463>
2021-06-21 09:03:34 -05:00
Jason Ekstrand f0f713960b nir,amd: Suffix nir_op_cube_face_coord/index with _amd
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11463>
2021-06-21 09:03:34 -05:00
Emma Anholt 990c232603 nir: Add an interface for logging shaders with mesa_log*.
For debug on Android, it's useful to be able to print shaders to the
android log interface, since you don't usually have stdout/stderr.

Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9262>
2021-06-18 18:18:35 +00:00
Eric Anholt 47804f53f9 nir: Do peephole select on other instructions if the limit is ~0.
limit==0 is the signal for "don't peephole anything but a move that will
be optimized aways."  limit > 0 is "up to N alu instructions may be moved
out."  nir-to-tgsi uses ~0 as the indicator of "No, we really need to
eliminate all if instructions" on hardware like i915 that doesn't have
control flow.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11329>
2021-06-18 04:30:43 +00:00
Emma Anholt aba8b6675a nir/lower_int_to_float: Make sure the cursor is in the right spot.
We need to make get it updated after we may have nir_instr_remove()d an
instruction, and when we cross blocks.  This didn't really matter before
because the only builder usage was idiv, which other users of
lower_int_to_float were probably never hitting.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11329>
2021-06-18 04:30:43 +00:00
Iván Briano 4c67924251 intel/nir: Fix txs for null surfaces
Closes: #4860
Fixes: 05a37e2422 ("intel/nir: Set lower txs with non-zero LOD")

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11435>
2021-06-17 11:55:22 -07:00
Rhys Perry 35e54abc67 nir/cse: resize the instruction set
ministat (CSE only):
Difference at 95.0% confidence
	-3357.54 +/- 32.5177
	-25.267% +/- 0.24098%
	(Student's t, pooled s = 33.909)

ministat (entire run):
Difference at 95.0% confidence
	-3414.27 +/- 270.628
	-2.76477% +/- 0.217647%
	(Student's t, pooled s = 282.207)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6390>
2021-06-15 17:57:07 +00:00
Rhys Perry 964f59d20e nir: use a single set during CSE
Use a single set and ensure dominance by checking after a equivalent
instruction is found.

Besides removing the need to copy a set, this also lets us resize the set
at the start of the pass in the next commit.

ministat (CSE only):
Difference at 95.0% confidence
	-984.956 +/- 28.8559
	-6.90075% +/- 0.190231%
	(Student's t, pooled s = 26.9052)

ministat (entire run):
Difference at 95.0% confidence
	-1246.1 +/- 257.253
	-0.998972% +/- 0.205094%
	(Student's t, pooled s = 239.863)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Co-authored-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6390>
2021-06-15 17:57:07 +00:00
Jason Ekstrand e23b55c3f0 i965: Use nir_lower_passthrough_edgeflags
Now that there's a common NIR pass, there's no point in us doing this in
the back-end anymore.  In order to use this pass in i965, we do have to
make one tiny change.  Gallium runs the pass after assigning input and
output locations and so needs the pass to respect those locations and
num_inputs.  i965, however, runs it before any location assignment or
I/O lowering so we don't care.  We do, however, need the pass to succeed
with num_inputs == 0 because we set that later.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11313>
2021-06-11 21:19:06 +00:00
Dave Airlie eff418fe57 nir/edgeflags: update outputs written when lowering edge flags.
In theory you can rerun the info gather pass, but in practice that
doesn't always end well. Be consistent inside this pass and update the
info.

While we're here, change the inputs read to use VERT_BIT_EDGEFLAG.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11313>
2021-06-11 21:19:06 +00:00
Rhys Perry 7c63ec70ef nir: document that ACCESS_RESTRICT is not set at intrinsics
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7295>
2021-06-10 13:17:22 +00:00
Rhys Perry 938098c98d nir/opt_load_store_vectorize: only require one variable to be restrict
No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7295>
2021-06-10 13:17:22 +00:00
Rhys Perry 865ca3af2b nir/opt_load_store_vectorize: check for restrict at the variable
SPIR-V -> NIR doesn't set ACCESS_RESTRICT at the intrinsic.

fossil-db (GFX10.3):
Totals from 3 (0.00% of 139391) affected shaders:
CodeSize: 12364 -> 12356 (-0.06%)
Instrs: 2493 -> 2494 (+0.04%); split: -0.04%, +0.08%
Cycles: 15279372 -> 15295756 (+0.11%); split: -0.11%, +0.21%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7295>
2021-06-10 13:17:22 +00:00
Rhys Perry 2e7bceb220 nir/load_store_vectorizer: fix check_for_robustness() with indirect loads
fossil-db (GFX10.3, robustness2 enabled):
Totals from 13958 (9.54% of 146267) affected shaders:
VGPRs: 609168 -> 624304 (+2.48%); split: -0.05%, +2.53%
CodeSize: 48229504 -> 48488392 (+0.54%); split: -0.02%, +0.56%
MaxWaves: 354426 -> 349448 (-1.40%); split: +0.00%, -1.41%
Instrs: 9332093 -> 9375053 (+0.46%); split: -0.03%, +0.49%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7295>
2021-06-10 13:17:22 +00:00
Timur Kristóf 1e49018ced amd: Add extra source to the mbcnt_amd NIR intrinsic.
The v_mbcnt instructions can take an extra source that they add to
the result. This is not exposed in SPIR-V but we now expose it in NIR.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11072>
2021-06-09 16:48:51 +00:00
Timur Kristóf 43ce80a58f nir: Add AMD-specific byte and lane permute intrinsics.
These map directly to v_perm_b32 and v_permlane_b32.
Unfortunately there is no corresponding NIR opcode or
intrinsics, and it's too tedious to puzzle these things
together from the existing NIR instructions.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11072>
2021-06-09 16:48:51 +00:00
Timur Kristóf c92dab8e2b nir: Add nir_op_sad_u8x4 which corresponds to AMD's v_sad_u8.
NIR currently doesn't have any intrinsics for a horizontal packed add,
so this one is modeled after AMD's v_sad_u8.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11072>
2021-06-09 16:48:51 +00:00
Caio Marcelo de Oliveira Filho e94c99513a nir/gather_info: Rename per_vertex to is_arrayed
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11252>
2021-06-09 07:35:57 +00:00
Caio Marcelo de Oliveira Filho a59f1d628a nir/lower_io: Rename vertex_index to array_index in helpers
The helpers will be reused for per-primitive variables that are also
arrayed, so use a more general name.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11252>
2021-06-09 07:35:57 +00:00
Alyssa Rosenzweig 95bd6e915f nir/lower_fragcolor: Avoid redundant load_output
At best, this is an extra instruction for NIR to optimize out. At worst,
depending on pass ordering nir_load_output could sneak into the final
NIR, even on drivers that don't support fbfetch.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11255>
2021-06-09 02:58:08 +00:00
Caio Marcelo de Oliveira Filho 8af6766062 nir: Move workgroup_size and workgroup_variable_size into common shader_info
Move it out the "cs" sub-struct, since these will be used for other
shader stages in the future.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11225>
2021-06-08 09:23:55 -07:00
Bas Nieuwenhuizen 6b7ff241f4 nir/lower_returns: Deal with single-arg phis after if.
if we have

   if ... {
      return;
   } else {
      // block X
   }
   // block Y
   phi(X: ...)

then nir_lower_returns tries to move block Y into the else body,
except nir_cf_extract doesn't move the phi. As the return is removed
in the then-body the phi suddenly has the wrong number of arguments
(and the phi doesn't dominate its uses anymore).

In this case we know that the phi has to be single arg, so we can just
rewrite the users of the phis and drop them.

Hit this in my RT adventures, not sure if this is actually reachable
right now, as single arg phis tend to be kind of exceptional outside
of CSSA and we typically call nir_lower_returns pretty early.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11207>
2021-06-08 11:29:53 +00:00
Rhys Perry 1cbcfb8b38 nir, nir/algebraic: add byte/word insertion instructions
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3151>
2021-06-08 08:57:42 +00:00
Rhys Perry edae3e5623 nir/algebraic: optimize extract of extract
Found in some sottr shaders (originally iand(ishr(a, 16), 0xffff))

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3151>
2021-06-08 08:57:42 +00:00
Caio Marcelo de Oliveira Filho c8a7bd0dc8 nir: Rename WORK_GROUP (and similar) to WORKGROUP
Be consistent with other usages in Vulkan and SPIR-V, and the recently
added workgroup_size field.

Acked-by: Emma Anholt <emma@anholt.net>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11190>
2021-06-07 22:34:42 +00:00
Caio Marcelo de Oliveira Filho a71a780598 nir: Rename nir_intrinsic_load_local_group_size to nir_intrinsic_load_workgroup_size
Acked-by: Emma Anholt <emma@anholt.net>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11190>
2021-06-07 22:34:42 +00:00
Caio Marcelo de Oliveira Filho 43a6a2151b compiler: Rename SYSTEM_VALUE_LOCAL_GROUP_SIZE to SYSTEM_VALUE_WORKGROUP_SIZE
Acked-by: Emma Anholt <emma@anholt.net>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11190>
2021-06-07 22:34:42 +00:00
Caio Marcelo de Oliveira Filho 430d2206da compiler: Rename local_size to workgroup_size
Acked-by: Emma Anholt <emma@anholt.net>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11190>
2021-06-07 22:34:42 +00:00
Alyssa Rosenzweig c509878971 nir: Add nir_intrinsic_load_back_face_agx
On AGX, the special register for front facing is inverted from its meaning in
APIs. We need to lower load_front_face to inot(load_back_face). Doing this in
the backend is trivial, but then we would miss out on algebraic optimizations
for the inot.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11199>
2021-06-05 20:38:22 +00:00
Hoe Hao Cheng 90a5fef85c nir: define NIR_ALU_MAX_INPUTS
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11172>
2021-06-04 19:33:13 +00:00
Rhys Perry 49add985ff nir/unsigned_upper_bound: don't require dominance metadata
Instead, determine if it's a merge or loop exit phi.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9808>
2021-06-04 14:14:00 +00:00
Mike Blumenkrantz f9ecbb1e1d nir/builder: add nir_mask
it's handy to have functions for generating masks

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10620>
2021-05-26 04:06:27 +00:00
Timothy Arceri 8b180ab98b nir/lower_io_to_vector: fix per vertex io handling for arrays
The pass was processing the per vertex index from the wrong end
of the array deref chain.

Fixes: bcd14756ee ("nir/lower_io_to_vector: add flat mode")

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10798>
2021-05-21 02:43:30 +00:00
Ian Romanick 880b00dc59 nir/lower_tex: Add support for lowering YUYV formats
v2: Rebase on bc438c91d9 ("nir/lower_tex: ignore texture_index if
tex_instr has deref src")

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9610>
2021-05-21 01:40:22 +00:00
Ian Romanick 1358d93650 nir/lower_tex: Add support for lowering Y41x formats
These are similar to AYUV, but the channel ordering is different... in
such a way that there's no RGBA format that will make the channels line
up right.

v2: Rebase on bc438c91d9 ("nir/lower_tex: ignore texture_index if
tex_instr has deref src")

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9610>
2021-05-21 01:40:22 +00:00
Ian Romanick d246c31ec1 nir/algebraic: Add algebraic opt for float comparisons with identical operands.
The flt version could have been added in 56e21647e2, but our
collective understanding of NaN and comparisons was poor in 2015.  The
new "is_a_number" predicate makes the others possible.

All of the helped shaders in shader-db are either from Mad Max or Skia.
Some of the Skia shaders just get decimated by this change:

instructions helped:   shaders/skia/580-4.shader_test FS SIMD8:          81 -> 29 (-64.20%) (scheduled: top-down)

I looked at a couple of those shaders, and they had sequences like:

        vec1 32 ssa_44 = flt32 ssa_32, ssa_32
        vec1 32 ssa_45 = b32csel ssa_44, ssa_43, ssa_0
        vec1 32 ssa_46 = fge32 ssa_32, ssa_32
        vec1 32 ssa_47 = b32csel ssa_46, ssa_0, ssa_45
        vec1 32 ssa_48 = iand ssa_46, ssa_44
        vec1 32 ssa_49 = b32csel ssa_48, ssa_43, ssa_0

ssa_44 is replaced with False.  Then ssa_47 selects between ssa_0 and
ssa_0, so ssa_47 and ssa_46 are eliminated.  ssa_48 is (False && don't
care), so ssa_48 and ssa_49 are eliminated.  After that, many
calculations now involve constants of zero, so they are optimized down
too.  So it continues until there's not much left!

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>

All Intel platforms had similar results. (Tiger Lake shown)
total instructions in shared programs: 21072238 -> 21071386 (<.01%)
instructions in affected programs: 33722 -> 32870 (-2.53%)
helped: 146
HURT: 1
helped stats (abs) min: 1 max: 62 x̄: 5.84 x̃: 2
helped stats (rel) min: 0.19% max: 62.35% x̄: 4.09% x̃: 1.07%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.20% max: 0.20% x̄: 0.20% x̃: 0.20%
95% mean confidence interval for instructions value: -7.94 -3.65
95% mean confidence interval for instructions %-change: -5.87% -2.25%
Instructions are helped.

total cycles in shared programs: 856203326 -> 856192238 (<.01%)
cycles in affected programs: 749966 -> 738878 (-1.48%)
helped: 148
HURT: 0
helped stats (abs) min: 1 max: 1226 x̄: 74.92 x̃: 18
helped stats (rel) min: 0.07% max: 49.70% x̄: 2.69% x̃: 0.46%
95% mean confidence interval for cycles value: -104.82 -45.02
95% mean confidence interval for cycles %-change: -4.01% -1.37%
Cycles are helped.

LOST:   4
GAINED: 0

Fossil-db results:

Tiger Lake
Instructions in all programs: 160915223 -> 160898354 (-0.0%)
SENDs in all programs: 6812780 -> 6812780 (+0.0%)
Loops in all programs: 38340 -> 38340 (+0.0%)
Cycles in all programs: 7434144207 -> 7433978462 (-0.0%)
Spills in all programs: 192582 -> 192582 (+0.0%)
Fills in all programs: 304537 -> 304537 (+0.0%)

Ice Lake
Instructions in all programs: 145296298 -> 145279531 (-0.0%)
SENDs in all programs: 6863692 -> 6863692 (+0.0%)
Loops in all programs: 38334 -> 38334 (+0.0%)
Cycles in all programs: 8800257014 -> 8800088384 (-0.0%)
Spills in all programs: 216880 -> 216880 (+0.0%)
Fills in all programs: 334248 -> 334248 (+0.0%)

Skylake
Instructions in all programs: 135891664 -> 135874910 (-0.0%)
SENDs in all programs: 6802946 -> 6802946 (+0.0%)
Loops in all programs: 38331 -> 38331 (+0.0%)
Cycles in all programs: 8444273433 -> 8444130932 (-0.0%)
Spills in all programs: 194839 -> 194839 (+0.0%)
Fills in all programs: 301114 -> 301114 (+0.0%)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>
2021-05-20 01:39:35 +00:00
Ian Romanick 64bcfc3a17 nir/algebraic: Rearrange some logic-joined comparisons and reduce
On Skylake and Broadwell, a single big compute shader in Dirt Rally has
spills and fills *REALLY* helped.  That same shader is hurt very
slightly for spills and fills on Ice Lake.

v2: Move the patterns earlier to be nearer other patterns that are
similar.  Mark the replacement fmin and fmax exact.  Both suggested by
Rhys.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>

Tiger Lake
total instructions in shared programs: 21073812 -> 21073041 (<.01%)
instructions in affected programs: 77608 -> 76837 (-0.99%)
helped: 522
HURT: 33
helped stats (abs) min: 1 max: 26 x̄: 1.58 x̃: 1
helped stats (rel) min: 0.22% max: 14.29% x̄: 1.29% x̃: 1.02%
HURT stats (abs)   min: 1 max: 8 x̄: 1.67 x̃: 1
HURT stats (rel)   min: 0.25% max: 3.42% x̄: 1.06% x̃: 0.86%
95% mean confidence interval for instructions value: -1.57 -1.20
95% mean confidence interval for instructions %-change: -1.25% -1.05%
Instructions are helped.

total cycles in shared programs: 856224346 -> 856211096 (<.01%)
cycles in affected programs: 2394231 -> 2380981 (-0.55%)
helped: 603
HURT: 25
helped stats (abs) min: 1 max: 5218 x̄: 59.37 x̃: 28
helped stats (rel) min: 0.06% max: 5.61% x̄: 1.52% x̃: 1.37%
HURT stats (abs)   min: 2 max: 21394 x̄: 901.92 x̃: 10
HURT stats (rel)   min: 0.02% max: 5.90% x̄: 0.95% x̃: 0.59%
95% mean confidence interval for cycles value: -93.61 51.41
95% mean confidence interval for cycles %-change: -1.50% -1.34%
Inconclusive result (value mean confidence interval includes 0).

LOST:   1
GAINED: 1

Ice Lake
total instructions in shared programs: 20025692 -> 20024554 (<.01%)
instructions in affected programs: 104981 -> 103843 (-1.08%)
helped: 738
HURT: 0
helped stats (abs) min: 1 max: 30 x̄: 1.54 x̃: 1
helped stats (rel) min: 0.31% max: 10.53% x̄: 1.20% x̃: 1.06%
95% mean confidence interval for instructions value: -1.66 -1.43
95% mean confidence interval for instructions %-change: -1.26% -1.14%
Instructions are helped.

total cycles in shared programs: 979474407 -> 979422333 (<.01%)
cycles in affected programs: 4136364 -> 4084290 (-1.26%)
helped: 759
HURT: 59
helped stats (abs) min: 2 max: 11010 x̄: 72.78 x̃: 28
helped stats (rel) min: 0.03% max: 6.43% x̄: 1.23% x̃: 1.02%
HURT stats (abs)   min: 1 max: 698 x̄: 53.66 x̃: 8
HURT stats (rel)   min: 0.02% max: 24.05% x̄: 1.64% x̃: 0.33%
95% mean confidence interval for cycles value: -97.08 -30.24
95% mean confidence interval for cycles %-change: -1.14% -0.91%
Cycles are helped.

total spills in shared programs: 10568 -> 10569 (<.01%)
spills in affected programs: 102 -> 103 (0.98%)
helped: 0
HURT: 1

total fills in shared programs: 11347 -> 11349 (0.02%)
fills in affected programs: 277 -> 279 (0.72%)
helped: 0
HURT: 1

LOST:   2
GAINED: 2

Skylake
total instructions in shared programs: 18190419 -> 18188523 (-0.01%)
instructions in affected programs: 102502 -> 100606 (-1.85%)
helped: 791
HURT: 0
helped stats (abs) min: 1 max: 676 x̄: 2.40 x̃: 1
helped stats (rel) min: 0.34% max: 20.23% x̄: 1.41% x̃: 1.23%
95% mean confidence interval for instructions value: -4.07 -0.72
95% mean confidence interval for instructions %-change: -1.47% -1.34%
Instructions are helped.

total cycles in shared programs: 960737969 -> 960498951 (-0.02%)
cycles in affected programs: 4435351 -> 4196333 (-5.39%)
helped: 804
HURT: 67
helped stats (abs) min: 1 max: 198540 x̄: 300.54 x̃: 24
helped stats (rel) min: 0.03% max: 25.41% x̄: 1.21% x̃: 0.92%
HURT stats (abs)   min: 2 max: 680 x̄: 39.06 x̃: 6
HURT stats (rel)   min: 0.05% max: 23.98% x̄: 1.12% x̃: 0.19%
95% mean confidence interval for cycles value: -722.03 173.20
95% mean confidence interval for cycles %-change: -1.15% -0.91%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 9757 -> 9722 (-0.36%)
spills in affected programs: 138 -> 103 (-25.36%)
helped: 1
HURT: 0

total fills in shared programs: 9861 -> 9576 (-2.89%)
fills in affected programs: 564 -> 279 (-50.53%)
helped: 1
HURT: 0

LOST:   5
GAINED: 2

Broadwell
total instructions in shared programs: 17853870 -> 17852414 (<.01%)
instructions in affected programs: 101276 -> 99820 (-1.44%)
helped: 777
HURT: 0
helped stats (abs) min: 1 max: 264 x̄: 1.87 x̃: 1
helped stats (rel) min: 0.34% max: 8.44% x̄: 1.37% x̃: 1.23%
95% mean confidence interval for instructions value: -2.54 -1.21
95% mean confidence interval for instructions %-change: -1.42% -1.32%
Instructions are helped.

total cycles in shared programs: 1029846029 -> 1029725458 (-0.01%)
cycles in affected programs: 4435791 -> 4315220 (-2.72%)
helped: 813
HURT: 43
helped stats (abs) min: 2 max: 68560 x̄: 149.95 x̃: 24
helped stats (rel) min: 0.02% max: 73.73% x̄: 1.43% x̃: 0.92%
HURT stats (abs)   min: 2 max: 726 x̄: 31.12 x̃: 13
HURT stats (rel)   min: 0.01% max: 8.43% x̄: 0.62% x̃: 0.31%
95% mean confidence interval for cycles value: -299.58 17.87
95% mean confidence interval for cycles %-change: -1.63% -1.02%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 20333 -> 20307 (-0.13%)
spills in affected programs: 151 -> 125 (-17.22%)
helped: 1
HURT: 0

total fills in shared programs: 25899 -> 25775 (-0.48%)
fills in affected programs: 573 -> 449 (-21.64%)
helped: 1
HURT: 0

LOST:   5
GAINED: 0

Sandy Bridge, Ivy Bridge, and Haswell had similar results. (Haswell shown)
total instructions in shared programs: 16417658 -> 16416320 (<.01%)
instructions in affected programs: 96495 -> 95157 (-1.39%)
helped: 774
HURT: 0
helped stats (abs) min: 1 max: 18 x̄: 1.73 x̃: 1
helped stats (rel) min: 0.33% max: 9.80% x̄: 1.52% x̃: 1.20%
95% mean confidence interval for instructions value: -1.83 -1.63
95% mean confidence interval for instructions %-change: -1.59% -1.46%
Instructions are helped.

total cycles in shared programs: 1037104346 -> 1037080579 (<.01%)
cycles in affected programs: 3787747 -> 3763980 (-0.63%)
helped: 791
HURT: 53
helped stats (abs) min: 1 max: 5411 x̄: 65.87 x̃: 32
helped stats (rel) min: 0.02% max: 21.17% x̄: 1.44% x̃: 1.18%
HURT stats (abs)   min: 2 max: 14160 x̄: 534.72 x̃: 18
HURT stats (rel)   min: 0.02% max: 15.37% x̄: 5.70% x̃: 0.54%
95% mean confidence interval for cycles value: -69.39 13.07
95% mean confidence interval for cycles %-change: -1.19% -0.80%
Inconclusive result (value mean confidence interval includes 0).

LOST:   12
GAINED: 2

GM45 and Iron Lake had similar results. (Iron Lake shown)
total instructions in shared programs: 8132855 -> 8132703 (<.01%)
instructions in affected programs: 8782 -> 8630 (-1.73%)
helped: 38
HURT: 0
helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4
helped stats (rel) min: 1.66% max: 3.23% x̄: 1.77% x̃: 1.72%
95% mean confidence interval for instructions value: -4.00 -4.00
95% mean confidence interval for instructions %-change: -1.88% -1.65%
Instructions are helped.

total cycles in shared programs: 238300850 -> 238298568 (<.01%)
cycles in affected programs: 257202 -> 254920 (-0.89%)
helped: 62
HURT: 2
helped stats (abs) min: 4 max: 58 x̄: 36.90 x̃: 50
helped stats (rel) min: 0.15% max: 1.55% x̄: 0.87% x̃: 1.12%
HURT stats (abs)   min: 2 max: 4 x̄: 3.00 x̃: 3
HURT stats (rel)   min: 0.12% max: 0.22% x̄: 0.17% x̃: 0.17%
95% mean confidence interval for cycles value: -41.34 -29.98
95% mean confidence interval for cycles %-change: -0.95% -0.73%
Cycles are helped.

Fossil-db results:

All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 145296888 -> 145296346 (-0.0%)
SENDs in all programs: 6863696 -> 6863696 (+0.0%)
Loops in all programs: 38334 -> 38334 (+0.0%)
Cycles in all programs: 8800262303 -> 8800258950 (-0.0%)
Spills in all programs: 216880 -> 216880 (+0.0%)
Fills in all programs: 334248 -> 334248 (+0.0%)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>
2021-05-20 01:39:35 +00:00
Ian Romanick adc2835646 nir/algebraic: Mark some more logic-joined comparison reductions as exact
If the values are known to be numbers, the the replacements are exact.
This is only applied to the patterns with constants.  Constants should
always be numbers, and shaders with NaN constants should be handled in a
different way.

No shader-db or fossil-db changes on any Intel platform.  The intention
is to make these patterns more future proof.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>
2021-05-20 01:39:35 +00:00
Ian Romanick 23bbf3932b nir/algebraic: Mark some more comparison reductions exact
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>

All Haswell and later Intel platforms had similar results. (Tiger Lake shown)
total instructions in shared programs: 21049056 -> 21048939 (<.01%)
instructions in affected programs: 4716 -> 4599 (-2.48%)
helped: 39
HURT: 0
helped stats (abs) min: 1 max: 6 x̄: 3.00 x̃: 3
helped stats (rel) min: 0.99% max: 5.43% x̄: 2.80% x̃: 2.51%
95% mean confidence interval for instructions value: -3.46 -2.54
95% mean confidence interval for instructions %-change: -3.22% -2.38%
Instructions are helped.

total cycles in shared programs: 855141411 -> 855141159 (<.01%)
cycles in affected programs: 54491 -> 54239 (-0.46%)
helped: 28
HURT: 5
helped stats (abs) min: 2 max: 34 x̄: 12.82 x̃: 12
helped stats (rel) min: 0.06% max: 2.73% x̄: 0.94% x̃: 0.75%
HURT stats (abs)   min: 2 max: 52 x̄: 21.40 x̃: 6
HURT stats (rel)   min: 0.11% max: 2.46% x̄: 0.90% x̃: 0.56%
95% mean confidence interval for cycles value: -13.72 -1.55
95% mean confidence interval for cycles %-change: -1.01% -0.31%
Cycles are helped.

Tiger Lake
Instructions in all programs: 160902191 -> 160899554 (-0.0%)
SENDs in all programs: 6812435 -> 6812435 (+0.0%)
Loops in all programs: 38225 -> 38225 (+0.0%)
Cycles in all programs: 7428581420 -> 7428555881 (-0.0%)
Spills in all programs: 192582 -> 192582 (+0.0%)
Fills in all programs: 304539 -> 304539 (+0.0%)

A lot of fragment shaders in Shadow of the Tomb Raider were helped, and
a bunch of vertex shaders in Octopath Traveler were hurt.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>
2021-05-20 01:39:35 +00:00
Ian Romanick 7d85dc4f35 nir/algebraic: Equality comparison inversions require sources be numbers
v2: Update A630 expected image checksum for minetest.trace.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>

Tiger Lake
total instructions in shared programs: 21036690 -> 21049485 (0.06%)
instructions in affected programs: 852085 -> 864880 (1.50%)
helped: 240
HURT: 2514
helped stats (abs) min: 1 max: 46 x̄: 2.45 x̃: 2
helped stats (rel) min: 0.15% max: 4.30% x̄: 0.79% x̃: 0.55%
HURT stats (abs)   min: 1 max: 198 x̄: 5.32 x̃: 2
HURT stats (rel)   min: 0.06% max: 10.71% x̄: 1.48% x̃: 1.04%
95% mean confidence interval for instructions value: 4.14 5.15
95% mean confidence interval for instructions %-change: 1.23% 1.34%
Instructions are HURT.

total cycles in shared programs: 856045255 -> 855816220 (-0.03%)
cycles in affected programs: 16743786 -> 16514751 (-1.37%)
helped: 790
HURT: 1973
helped stats (abs) min: 1 max: 10766 x̄: 627.97 x̃: 18
helped stats (rel) min: <.01% max: 32.59% x̄: 3.01% x̃: 0.64%
HURT stats (abs)   min: 1 max: 4078 x̄: 135.36 x̃: 18
HURT stats (rel)   min: <.01% max: 54.56% x̄: 2.80% x̃: 0.82%
95% mean confidence interval for cycles value: -131.36 -34.42
95% mean confidence interval for cycles %-change: 0.88% 1.40%
Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree).

total spills in shared programs: 9771 -> 9766 (-0.05%)
spills in affected programs: 47 -> 42 (-10.64%)
helped: 1
HURT: 0

total fills in shared programs: 9451 -> 9430 (-0.22%)
fills in affected programs: 91 -> 70 (-23.08%)
helped: 1
HURT: 0

LOST:   16
GAINED: 51

All Intel GPUs from Sandybridge through Ice Lake had similar results. (Ice Lake shown)
total instructions in shared programs: 20024781 -> 20025568 (<.01%)
instructions in affected programs: 103309 -> 104096 (0.76%)
helped: 12
HURT: 389
helped stats (abs) min: 1 max: 2 x̄: 1.17 x̃: 1
helped stats (rel) min: 0.20% max: 2.70% x̄: 1.36% x̃: 1.37%
HURT stats (abs)   min: 1 max: 8 x̄: 2.06 x̃: 1
HURT stats (rel)   min: 0.05% max: 7.14% x̄: 1.25% x̃: 0.95%
95% mean confidence interval for instructions value: 1.78 2.15
95% mean confidence interval for instructions %-change: 1.06% 1.28%
Instructions are HURT.

total cycles in shared programs: 979419070 -> 979439180 (<.01%)
cycles in affected programs: 4968711 -> 4988821 (0.40%)
helped: 60
HURT: 381
helped stats (abs) min: 1 max: 1296 x̄: 96.92 x̃: 26
helped stats (rel) min: <.01% max: 27.10% x̄: 1.64% x̃: 0.65%
HURT stats (abs)   min: 1 max: 7320 x̄: 68.04 x̃: 30
HURT stats (rel)   min: <.01% max: 19.77% x̄: 1.32% x̃: 0.87%
95% mean confidence interval for cycles value: 10.25 80.95
95% mean confidence interval for cycles %-change: 0.69% 1.15%
Cycles are HURT.

LOST:   1
GAINED: 2

GM45 and Iron Lake had similar results. (Iron Lake shown)
total instructions in shared programs: 8128474 -> 8132527 (0.05%)
instructions in affected programs: 642323 -> 646376 (0.63%)
helped: 12
HURT: 1972
helped stats (abs) min: 1 max: 4 x̄: 3.00 x̃: 4
helped stats (rel) min: 0.72% max: 1.72% x̄: 1.09% x̃: 0.83%
HURT stats (abs)   min: 1 max: 16 x̄: 2.07 x̃: 3
HURT stats (rel)   min: 0.12% max: 7.14% x̄: 0.77% x̃: 0.70%
95% mean confidence interval for instructions value: 1.99 2.10
95% mean confidence interval for instructions %-change: 0.74% 0.79%
Instructions are HURT.

total cycles in shared programs: 238280994 -> 238294376 (<.01%)
cycles in affected programs: 8841250 -> 8854632 (0.15%)
helped: 84
HURT: 1192
helped stats (abs) min: 4 max: 64 x̄: 12.50 x̃: 8
helped stats (rel) min: 0.02% max: 1.61% x̄: 0.28% x̃: 0.17%
HURT stats (abs)   min: 2 max: 198 x̄: 12.11 x̃: 12
HURT stats (rel)   min: 0.02% max: 8.03% x̄: 0.28% x̃: 0.14%
95% mean confidence interval for cycles value: 9.65 11.32
95% mean confidence interval for cycles %-change: 0.22% 0.27%
Cycles are HURT.

No fossil-db changes on any Intel platform.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>
2021-05-20 01:39:35 +00:00
Ian Romanick 4246c2869c nir/algebraic: Invert comparisons less often
This fixes the piglit test range_analysis_fsat_of_nan.shader_test.  That
test contains some code like

    o = saturate(X) > 0 ? vec4(1.0, 0.0, 0.0, 1.0)
                        : vec4(0.0, 1.0, 0.0, 1.0);

A clever optimizer will convert this to

    o = vec4(float(saturate(X) > 0),
             float(!(saturate(X) > 0)),
             0, 1);

Due to the ordering of optimizations in the compiler, the `saturate`
operations are removed.  This is safe even in the presense of NaN.

    o = vec4(float(X > 0), float(!(X > 0)), 0, 1);

Since the calculations are not marked precise, an overzealous
optimizer may reduce this to

    o = vec4(float(X > 0), float(X <= 0), 0, 1);

This will result in black being output.  The GLSL spec gives quite a bit
of leeway with respect to NaN, but that seems too far.  The shader
author asked for a result of red or green.  A result of black is still
"undefined behavior," but it's also a little mean.

This also enables CSE to do its job better.

v2: Update A530 expected image checksum for minetest.trace.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4531
Fixes: 0dbda153aa ("nir/algebraic: Flag inexact optimizations")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>

Tiger Lake
total instructions in shared programs: 21041563 -> 21041789 (<.01%)
instructions in affected programs: 992066 -> 992292 (0.02%)
helped: 526
HURT: 548
helped stats (abs) min: 1 max: 16 x̄: 2.48 x̃: 2
helped stats (rel) min: 0.04% max: 5.56% x̄: 0.74% x̃: 0.49%
HURT stats (abs)   min: 1 max: 27 x̄: 2.80 x̃: 2
HURT stats (rel)   min: 0.04% max: 4.55% x̄: 0.59% x̃: 0.38%
95% mean confidence interval for instructions value: -0.00 0.42
95% mean confidence interval for instructions %-change: -0.12% <.01%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 855885569 -> 856118189 (0.03%)
cycles in affected programs: 343637248 -> 343869868 (0.07%)
helped: 907
HURT: 541
helped stats (abs) min: 1 max: 7724 x̄: 206.45 x̃: 36
helped stats (rel) min: <.01% max: 29.97% x̄: 1.01% x̃: 0.37%
HURT stats (abs)   min: 1 max: 14177 x̄: 776.09 x̃: 31
HURT stats (rel)   min: <.01% max: 29.94% x̄: 1.24% x̃: 0.35%
95% mean confidence interval for cycles value: 84.30 237.00
95% mean confidence interval for cycles %-change: -0.32% -0.01%
Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree).

LOST:   3
GAINED: 5

Ice Lake
total instructions in shared programs: 20027107 -> 20025352 (<.01%)
instructions in affected programs: 1068856 -> 1067101 (-0.16%)
helped: 1153
HURT: 273
helped stats (abs) min: 1 max: 14 x̄: 1.83 x̃: 1
helped stats (rel) min: 0.03% max: 5.66% x̄: 0.61% x̃: 0.35%
HURT stats (abs)   min: 1 max: 15 x̄: 1.29 x̃: 1
HURT stats (rel)   min: 0.16% max: 1.30% x̄: 0.58% x̃: 0.60%
95% mean confidence interval for instructions value: -1.33 -1.13
95% mean confidence interval for instructions %-change: -0.43% -0.34%
Instructions are helped.

total cycles in shared programs: 979499227 -> 979448725 (<.01%)
cycles in affected programs: 344261539 -> 344211037 (-0.01%)
helped: 1079
HURT: 441
helped stats (abs) min: 1 max: 9384 x̄: 147.78 x̃: 48
helped stats (rel) min: <.01% max: 31.83% x̄: 0.90% x̃: 0.33%
HURT stats (abs)   min: 1 max: 7220 x̄: 247.07 x̃: 32
HURT stats (rel)   min: <.01% max: 31.30% x̄: 1.52% x̃: 0.53%
95% mean confidence interval for cycles value: -70.01 3.56
95% mean confidence interval for cycles %-change: -0.35% -0.05%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 10564 -> 10568 (0.04%)
spills in affected programs: 143 -> 147 (2.80%)
helped: 0
HURT: 1

total fills in shared programs: 11343 -> 11347 (0.04%)
fills in affected programs: 287 -> 291 (1.39%)
helped: 0
HURT: 1

LOST:   3
GAINED: 2

Skylake
total instructions in shared programs: 18192274 -> 18190128 (-0.01%)
instructions in affected programs: 1000188 -> 998042 (-0.21%)
helped: 1149
HURT: 55
helped stats (abs) min: 1 max: 14 x̄: 1.92 x̃: 1
helped stats (rel) min: 0.04% max: 6.67% x̄: 0.67% x̃: 0.42%
HURT stats (abs)   min: 1 max: 2 x̄: 1.05 x̃: 1
HURT stats (rel)   min: 0.16% max: 0.55% x̄: 0.27% x̃: 0.26%
95% mean confidence interval for instructions value: -1.87 -1.69
95% mean confidence interval for instructions %-change: -0.67% -0.58%
Instructions are helped.

total cycles in shared programs: 960856054 -> 960728040 (-0.01%)
cycles in affected programs: 340840968 -> 340712954 (-0.04%)
helped: 1079
HURT: 233
helped stats (abs) min: 1 max: 7640 x̄: 170.95 x̃: 46
helped stats (rel) min: <.01% max: 30.20% x̄: 0.96% x̃: 0.28%
HURT stats (abs)   min: 1 max: 6864 x̄: 242.23 x̃: 26
HURT stats (rel)   min: <.01% max: 34.64% x̄: 2.10% x̃: 0.22%
95% mean confidence interval for cycles value: -135.62 -59.53
95% mean confidence interval for cycles %-change: -0.59% -0.25%
Cycles are helped.

LOST:   15
GAINED: 1

Broadwell
total instructions in shared programs: 17855624 -> 17853580 (-0.01%)
instructions in affected programs: 1012209 -> 1010165 (-0.20%)
helped: 1105
HURT: 52
helped stats (abs) min: 1 max: 13 x̄: 1.90 x̃: 1
helped stats (rel) min: 0.03% max: 6.67% x̄: 0.67% x̃: 0.36%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.13% max: 0.52% x̄: 0.26% x̃: 0.25%
95% mean confidence interval for instructions value: -1.86 -1.67
95% mean confidence interval for instructions %-change: -0.68% -0.58%
Instructions are helped.

total cycles in shared programs: 1029905447 -> 1029840699 (<.01%)
cycles in affected programs: 347102680 -> 347037932 (-0.02%)
helped: 1007
HURT: 211
helped stats (abs) min: 1 max: 1360 x̄: 89.76 x̃: 48
helped stats (rel) min: <.01% max: 16.26% x̄: 0.69% x̃: 0.25%
HURT stats (abs)   min: 1 max: 1297 x̄: 121.51 x̃: 20
HURT stats (rel)   min: <.01% max: 31.31% x̄: 1.21% x̃: 0.20%
95% mean confidence interval for cycles value: -62.39 -43.92
95% mean confidence interval for cycles %-change: -0.47% -0.25%
Cycles are helped.

total spills in shared programs: 20335 -> 20333 (<.01%)
spills in affected programs: 19 -> 17 (-10.53%)
helped: 2
HURT: 0

total fills in shared programs: 25905 -> 25899 (-0.02%)
fills in affected programs: 23 -> 17 (-26.09%)
helped: 2
HURT: 0

LOST:   9
GAINED: 0

Haswell
total instructions in shared programs: 16418516 -> 16417293 (<.01%)
instructions in affected programs: 223785 -> 222562 (-0.55%)
helped: 590
HURT: 67
helped stats (abs) min: 1 max: 15 x̄: 2.19 x̃: 1
helped stats (rel) min: 0.03% max: 6.52% x̄: 0.87% x̃: 0.60%
HURT stats (abs)   min: 1 max: 2 x̄: 1.04 x̃: 1
HURT stats (rel)   min: 0.04% max: 1.85% x̄: 0.44% x̃: 0.25%
95% mean confidence interval for instructions value: -2.01 -1.71
95% mean confidence interval for instructions %-change: -0.80% -0.67%
Instructions are helped.

total cycles in shared programs: 1037179754 -> 1037084874 (<.01%)
cycles in affected programs: 352541071 -> 352446191 (-0.03%)
helped: 1093
HURT: 182
helped stats (abs) min: 1 max: 888 x̄: 111.03 x̃: 64
helped stats (rel) min: <.01% max: 27.30% x̄: 0.84% x̃: 0.20%
HURT stats (abs)   min: 1 max: 6777 x̄: 145.49 x̃: 21
HURT stats (rel)   min: <.01% max: 24.10% x̄: 1.99% x̃: 0.29%
95% mean confidence interval for cycles value: -88.10 -60.73
95% mean confidence interval for cycles %-change: -0.58% -0.29%
Cycles are helped.

total spills in shared programs: 17457 -> 17456 (<.01%)
spills in affected programs: 12 -> 11 (-8.33%)
helped: 1
HURT: 0

total fills in shared programs: 20387 -> 20385 (<.01%)
fills in affected programs: 15 -> 13 (-13.33%)
helped: 1
HURT: 0

LOST:   6
GAINED: 1

Ivy Bridge and earlier platforms had similar results. (Ivy Bridge shown)
total instructions in shared programs: 15515482 -> 15513998 (<.01%)
instructions in affected programs: 239739 -> 238255 (-0.62%)
helped: 573
HURT: 57
helped stats (abs) min: 1 max: 20 x̄: 2.73 x̃: 2
helped stats (rel) min: 0.03% max: 9.84% x̄: 0.94% x̃: 0.55%
HURT stats (abs)   min: 1 max: 2 x̄: 1.39 x̃: 1
HURT stats (rel)   min: 0.09% max: 1.85% x̄: 0.52% x̃: 0.35%
95% mean confidence interval for instructions value: -2.57 -2.14
95% mean confidence interval for instructions %-change: -0.89% -0.73%
Instructions are helped.

total cycles in shared programs: 584509880 -> 584463152 (<.01%)
cycles in affected programs: 11765280 -> 11718552 (-0.40%)
helped: 661
HURT: 152
helped stats (abs) min: 1 max: 3073 x̄: 101.99 x̃: 32
helped stats (rel) min: <.01% max: 34.38% x̄: 1.46% x̃: 0.50%
HURT stats (abs)   min: 1 max: 6637 x̄: 136.10 x̃: 15
HURT stats (rel)   min: <.01% max: 24.19% x̄: 1.75% x̃: 0.25%
95% mean confidence interval for cycles value: -82.79 -32.16
95% mean confidence interval for cycles %-change: -1.11% -0.61%
Cycles are helped.

LOST:   9
GAINED: 0

Tiger Lake
Instructions in all programs: 160905127 -> 160900949 (-0.0%)
SENDs in all programs: 6812418 -> 6812085 (-0.0%)
Loops in all programs: 38225 -> 38225 (+0.0%)
Cycles in all programs: 7431911114 -> 7433914697 (+0.0%)
Spills in all programs: 192582 -> 192582 (+0.0%)
Fills in all programs: 304539 -> 304537 (-0.0%)

Ice Lake
Instructions in all programs: 145296733 -> 145292370 (-0.0%)
SENDs in all programs: 6863818 -> 6863485 (-0.0%)
Loops in all programs: 38219 -> 38219 (+0.0%)
Cycles in all programs: 8798257570 -> 8800204360 (+0.0%)
Spills in all programs: 216880 -> 216880 (+0.0%)
Fills in all programs: 334250 -> 334248 (-0.0%)

Skylake
Instructions in all programs: 135891485 -> 135887357 (-0.0%)
SENDs in all programs: 6803031 -> 6802698 (-0.0%)
Loops in all programs: 38216 -> 38216 (+0.0%)
Cycles in all programs: 8442221881 -> 8444201959 (+0.0%)
Spills in all programs: 194839 -> 194839 (+0.0%)
Fills in all programs: 301116 -> 301114 (-0.0%)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>
2021-05-20 01:39:35 +00:00
Ian Romanick 49177b9e2f nir/algebraic: Tautology replacements require sources be numbers
It seems worth the small amount of damage to give an extra cushion of
not having to debug problems later.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>

All Intel platforms had similar results. (Tiger Lake shown)
total instructions in shared programs: 21043197 -> 21043359 (<.01%)
instructions in affected programs: 4409 -> 4571 (3.67%)
helped: 0
HURT: 25
HURT stats (abs)   min: 1 max: 16 x̄: 6.48 x̃: 5
HURT stats (rel)   min: 0.39% max: 15.38% x̄: 4.59% x̃: 4.40%
95% mean confidence interval for instructions value: 4.37 8.59
95% mean confidence interval for instructions %-change: 2.93% 6.26%
Instructions are HURT.

total cycles in shared programs: 856175986 -> 856176921 (<.01%)
cycles in affected programs: 58908 -> 59843 (1.59%)
helped: 0
HURT: 25
HURT stats (abs)   min: 7 max: 70 x̄: 37.40 x̃: 38
HURT stats (rel)   min: 0.27% max: 5.63% x̄: 1.87% x̃: 1.39%
95% mean confidence interval for cycles value: 31.11 43.69
95% mean confidence interval for cycles %-change: 1.35% 2.39%
Cycles are HURT.

No fossil-db changes on any Intel platform.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>
2021-05-20 01:39:35 +00:00
Ian Romanick d69ba58644 nir/algebraic: Remove some optimizations of comparisons with fsat
When most of these patterns were created, we believed, incorrectly, that
fsat(NaN) was NaN.  We have since realized that fsat(NaN) is zero.
Originally, this changed the patterns to use is_a_number.  This didn't
help any shaders, so it's easier to just drop the optimizations.

This commit crossed paths with 4c3ad4d065 ("nir/algebraic: mark more
optimization with fsat(NaN) as inexact") and bc123c396a
("nir/algebraic: mark some optimizations with fsat(NaN) as inexact").
Given that these don't impact very many shaders, it seems safer to just
remove them.

As discussed in
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8716, I tried
modifying these patterns to use !(b cmp a).  Unfortunately, on Intel
GPUs, the results were much worse than just removing the patterns
altogether.

Some other related patterns will be addressed in later commits.

There are still a number of patterns that use the identity fsat(1-X) ==
1 - fsat(X).  If X is NaN, the former is zero while the latter is 1.0.
I haven't evaluted these patterns yet.  If changes are needed in these
patterns, it should be a separate commit anyway.

v2: Replace arrow `=>` with `->` in comments because the `=>` looks a
lot like `<=` comparison.  Suggested by Rhys.

Fixes: 92b75c126b ("nir/algebraic: Replace checks that a value is between (or not) [0, 1]")
Fixes: a7f0c57673 ("nir/algebraic: Eliminate useless fsat() on operand of comparison w/value in (0, 1)")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>

All Intel hardware had similar results. (Ice Lake shown)
total instructions in shared programs: 20029060 -> 20029670 (<.01%)
instructions in affected programs: 69236 -> 69846 (0.88%)
helped: 0
HURT: 263
HURT stats (abs)   min: 1 max: 20 x̄: 2.32 x̃: 1
HURT stats (rel)   min: 0.30% max: 11.11% x̄: 1.35% x̃: 0.98%
95% mean confidence interval for instructions value: 1.86 2.78
95% mean confidence interval for instructions %-change: 1.18% 1.52%
Instructions are HURT.

total cycles in shared programs: 979821278 -> 979834425 (<.01%)
cycles in affected programs: 1476848 -> 1489995 (0.89%)
helped: 49
HURT: 204
helped stats (abs) min: 1 max: 812 x̄: 102.31 x̃: 20
helped stats (rel) min: 0.01% max: 21.43% x̄: 2.23% x̃: 0.52%
HURT stats (abs)   min: 2 max: 2600 x̄: 89.02 x̃: 16
HURT stats (rel)   min: 0.04% max: 27.27% x̄: 1.49% x̃: 0.72%
95% mean confidence interval for cycles value: 13.18 90.75
95% mean confidence interval for cycles %-change: 0.29% 1.25%
Cycles are HURT.

No fossil-db changes.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>
2021-05-20 01:39:35 +00:00
Jason Ekstrand b447f5049b nir: Add a discard optimization pass
Many fragment shaders do a discard using relatively little information
but still put the discard fairly far down in the shader for no good
reason.  If the discard is moved higher up, we can possibly avoid doing
some or almost all of the work in the shader.  When this lets us skip
texturing operations, it's an especially high win.

One of the biggest offenders here is DXVK.  The D3D APIs have different
rules for discards than OpenGL and Vulkan.  One effective way (which is
what DXVK uses) to implement DX behavior on top of GL or Vulkan is to
wait until the very end of the shader to discard.  This ends up in the
pessimal case where we always do all of the work before discarding.
This pass helps some DXVK shaders significantly.

v2 (Jason Ekstrand):
 - Fix a couple of typos (Grazvydas, Ian)
 - Use the new nir_instr_move helper
 - Find all movable discards before moving anything so we don't
   accidentally re-order anything and break dependencies

v3 (Pierre-Eric): remove the call to nir_opt_conditional_discard based
on Daniel Schürmann comment.

v4 (Pierre-Eric):
 - handle demote intrinsics and drop derivatives_safe_after_discard
 - add early return if discards/demotes aren't used

v5 (Pierre-Eric):
 - use pass_flags instead of instr set (Daniel Schürmann)

v6 (Daniel Schürmann):
 - cleanup and fix pass_flags handling

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10522>
2021-05-19 18:04:44 +00:00
Jason Ekstrand 3033410b10 nir/gather_info: Expose a nir_intrinsic_writes_external_memory helper
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10522>
2021-05-19 18:04:44 +00:00
Jason Ekstrand f97fb1fa55 nir: Add a nir_instr_move helper
Removes an instruction from one place and inserts it at another while
working around a weird cursor corner-case.

v2: change return value to bool (Daniel Schürmann)

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> (v1)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10522>
2021-05-19 18:04:44 +00:00
Bas Nieuwenhuizen 2d6a6469b8 nir: Add bvh64_intersect_ray_amd intrinsic.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10818>
2021-05-18 23:01:47 +02:00
Bas Nieuwenhuizen aa82f91c38 nir: Add load_sbt_amd intrinsic.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9767>
2021-05-18 18:29:36 +00:00
Samuel Pitoiset 1b1c726ca9 nir/opt_access: fix getting variables in presence of similar bindings/desc
It's perfectly legal to declare multiple SSBOs that point to the same
binding/descriptor_set with different access mask. Currently, it will
always get the first one in the list that matches binding/desc_set
regardless of the access mask, but other variables might have different
access mask.

Fix this by being conservative if another variable uses the same
binding/desc_set because we can't get it reliably without adding
a new field to vulkan_resource_index.

This fixes rendering issues in Resident Evil Village with vkd3d-proton.
This bug has been uncovered by ("spirv: Don't remove variables used by
resource indexing intrinsics") because variables are no longer removed

No fossils-db changes.

Cc: 21.1 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10692>
2021-05-18 06:25:24 +00:00
Connor Abbott a40714abf7 nir/lower_phis_to_scalar: Add "lower_all" option
We don't want to have to deal with vector phis in freedreno, because
vectors are always split/unsplit around vectorized instructions anyways,
and the stated reason for not scalarising them (it hurting coalescing)
won't apply to us because we won't be using nir_from_ssa. Add this
option so that we don't have to do the equivalent thing while
translating from NIR.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10809>
2021-05-17 09:59:45 +00:00
Mike Blumenkrantz 6df187df13 nir/builder: add nir_pad_vector and nir_pad_vec4 util functions
these pad a given value to vec4 or arbitrary number of components

Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10630>
2021-05-16 14:15:14 +00:00
Gert Wollny 4c045ad11e nir/linker: add option to ignore the IO precisions for better varying packing
Backends that don't handle IO component precision can pack more varyings
into one slot if the linker ignores the precision. If the IO is vectorized
then this can save IO instructions.

Related: 165a69d2f7
    nir: handle mediump varyings in varying compaction helpers

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10722>
2021-05-15 09:58:27 +02:00
Caio Marcelo de Oliveira Filho 09984fd02f nir: Rename nir_is_per_vertex_io to nir_is_arrayed_io
VS outputs are "per vertex" but not the kind of I/O we want to match
with this helper.  Change to a name that covers the "arrayness"
required by the type.

Name inspired by the GLSL spec definition of arrayed I/O.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10493>
2021-05-14 16:17:45 +00:00