Commit Graph

136 Commits

Author SHA1 Message Date
Jason Ekstrand e6a9501aa2 intel/fs: Add the URB fence message
When they re-arranged all the dataport stuff and added the LSC, doing
URB fencing through the dataport no longer makes sense.  Instead, there
is now a fence message on the URB shared function.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13092>
2021-09-29 20:52:54 +00:00
Marcin Ślusarz e0533ebf16 intel/compiler: INT DIV function does not support source modifiers
BSpec says that for all generations.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5281
CC: mesa-stable

Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12518>
2021-08-26 07:51:44 +00:00
Ian Romanick 0f809dbf40 intel/compiler: Basic support for DP4A instruction
v2: Very significant rebase on changes to previous commits.
Specifically, brw_fs_nir.cpp changes were pretty much rewritten from
scratch after changing the NIR opcode names and types.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>
2021-08-24 19:58:57 +00:00
Jason Ekstrand 11ac7d9e02 intel/eu: Set scope to TILE for TGM flushes
Setting it to GPU can cause an L3$ flush in certain cases.  That's not
what we want as we really only care about coherency within the GPU.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Sagar Ghuge <sagar@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12291>
2021-08-10 14:00:19 +00:00
Sagar Ghuge 705285b9f4 intel/compiler: Add support for ternary add instruction on XeHP
v2:
- Re-arragne opcode in correct order (Matt Turner)
- Move ADD3 case closer to LRP (Jason)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11596>
2021-07-16 15:59:56 +00:00
Sagar Ghuge b67f1ff465 intel/compiler: Add support for LSC fence operations
v2 (Jason Ekstrand):
 - Squash SLM and global fence ops together

v3 (Jason Ekstrand):
 - Rework to use message descriptors instead of instruction fields

v4 (Jason Ekstrand):
 - Don't pass BTI into back-end emit function.  Always use FLAT.

Co-authored-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11600>
2021-06-30 16:17:18 +00:00
Jason Ekstrand 55508bbe66 intel/compiler: Generalize shader relocations a bit
This commit adds a delta to be added to the relocated value as well as
the possibility of multiple types of relocations.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8637>
2021-06-22 21:09:25 +00:00
Marcin Ślusarz 3340d5ee02 intel: simplify is_haswell checks, part 1
Generated with:

files=`git grep is_haswell | cut -d: -f1 | sort | uniq`
for file in $files; do
        cat $file | \
                sed "s/devinfo->ver <= 7 && !devinfo->is_haswell/devinfo->verx10 <= 70/g" | \
                sed "s/devinfo->ver >= 8 || devinfo->is_haswell/devinfo->verx10 >= 75/g" | \
                sed "s/devinfo->is_haswell || devinfo->ver >= 8/devinfo->verx10 >= 75/g" | \
                sed "s/devinfo.is_haswell || devinfo.ver >= 8/devinfo.verx10 >= 75/g" | \
                sed "s/devinfo->ver > 7 || devinfo->is_haswell/devinfo->verx10 >= 75/g" | \
                sed "s/devinfo->ver == 7 && !devinfo->is_haswell/devinfo->verx10 == 70/g" | \
                sed "s/devinfo.ver == 7 && !devinfo.is_haswell/devinfo.verx10 == 70/g" | \
                sed "s/devinfo->ver < 8 && !devinfo->is_haswell/devinfo->verx10 <= 70/g" | \
                sed "s/device->info.ver == 7 && !device->info.is_haswell/device->info.verx10 == 70/g" \
                > tmpXXX
        mv tmpXXX $file
done

Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Acked-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10810>
2021-05-17 09:46:45 +00:00
Jason Ekstrand 94c1e65de9 intel/eu: Set message subtype properly for SIMD8 FB fetch
There were two bugs which crep in here as part of 64551610d1e6:
forgetting that exec sizes in HW are in log2 space and having the
exec_size condition for the subtype backwards.

Fixes: 64551610d1 "intel/compiler: rework message descriptors..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10588>
2021-05-03 15:30:41 +00:00
Jason Ekstrand 2e7656ae2f intel/eu: SVB writes only happen on Gen6
It's a Gen6 XFB thing.  It's never used for anything else so there's no
point in having a target cache switch.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7455>
2021-05-02 20:20:06 +00:00
Lionel Landwerlin b6332fc4a8 intel/compiler: handle coarse pixel in render target writes descriptors
v2: Use the new inst->ex_desc field (Jason)

v3: Drop CPS LoD compensation from sampler messages (Lionel)

v4: Drop useless uses_rate_shading (Ken)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7455>
2021-05-02 20:20:06 +00:00
Lionel Landwerlin 64551610d1 intel/compiler: rework message descriptors for render targets
Render target message descriptors are slightly different from the
dataport ones. In particular the msg_type field is on bits 14:17 for
RT while bits 14:18 for DP.

v2: Drop unused send_commit_msg field in brw_fb_write_desc() (Ken)

v3: Rebase on top renaming (Lionel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7455>
2021-05-02 20:20:06 +00:00
Anuj Phogat 61e8636557 intel: Rename gen_device prefix to intel_device
export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965"
grep -E "gen_device" -rIl $SEARCH_PATH | xargs sed -ie "s/gen_device/intel_device/g"

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10241>
2021-04-20 20:06:33 +00:00
Francisco Jerez 12479abded intel/fs: Implement representation of SWSB cross-pipeline synchronization annotations.
The execution units of XeHP platforms have multiple asynchronous ALU
pipelines instead of (as far as software is concerned) the single
in-order pipeline that handled most ALU instructions except for
extended math in the original Xe.  It's now the compiler's
responsibility to identify cross-pipeline dependencies and insert
synchronization annotations whenever necessary, which are encoded as
some additional bits of the SWSB instruction field.

This commit represents the cross-pipeline synchronization annotations
as part of the existing tgl_swsb structure used for codegen.  The
existing tgl_swsb_*() helpers used by hand-crafted assembly are
extended to default to TGL_PIPE_ALL big-hammer synchronization in
order to ensure backwards compatibility with the existing assembly.
The following commits will extend the software scoreboard lowering
pass in order to keep track of cross-pipeline dependencies across IR
instructions, and insert more specific pipeline annotations in the
SWSB field.

The disassembler is also extended here to print out any existing
pipeline sync annotations.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10000>
2021-04-16 08:27:34 +00:00
Michel Dänzer d200f45875 Use explicit break instead of fall-through to break-only case
clang generates a warning if there's no explicit break or fall-through
annotation. The latter would be kind of silly in this case, and not
robust against any future changes turning the fall-through invalid.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10220>
2021-04-15 16:01:22 +00:00
Michel Dänzer 2928c21eb7 Convert most remaining free-form fall-through comments to FALLTHROUGH
One exception is src/amd/addrlib/, for which -Wimplicit-fallthrough is
explicitly disabled.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10220>
2021-04-15 16:01:22 +00:00
Anuj Phogat 1d296484b4 intel: Rename Genx keyword to Gfxx
Commands used to do the changes:
export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965"
grep -E "Gen[[:digit:]]+" -rIl $SEARCH_PATH | xargs sed -ie "s/Gen\([[:digit:]]\+\)/Gfx\1/g"

Exclude changes in src/intel/perf/oa-*.xml:
find src/intel/perf -type f \( -name "*.xml" \) | xargs sed -ie "s/Gfx/Gen/g"

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>
2021-04-02 18:33:07 +00:00
Anuj Phogat b75f095bc7 intel: Rename genx keyword to gfxx in source files
Commands used to do the changes:
export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965"
grep -E "gen[[:digit:]]+" -rIl $SEARCH_PATH | xargs sed -ie "s/gen\([[:digit:]]\+\)/gfx\1/g"

Exclude pack.h and xml changes in this patch:
grep -E "gfx[[:digit:]]+_pack\.h" -rIl $SEARCH_PATH | xargs sed -ie "s/gfx\([[:digit:]]\+_pack\.h\)/gen\1/g"
grep -E "gfx[[:digit:]]+\.xml" -rIl $SEARCH_PATH | xargs sed -ie "s/gfx\([[:digit:]]\+\.xml\)/gen\1/g"

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>
2021-04-02 18:33:07 +00:00
Anuj Phogat c1f3a778de intel: Rename GENx prefix in macros to GFXx in source files
Commands used to do the changes:
export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965"
grep -E "GEN" -rIl src/intel/genxml | grep -E ".*py" |  xargs sed -ie "s/GEN\([%{]\)/GFX\1/g"
grep -E "[^_]GEN[[:digit:]]+" -rIl $SEARCH_PATH | grep -E ".*(\.c|\.h|\.y|\.l)" | xargs sed -ie "s/\([^_]\)GEN\([[:digit:]]\+\)/\1GFX\2/g"

Leave out renaming GFX12_CCS_E macros. They fall under renaming pattern like "_GEN[[:digit:]]+":
grep -E "GFX12_CCS_E" -rIl $SEARCH_PATH | xargs sed -ie "s/GFX12_CCS_E/GEN12_CCS_E/g"

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>
2021-04-02 18:33:07 +00:00
Anuj Phogat abe9a71a09 intel: Rename gen field in gen_device_info struct to ver
Commands used to do the changes:
export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965"
grep -E "info\)*(.|->)gen" -rIl $SEARCH_PATH | xargs sed -ie "s/info\()*\)\(\.\|->\)gen/info\1\2ver/g"

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>
2021-04-02 18:33:07 +00:00
Ian Romanick 6c8e2e9317 intel/compiler: Enable the ability to emit CMPN instructions
v2: Move checks to the EU validator.  Suggested by Jason.

Fixes: 2f2c00c727 ("i965: Lower min/max after optimization on Gen4/5.")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9027>
2021-02-17 19:52:24 +00:00
Jason Ekstrand f3a43e36e0 intel/fs: Add an ex_desc field to fs_inst for SHADER_OPCODE_SEND
I meant to do this years ago when I first added SHADER_OPCODE_SEND.  At
the time, the only use for the extended descriptor was bindless handles
which were always one thing and never non-constant.  However, it doesn't
actually require any extra instructions because we have to OR in ex_mlen
anyway.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8748>
2021-01-28 17:57:48 +00:00
Ian Romanick 92f08860c9 intel/compiler: Silence unused parameter warning in brw_surface_payload_size
src/intel/compiler/brw_eu_emit.c: In function ‘brw_surface_payload_size’:
src/intel/compiler/brw_eu_emit.c:3070:46: warning: unused parameter ‘p’ [-Wunused-parameter]
 3070 | brw_surface_payload_size(struct brw_codegen *p,
      |                          ~~~~~~~~~~~~~~~~~~~~^

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6826>
2020-09-28 11:43:04 -07:00
Jason Ekstrand 8d8a3815ef intel/eu: Add a mechanism for emitting relocatable constant MOVs
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6244>
2020-09-02 19:48:44 +00:00
Jason Ekstrand 91348d125d intel/eu: Add some new helpers
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6244>
2020-09-02 19:48:44 +00:00
Jason Ekstrand cccb497d3c intel/fs: Fix MOV_INDIRECT and BROADCAST of Q types on Gen11+
The immediate case is pretty uncommon to see but it can happen, in
theory.  BROADCAST is typically used to uniformize values and those are
usually 32-bit.  However, it does come up in some subgroup ops.

Fixes: 49c21802cb "intel/compiler: Split has_64bit_types into float/int"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6211>
2020-09-01 13:25:20 -05:00
Matt Turner c883c482be intel/compiler: Relax SENDS regioning assertions
The next commit fixes a mistake in the assembler and ends up running
afoul of this assertion.

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5956>
2020-07-31 12:59:24 -07:00
Jason Ekstrand 561aaeeb48 intel/eu: Add the RNDU opcode
We don't want to use it on gen5 and earlier because only RNDD can be
done with a single instruction and we can implement RNDU(x) as -RNDD(-x)
so it's better to just do that when we have the instruction.  On gen6
and above, we may as well just use the right instruction.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5596>
2020-06-23 17:43:54 +00:00
Jason Ekstrand e0ab48e3ea intel/eu: Set the right subnr for ALIGN16 destinations
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5596>
2020-06-23 17:43:54 +00:00
Jason Ekstrand c48f42e178 intel/fs: Emit HALT for discard on Gen4-5
Using HALT to immediately jump to the end of the shader is required to
implement GL_EXT_gpu_shader4 and OpenGL 3.0.  However, vanilla OpenGL
1.2 doesn't forbid it and it likely makes something somewhere faster.
We should be consistent and implement the same discard behavior on all
hardware if we can.

The rules for HALT on Gen4-5 are a bit different from Gen6+.  On the
older hardware, there is no stack for HALT; instead it's up to software
to save and restore mask registers.  However, there's no real saving
needed since we only use HALT to jump to the end of the program where
we're about about to do our FB writes.  All we need to do is reset AMask
to DMask, the value it was initialized to at the start of the thread.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5244>
2020-05-30 06:21:15 +00:00
Caio Marcelo de Oliveira Filho f858fa26b4 intel/fs,vec4: Pull stall logic for memory fences up into the IR
Instead of emitting the stall MOV "inside" the
SHADER_OPCODE_MEMORY_FENCE generation, use the scheduling fences when
creating the IR.

For IvyBridge, every (data cache) fence is accompained by a render
cache fence, that now is explicit in the IR, two
SHADER_OPCODE_MEMORY_FENCEs are emitted (with different SFIDs).

Because Begin and End interlock intrinsics are effectively memory
barriers, move its handling alongside the other memory barrier
intrinsics.  The SHADER_OPCODE_INTERLOCK is still used to distinguish
if we are going to use a SENDC (for Begin) or regular SEND (for End).

This change is a preparation to allow emitting both SENDs in Gen11+
before we can stall on them.

Shader-db results for IVB (i965):

    total instructions in shared programs: 11971190 -> 11971200 (<.01%)
    instructions in affected programs: 11482 -> 11492 (0.09%)
    helped: 0
    HURT: 8
    HURT stats (abs)   min: 1 max: 3 x̄: 1.25 x̃: 1
    HURT stats (rel)   min: 0.03% max: 0.50% x̄: 0.14% x̃: 0.10%
    95% mean confidence interval for instructions value: 0.66 1.84
    95% mean confidence interval for instructions %-change: 0.01% 0.27%
    Instructions are HURT.

  Unlike the previous code, that used the `mov g1 g2` trick to force
  both `g1` and `g2` to stall, the scheduling fence will generate `mov
  null g1` and `mov null g2`.  During review it was decided it was not
  worth keeping the special codepath for the small effect will have.

Shader-db results for HSW (i965), BDW and SKL don't have a change
on instruction count, but do report changes in cycles count, showing
SKL results below

    total cycles in shared programs: 341738444 -> 341710570 (<.01%)
    cycles in affected programs: 7240002 -> 7212128 (-0.38%)
    helped: 46
    HURT: 5
    helped stats (abs) min: 14 max: 1940 x̄: 676.22 x̃: 154
    helped stats (rel) min: <.01% max: 2.62% x̄: 1.28% x̃: 0.95%
    HURT stats (abs)   min: 2 max: 1768 x̄: 646.40 x̃: 362
    HURT stats (rel)   min: <.01% max: 0.83% x̄: 0.28% x̃: 0.08%
    95% mean confidence interval for cycles value: -777.71 -315.38
    95% mean confidence interval for cycles %-change: -1.42% -0.83%
    Cycles are helped.

  This seems to be the effect of allocating two registers separatedly
  instead of a single one with size 2, which causes different register
  allocation, affecting the cycle estimates.

while ICL also has not change on instruction count but report changes
negative changes in cycles

    total cycles in shared programs: 352665369 -> 352707484 (0.01%)
    cycles in affected programs: 9608288 -> 9650403 (0.44%)
    helped: 4
    HURT: 104
    helped stats (abs) min: 24 max: 128 x̄: 88.50 x̃: 101
    helped stats (rel) min: <.01% max: 0.85% x̄: 0.46% x̃: 0.49%
    HURT stats (abs)   min: 2 max: 2016 x̄: 408.36 x̃: 48
    HURT stats (rel)   min: <.01% max: 3.31% x̄: 0.88% x̃: 0.45%
    95% mean confidence interval for cycles value: 256.67 523.24
    95% mean confidence interval for cycles %-change: 0.63% 1.03%
    Cycles are HURT.

  AFAICT this is the result of the case above.

Shader-db results for TGL have similar cycles result as ICL, but also
affect instructions

    total instructions in shared programs: 17690586 -> 17690597 (<.01%)
    instructions in affected programs: 64617 -> 64628 (0.02%)
    helped: 55
    HURT: 32
    helped stats (abs) min: 1 max: 16 x̄: 4.13 x̃: 3
    helped stats (rel) min: 0.05% max: 2.78% x̄: 0.86% x̃: 0.74%
    HURT stats (abs)   min: 1 max: 65 x̄: 7.44 x̃: 2
    HURT stats (rel)   min: 0.05% max: 4.58% x̄: 1.13% x̃: 0.69%
    95% mean confidence interval for instructions value: -2.03 2.28
    95% mean confidence interval for instructions %-change: -0.41% 0.15%
    Inconclusive result (value mean confidence interval includes 0).

  Now that more is done in the IR, more dependencies are visible and
  more SWSB annotations are emitted.  Mixed with different register
  allocation decisions like above, some shaders will see more `sync
  nops` while others able to avoid them.

  Most of the new `sync nops` are also redundant and could be dropped,
  which will be fixed in a separate change.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3278>
2020-04-29 07:17:27 +00:00
Dylan Baker f8e4542bad replace _mesa_logbase2 with util_logbase2
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3024>
2020-04-21 11:09:03 -07:00
Caio Marcelo de Oliveira Filho c76f2292b5 intel/fs,vec4: Properly account SENDs in IVB memory fence
Change brw_memory_fence to return the number of messages emitted, and
use that to update the send_count statistic in code generation.

This will fix the book-keeping for IVB since the memory fences will
result in two SEND messages.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4646>
2020-04-20 09:29:09 -07:00
Jason Ekstrand 3b2eafbea9 intel/fs: Don't unnecessarily fall back to indirect sends on Gen12
The instruction encoding for SENDS changed on Gen12 and it now supports
embedding the entire extended message descriptor in the instruction if
it's an immediate.  Stop falling back to doing an indirect SEND just
because we had something in [15:12] of ex_desc.ud.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3547>
2020-01-24 19:18:27 +00:00
Matt Turner 88a0523bd2 intel/compiler: Move Gen4/5 rounding to visitor
Gen4/5's rounding instructions operate differently than later Gens'.
They all return the floor of the input and the "Round-increment"
conditional modifier answers whether the result should be incremented by
1.0 to get the appropriate result for the operation (and thus its
behavior is determined by the round opcode; e.g., RNDZ vs RNDE).

Since this requires a second instruciton (a predicated ADD) that
consumes the result of the round instruction, the round instruction
cannot write its result directly to the (write-only) message registers.
By emitting the ADD in the generator, the backend thinks it's safe to
store the round's result directly to the message register file.

To avoid this, we move the emission of the ADD instruction to the NIR
translator so that the backend has the information it needs.

I suspect this also fixes code generated for RNDZ.SAT but since
Gen4/5 don't support GLSL 1.30 which adds the trunc() function, I
couldn't write a piglit test to confirm. My thinking is that if x=-0.5:

      sat(trunc(-0.5)) = 0.0

But on Gen4/5 where sat(trunc(x)) is implemented as

      rndz.r.f0  result, x             // result = floor(x)
                                       // set f0 if increment needed
      (+f0) add  result, result, 1.0   // fixup so result = trunc(x)

then putting saturate on both instructions will give the wrong result.

      floor(-0.5) = -1.0
      sat(floor(-0.5)) = 0.0
      // +1 increment would be needed since floor(-0.5) != trunc(-0.5)
      sat(sat(floor(-0.5)) + 1.0) = 1.0

Fixes: 6f394343b1 ("nir/algebraic: i2f(f2i()) -> trunc()")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2355
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3459>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3459>
2020-01-22 23:47:02 +00:00
Iván Briano ca94717035 intel/compiler: Don't change hstride if not needed
Alignment requirements may have changed the horizontal stride already,
so don't set it if not required to avoid breaking said requirements.

Fixes several tests such as
dEQP-VK.subgroups.vote.graphics.subgroupallequal_int8_t

Signed-off-by: Iván Briano <ivan.briano@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-11-18 14:19:41 -08:00
Sagar Ghuge bf943bdf24 intel/compiler: Set bits according to source file
On Gen >= 12, if src0 or src2 holds immediate value, we need set
src[0/2]_is_imm bits instead of register file.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-10-21 20:32:43 -07:00
Sagar Ghuge c018c5a339 intel/compiler: Add Immediate support for 3 source instruction
On Gen >= 10, Either src0 or src2 can use 16-bit immediate value, but
not both.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-10-21 20:32:43 -07:00
Francisco Jerez 6b52f81395 intel/eu: Don't set notify descriptor field of gateway barrier message.
Apparently this field was removed on SKL, and according to the
hardware docs for previous platforms "This field is only valid for a
ForwardMsg message. It is ignored for other messages. The BarrierMsg
message always increments the N0 notification counter".

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-11 12:24:16 -07:00
Francisco Jerez 15e3a0d9d2 intel/eu/gen12: Set SWSB annotations in hand-crafted assembly.
Reviewers are encouraged to audit the code generation pass
independently for the case I missed some potential data hazard or new
code has been added in the meantime.

v2: Add SYNC instruction to cr0 workaround in brw_float_controls_mode().

v3: Drop likely redundant (and potentially harmful) RegDist SWSB
    annotation from ce0 read in brw_find_live_channel() (Caio).

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-10-11 12:24:16 -07:00
Francisco Jerez d3f3bdcd18 intel/eu/gen12: Add tracking of default SWSB state to the current brw_codegen instruction.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-10-11 12:24:16 -07:00
Francisco Jerez c22db5e188 intel/fs/gen12: Add codegen support for the SYNC instruction.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-11 12:24:16 -07:00
Francisco Jerez 7499e10383 intel/eu/gen12: Don't set thread control, it's gone.
An effect similar to the one formerly provided by setting thread
control to "switch" can be achieved now by setting a RegDist of 1 on
the SWSB field.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-11 12:24:16 -07:00
Francisco Jerez a66ea33991 intel/eu/gen12: Don't set DD control, it's gone.
A future lowering pass will simulate the same behavior originally
provided by NoDDChk/NoDDClr at the IR level by using appropriate SWSB
annotations.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-11 12:24:16 -07:00
Francisco Jerez 8a5fad0d92 intel/eu/gen12: Use SEND instruction for split sends.
The new SEND instruction behaves like the former SENDS instruction.
The original single-payload SEND instruction is gone.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-11 12:24:16 -07:00
Francisco Jerez 6634ede7aa intel/eu/gen12: Codegen SEND descriptor regions correctly.
The SEND instruction is now four-source.  The descriptor is no longer
part of source 1, so avoid touching it to avoid corruption while
initializing the descriptor.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-11 12:24:16 -07:00
Francisco Jerez 2c4c9aba30 intel/eu/gen12: Codegen pathological SEND source and destination regions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-11 12:24:16 -07:00
Francisco Jerez bafc9515db intel/eu/gen12: Codegen control flow instructions correctly.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-11 12:24:16 -07:00
Francisco Jerez 6e1daba3b4 intel/eu/gen12: Codegen three-source instruction source and destination regions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-11 12:24:16 -07:00
Francisco Jerez 9fdb67aa09 intel/eu/gen12: Fix codegen of immediate source regions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-11 12:24:16 -07:00