Commit Graph

179 Commits

Author SHA1 Message Date
Jason Ekstrand 278d12f991 intel/fs,vec4: Drop prog_data binding tables
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14056>
2021-12-10 21:20:47 +00:00
Jason Ekstrand 4fa58d27a5 intel/fs,vec4: Drop support for shader time
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14056>
2021-12-10 21:20:47 +00:00
Sagar Ghuge f78e33aa1a intel/compiler: Set correct return format for brw_SAMPLE
on GFX8 onwards, we have only single bit to determine correct return
format.

v2:
- Define macro and use it instead of hardcoded value. (Lionel)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11766>
2021-11-22 21:27:30 -08:00
Lionel Landwerlin 361b3fee3c intel: move away from booleans to identify platforms
v2: Drop changes around GFX_VERx10 == 75 (Luis)

v3: Replace
   (GFX_VERx10 < 75 && devinfo->platform != INTEL_PLATFORM_BYT)
   by
   (devinfo->platform == INTEL_PLATFORM_IVB)
   Replace
   (devinfo->ver >= 5 || devinfo->platform == INTEL_PLATFORM_G4X)
   by
   (devinfo->verx10 >= 45)
   Replace
   (devinfo->platform != INTEL_PLATFORM_G4X)
   by
   (devinfo->verx10 != 45)

v4: Fix crocus typo

v5: Rebase

v6: Add GFX3, ILK & I965 platforms (Jordan)
    Move ifdef to code expressions (Jordan)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12981>
2021-11-08 16:48:06 +00:00
Lionel Landwerlin 4e4560ab6f intel/compiler: add missing line returns to logs
In the upcoming intel_clc tool, we're allowing to print these messages
out and some of them just don't look right.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13165>
2021-10-05 07:31:52 +00:00
Ian Romanick 0f809dbf40 intel/compiler: Basic support for DP4A instruction
v2: Very significant rebase on changes to previous commits.
Specifically, brw_fs_nir.cpp changes were pretty much rewritten from
scratch after changing the NIR opcode names and types.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>
2021-08-24 19:58:57 +00:00
Ian Romanick 043c5bf966 intel/compiler: Add id parameter to shader_debug_log callback
There are two problems with the current architecture.

In OpenGL, the id is supposed to be a unique identifier for a particular
log source.  This is done so that applications can (theoretically)
filter particular log messages.  The debug callback infrastructure in
Mesa assigns a uniqe value when a value of 0 is passed in.  This causes
the id to get set once to a unique value for each message.

By passing a stack variable that is initialized to 0 on every call,
every time the same message is logged, it will have a different id.
This isn't great, but it's also not catastrophic.

When threaded shader compiles are used, the id *pointer* is saved and
dereferenced at a possibly much later time on a possibly different
thread.  This causes one thread to access the stack from a different
thread... and that stack frame might not be valid any more. :(

This fixes shader-db crashes of various kinds on Iris with threaded
shader compiles enabled.

Fixes: 42c34e1ac8 ("iris: Enable threaded shader compilation")
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12136>
2021-08-01 23:58:08 +00:00
Sagar Ghuge 705285b9f4 intel/compiler: Add support for ternary add instruction on XeHP
v2:
- Re-arragne opcode in correct order (Matt Turner)
- Move ADD3 case closer to LRP (Jason)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11596>
2021-07-16 15:59:56 +00:00
Francisco Jerez 4dc4284342 intel/fs: Implement Wa_14013745556 on TGL+.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11433>
2021-06-23 07:34:22 +00:00
Francisco Jerez c19cfa9dc2 intel/fs: Fix synchronization of accumulator-clearing W/A move on TGL+.
Right now the accumulator-clearing move emitted by the generator for
Wa_14010017096 inherits the SWSB field from the previous instruction.
This can lead to redundant synchronization, or possibly more serious
issues if the previous instruction had a TGL_SBID_SET SWSB
synchronization mode.  Take the SWSB synchronization information from
the IR.

Fixes: a27542c5dd ("intel/compiler: Clear accumulator register before EOT")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11433>
2021-06-23 07:34:22 +00:00
Jason Ekstrand 705395344d intel/fs: Add support for compiling bindless shaders with resume shaders
Instead of depending on the driver to compile each resume shader
separately, we compile them all in one go in the back-end and build an
SBT as part of the shader program.  Shader relocs are used to make the
entries in the SBT point point to the correct resume shader.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8637>
2021-06-22 21:09:25 +00:00
Marcin Ślusarz 3340d5ee02 intel: simplify is_haswell checks, part 1
Generated with:

files=`git grep is_haswell | cut -d: -f1 | sort | uniq`
for file in $files; do
        cat $file | \
                sed "s/devinfo->ver <= 7 && !devinfo->is_haswell/devinfo->verx10 <= 70/g" | \
                sed "s/devinfo->ver >= 8 || devinfo->is_haswell/devinfo->verx10 >= 75/g" | \
                sed "s/devinfo->is_haswell || devinfo->ver >= 8/devinfo->verx10 >= 75/g" | \
                sed "s/devinfo.is_haswell || devinfo.ver >= 8/devinfo.verx10 >= 75/g" | \
                sed "s/devinfo->ver > 7 || devinfo->is_haswell/devinfo->verx10 >= 75/g" | \
                sed "s/devinfo->ver == 7 && !devinfo->is_haswell/devinfo->verx10 == 70/g" | \
                sed "s/devinfo.ver == 7 && !devinfo.is_haswell/devinfo.verx10 == 70/g" | \
                sed "s/devinfo->ver < 8 && !devinfo->is_haswell/devinfo->verx10 <= 70/g" | \
                sed "s/device->info.ver == 7 && !device->info.is_haswell/device->info.verx10 == 70/g" \
                > tmpXXX
        mv tmpXXX $file
done

Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Acked-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10810>
2021-05-17 09:46:45 +00:00
Jason Ekstrand 34c560ae95 intel/fs: Stop using brw_dp_read/write_desc in Gen7+ only code
Those helpers exist primarily to sort out some of the weirdness around
Gen4-6 dataport access.  On Gen5 and earlier, everything was called
"dataport" and, instead of the SFID we have today there was a "target
cache" parameter in the descriptor.  There are also some bits that moved
around on various gens depending on read vs. write.  Starting with Gen6,
most things which target one of the data cache SFIDs should use
brw_dp_desc() instead.

v2: Drop backward comment (Ken)

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7455>
2021-05-02 20:20:06 +00:00
Lionel Landwerlin 6d4070f3dd intel/compiler: add support for fragment coordinate with coarse pixels
v2: Drop new internal opcodes (Jason)
    Simplify code (Jason)

v3: Add Z computation for coarse pixels

v4: Document things a little

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7455>
2021-05-02 20:20:06 +00:00
Lionel Landwerlin b6332fc4a8 intel/compiler: handle coarse pixel in render target writes descriptors
v2: Use the new inst->ex_desc field (Jason)

v3: Drop CPS LoD compensation from sampler messages (Lionel)

v4: Drop useless uses_rate_shading (Ken)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7455>
2021-05-02 20:20:06 +00:00
Anuj Phogat 61e8636557 intel: Rename gen_device prefix to intel_device
export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965"
grep -E "gen_device" -rIl $SEARCH_PATH | xargs sed -ie "s/gen_device/intel_device/g"

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10241>
2021-04-20 20:06:33 +00:00
Jordan Justen 515ee73b4e intel/fs: End computer shader with message gateway on XeHP.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10000>
2021-04-16 08:27:35 +00:00
Francisco Jerez 262b647b25 intel/compiler: Lower integer division on XeHP.
It has been removed from the hardware.

[jordan.l.justen@intel.com: Move to brw_postprocess_nir]

v2: Switch to nir_lower_idiv_precise (Rhys).
v3: Fix for interface changes of nir_lower_idiv.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10000>
2021-04-16 08:27:35 +00:00
Francisco Jerez 05cce1f97d intel/fs: Use CHV/BXT implementation of 64-bit MOV_INDIRECT on XeHP+.
According to the hardware spec "Vx1 and VxH indirect addressing for
Float, Half-Float, Double-Float and Quad-Word data must not be used."

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10000>
2021-04-16 08:27:35 +00:00
Michel Dänzer 2928c21eb7 Convert most remaining free-form fall-through comments to FALLTHROUGH
One exception is src/amd/addrlib/, for which -Wimplicit-fallthrough is
explicitly disabled.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10220>
2021-04-15 16:01:22 +00:00
Anuj Phogat f96c3b8b63 intel: Rename GEN:BUG:### to Wa_###
Commands used to do the changes:
export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965"
grep -E "GEN:BUG:" -rIl $SEARCH_PATH | xargs sed -ie "s/GEN\(:BUG:\)/Wa_/g"

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>
2021-04-02 18:33:07 +00:00
Anuj Phogat e7e55af4d6 intel: Rename GENx keyword to GFXx
Commands used to do the changes:
export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965"
grep -E "GEN[[:digit:]]+" -rIl $SEARCH_PATH | xargs sed -ie "s/GEN\([[:digit:]]\+\)/GFX\1/g"

Exclude the changes to modifiers:
grep -E "I915_.*GFX" -rIl $SEARCH_PATH | xargs sed -ie "s/\(I915_.*\)GFX/\1GEN/g"

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>
2021-04-02 18:33:07 +00:00
Anuj Phogat 1d296484b4 intel: Rename Genx keyword to Gfxx
Commands used to do the changes:
export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965"
grep -E "Gen[[:digit:]]+" -rIl $SEARCH_PATH | xargs sed -ie "s/Gen\([[:digit:]]\+\)/Gfx\1/g"

Exclude changes in src/intel/perf/oa-*.xml:
find src/intel/perf -type f \( -name "*.xml" \) | xargs sed -ie "s/Gfx/Gen/g"

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>
2021-04-02 18:33:07 +00:00
Anuj Phogat b75f095bc7 intel: Rename genx keyword to gfxx in source files
Commands used to do the changes:
export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965"
grep -E "gen[[:digit:]]+" -rIl $SEARCH_PATH | xargs sed -ie "s/gen\([[:digit:]]\+\)/gfx\1/g"

Exclude pack.h and xml changes in this patch:
grep -E "gfx[[:digit:]]+_pack\.h" -rIl $SEARCH_PATH | xargs sed -ie "s/gfx\([[:digit:]]\+_pack\.h\)/gen\1/g"
grep -E "gfx[[:digit:]]+\.xml" -rIl $SEARCH_PATH | xargs sed -ie "s/gfx\([[:digit:]]\+\.xml\)/gen\1/g"

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>
2021-04-02 18:33:07 +00:00
Anuj Phogat c1f3a778de intel: Rename GENx prefix in macros to GFXx in source files
Commands used to do the changes:
export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965"
grep -E "GEN" -rIl src/intel/genxml | grep -E ".*py" |  xargs sed -ie "s/GEN\([%{]\)/GFX\1/g"
grep -E "[^_]GEN[[:digit:]]+" -rIl $SEARCH_PATH | grep -E ".*(\.c|\.h|\.y|\.l)" | xargs sed -ie "s/\([^_]\)GEN\([[:digit:]]\+\)/\1GFX\2/g"

Leave out renaming GFX12_CCS_E macros. They fall under renaming pattern like "_GEN[[:digit:]]+":
grep -E "GFX12_CCS_E" -rIl $SEARCH_PATH | xargs sed -ie "s/GFX12_CCS_E/GEN12_CCS_E/g"

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>
2021-04-02 18:33:07 +00:00
Anuj Phogat abe9a71a09 intel: Rename gen field in gen_device_info struct to ver
Commands used to do the changes:
export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965"
grep -E "info\)*(.|->)gen" -rIl $SEARCH_PATH | xargs sed -ie "s/info\()*\)\(\.\|->\)gen/info\1\2ver/g"

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>
2021-04-02 18:33:07 +00:00
Ian Romanick 6c8e2e9317 intel/compiler: Enable the ability to emit CMPN instructions
v2: Move checks to the EU validator.  Suggested by Jason.

Fixes: 2f2c00c727 ("i965: Lower min/max after optimization on Gen4/5.")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9027>
2021-02-17 19:52:24 +00:00
Ian Romanick b0d7434c71 intel/eu/validate: Add some checks for CMP and CMPN
These checks were originally assertions elsewhere either in the existing
code or later in this MR.

Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9027>
2021-02-17 19:52:24 +00:00
Jason Ekstrand 3ce6ca7214 intel/fs: Shuffle can't handle source modifiers
On Gen7, we have to split shuffles into two MOVs for 64-bit types so we
can't handle source modifiers.  On Gen12.5, we have to use integer types
all the time so we can't use them there either.  Fixing that will be a
different commit but it interacts with this one.

Fixes: 90c9f29518 "i965/fs: Add support for nir_intrinsic_shuffle"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9068>
2021-02-17 03:59:25 +00:00
Jason Ekstrand f3a43e36e0 intel/fs: Add an ex_desc field to fs_inst for SHADER_OPCODE_SEND
I meant to do this years ago when I first added SHADER_OPCODE_SEND.  At
the time, the only use for the extended descriptor was bindless handles
which were always one thing and never non-constant.  However, it doesn't
actually require any extra instructions because we have to OR in ex_mlen
anyway.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8748>
2021-01-28 17:57:48 +00:00
Jason Ekstrand c80db6611a intel/fs: Support 64-bit CLUSTER_BROADCAST on Gen11+
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7329>
2021-01-22 18:38:38 +00:00
Jason Ekstrand b90921ec0c intel/fs: Support 64-bit SHUFFLE on Gen11+
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7329>
2021-01-22 18:38:38 +00:00
Jason Ekstrand cdedc82329 intel/fs: Support 64-bit SEL_EXEC on Gen11+
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7329>
2021-01-22 18:38:37 +00:00
Jason Ekstrand f9d549b2bf intel/fs: Use BRW_OPCODE_HALT for discards
We're about to start using it to implement nir_jump_halt which has
nothing inherently to do with fragment shaders or discards.  May as well
name it for the HW instruction it generates.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5071>
2020-12-01 16:19:08 -06:00
Jason Ekstrand e76e359007 intel/fs: Rename PLACEHOLDER_HALT to HALT_TARGET
It's a bit more explicit and will play more nicely with what we're about
to do.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5071>
2020-12-01 16:18:50 -06:00
Jason Ekstrand 7280b0911d intel/compiler: Add support for bindless shaders
The Intel bindless thread dispatch model is very simple.  When a compute
shader is to be used for bindless dispatch, it can request a set of
stack IDs.  These are allocated per-dual-subslice by the hardware and
recycled automatically when the stack ID is returned.  Passed to the
bindless dispatch are a global argument address, a stack ID, and an
address of the BINDLESS_SHADER_RECORD to invoke.  When the bindless
shader is dispatched, it is passed its stack ID as well as the global
and local argument pointers.  The local argument pointer is the address
of the BINDLESS_SHADER_RECORD plus some offset which is specified as
part of the BINDLESS_SHADER_RECORD.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
2020-11-25 05:37:09 +00:00
Ian Romanick 262ca98b3a intel/compiler: Remove Gen10-specific code
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6899>
2020-10-15 09:29:53 -07:00
Jason Ekstrand 06ebf23283 intel/fs: Add a SCRATCH_HEADER opcode
This opcode is responsible for setting up the buffer base address and
per-thread scratch space fields of a scratch message header.  For the
most part, it's a copy of g0 but some messages need us to zero out g0.2
and the bottom bits of g0.5.

This may actually fix a bug when nir_load/store_scratch is used.  The
docs say that the DWORD scattered messages respect the per-thread
scratch size specified in gN.3[3:0] in the message header but we've been
leaving it zero.  This may mean that we've been ignoring any scratch
reads/writes from a load/store_scratch intrinsic above the 1KB mark.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>
2020-10-13 21:59:27 +00:00
Jason Ekstrand 8427e56067 intel/fs: Don't use NoDDClk/NoDDClr for split SHUFFLEs
When I copied and pasted the code from MOV_INDIRECT for handling the
dependency controls, I missed a subtle difference between MOV_INDIRECT
and SHUFFLE.  Specifically, MOV_INDIRECT gets lowered to a narrow
instruction on Gen7 by the SIMD width lowering whereas SHUFFLE has to
split it in the generator.  Therefore, the check safety check for
whether or not we can use dependency control has to be based on the
lowered width rather than the width of the original instruction.

Fixes: a8ac61b0ee "intel/fs: NoMask initialize the address..."
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3593
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6989>
2020-10-02 19:53:56 +00:00
Jason Ekstrand a8ac61b0ee intel/fs: NoMask initialize the address register for shuffles
Cc: mesa-stable@lists.freedesktop.org
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2979
Tested-by: Iván Briano <ivan.briano@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6825>
2020-10-02 00:42:56 +00:00
Marcin Ślusarz 5ea0b6a9c6 intel/compiler: initialize remaining fields of various classes
These variables seem to be initialized before being used, so this
patch is not fixing any bug, but leaving them unitialized may become
a bug after some refactoring.

These classes were affected: fs_reg_alloc, fs_visitor, fs_generator,
instruction_scheduler.

Found by Coverity.

Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6667>
2020-09-10 12:16:58 +00:00
Marcin Ślusarz 663c4d5377 intel/fs: add hint how to get more info when shader validation fails
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6559>
2020-09-04 12:09:22 +00:00
Jason Ekstrand 91becd84ae intel/fs: Add support for a new load_reloc_const intrinsic
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6244>
2020-09-02 19:48:44 +00:00
Jason Ekstrand 90b6745bc8 intel/fs,vec4: Stuff the constant data from NIR in the end of the program
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6244>
2020-09-02 19:48:44 +00:00
Jason Ekstrand 91348d125d intel/eu: Add some new helpers
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6244>
2020-09-02 19:48:44 +00:00
Danylo Piliaiev bc4a127d6e intel/disasm: Label support in shader disassembly for UIP/JIP
Shader instructions which use UIP/JIP now get formatted with a label
in addition with immediate value, labels have "LABEL%d" format.

v2: - Consider brw_jump_scale when calculating label's offset

From: "Lonnberg, Toni" <toni.lonnberg@intel.com>
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4245>
2020-09-02 10:33:29 +00:00
Jason Ekstrand cccb497d3c intel/fs: Fix MOV_INDIRECT and BROADCAST of Q types on Gen11+
The immediate case is pretty uncommon to see but it can happen, in
theory.  BROADCAST is typically used to uniformize values and those are
usually 32-bit.  However, it does come up in some subgroup ops.

Fixes: 49c21802cb "intel/compiler: Split has_64bit_types into float/int"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6211>
2020-09-01 13:25:20 -05:00
Jason Ekstrand c48f42e178 intel/fs: Emit HALT for discard on Gen4-5
Using HALT to immediately jump to the end of the shader is required to
implement GL_EXT_gpu_shader4 and OpenGL 3.0.  However, vanilla OpenGL
1.2 doesn't forbid it and it likely makes something somewhere faster.
We should be consistent and implement the same discard behavior on all
hardware if we can.

The rules for HALT on Gen4-5 are a bit different from Gen6+.  On the
older hardware, there is no stack for HALT; instead it's up to software
to save and restore mask registers.  However, there's no real saving
needed since we only use HALT to jump to the end of the program where
we're about about to do our FB writes.  All we need to do is reset AMask
to DMask, the value it was initialized to at the start of the thread.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5244>
2020-05-30 06:21:15 +00:00
Caio Marcelo de Oliveira Filho f858fa26b4 intel/fs,vec4: Pull stall logic for memory fences up into the IR
Instead of emitting the stall MOV "inside" the
SHADER_OPCODE_MEMORY_FENCE generation, use the scheduling fences when
creating the IR.

For IvyBridge, every (data cache) fence is accompained by a render
cache fence, that now is explicit in the IR, two
SHADER_OPCODE_MEMORY_FENCEs are emitted (with different SFIDs).

Because Begin and End interlock intrinsics are effectively memory
barriers, move its handling alongside the other memory barrier
intrinsics.  The SHADER_OPCODE_INTERLOCK is still used to distinguish
if we are going to use a SENDC (for Begin) or regular SEND (for End).

This change is a preparation to allow emitting both SENDs in Gen11+
before we can stall on them.

Shader-db results for IVB (i965):

    total instructions in shared programs: 11971190 -> 11971200 (<.01%)
    instructions in affected programs: 11482 -> 11492 (0.09%)
    helped: 0
    HURT: 8
    HURT stats (abs)   min: 1 max: 3 x̄: 1.25 x̃: 1
    HURT stats (rel)   min: 0.03% max: 0.50% x̄: 0.14% x̃: 0.10%
    95% mean confidence interval for instructions value: 0.66 1.84
    95% mean confidence interval for instructions %-change: 0.01% 0.27%
    Instructions are HURT.

  Unlike the previous code, that used the `mov g1 g2` trick to force
  both `g1` and `g2` to stall, the scheduling fence will generate `mov
  null g1` and `mov null g2`.  During review it was decided it was not
  worth keeping the special codepath for the small effect will have.

Shader-db results for HSW (i965), BDW and SKL don't have a change
on instruction count, but do report changes in cycles count, showing
SKL results below

    total cycles in shared programs: 341738444 -> 341710570 (<.01%)
    cycles in affected programs: 7240002 -> 7212128 (-0.38%)
    helped: 46
    HURT: 5
    helped stats (abs) min: 14 max: 1940 x̄: 676.22 x̃: 154
    helped stats (rel) min: <.01% max: 2.62% x̄: 1.28% x̃: 0.95%
    HURT stats (abs)   min: 2 max: 1768 x̄: 646.40 x̃: 362
    HURT stats (rel)   min: <.01% max: 0.83% x̄: 0.28% x̃: 0.08%
    95% mean confidence interval for cycles value: -777.71 -315.38
    95% mean confidence interval for cycles %-change: -1.42% -0.83%
    Cycles are helped.

  This seems to be the effect of allocating two registers separatedly
  instead of a single one with size 2, which causes different register
  allocation, affecting the cycle estimates.

while ICL also has not change on instruction count but report changes
negative changes in cycles

    total cycles in shared programs: 352665369 -> 352707484 (0.01%)
    cycles in affected programs: 9608288 -> 9650403 (0.44%)
    helped: 4
    HURT: 104
    helped stats (abs) min: 24 max: 128 x̄: 88.50 x̃: 101
    helped stats (rel) min: <.01% max: 0.85% x̄: 0.46% x̃: 0.49%
    HURT stats (abs)   min: 2 max: 2016 x̄: 408.36 x̃: 48
    HURT stats (rel)   min: <.01% max: 3.31% x̄: 0.88% x̃: 0.45%
    95% mean confidence interval for cycles value: 256.67 523.24
    95% mean confidence interval for cycles %-change: 0.63% 1.03%
    Cycles are HURT.

  AFAICT this is the result of the case above.

Shader-db results for TGL have similar cycles result as ICL, but also
affect instructions

    total instructions in shared programs: 17690586 -> 17690597 (<.01%)
    instructions in affected programs: 64617 -> 64628 (0.02%)
    helped: 55
    HURT: 32
    helped stats (abs) min: 1 max: 16 x̄: 4.13 x̃: 3
    helped stats (rel) min: 0.05% max: 2.78% x̄: 0.86% x̃: 0.74%
    HURT stats (abs)   min: 1 max: 65 x̄: 7.44 x̃: 2
    HURT stats (rel)   min: 0.05% max: 4.58% x̄: 1.13% x̃: 0.69%
    95% mean confidence interval for instructions value: -2.03 2.28
    95% mean confidence interval for instructions %-change: -0.41% 0.15%
    Inconclusive result (value mean confidence interval includes 0).

  Now that more is done in the IR, more dependencies are visible and
  more SWSB annotations are emitted.  Mixed with different register
  allocation decisions like above, some shaders will see more `sync
  nops` while others able to avoid them.

  Most of the new `sync nops` are also redundant and could be dropped,
  which will be fixed in a separate change.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3278>
2020-04-29 07:17:27 +00:00
Caio Marcelo de Oliveira Filho 0e96b0d6dd intel/fs: Allow FS_OPCODE_SCHEDULING_FENCE stall on registers
It will generate the MOVs (or SYNC_NOP in Gen12+) needed for stall.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3278>
2020-04-29 07:17:27 +00:00