114045 Commits

Author SHA1 Message Date
Alyssa Rosenzweig 1637a53890 pan/midgard: Print invert modifier
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-02 09:57:15 -07:00
Alyssa Rosenzweig 62a5ee3bb4 pan/midgard: Flip conditionals
We would like to flip ops to have a constant in the second place to
enable inlining of the constant.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-02 09:57:15 -07:00
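The operand flip described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the `cmp_op` enum, the `cmp_instr` struct, and both function names are hypothetical and are not taken from the Midgard compiler. The point it shows is that swapping the operands of an ordered comparison requires mirroring the opcode, while equality tests stay as they are.

```c
#include <stdbool.h>

/* Hypothetical comparison opcodes, for illustration only. */
enum cmp_op { CMP_LT, CMP_LE, CMP_GT, CMP_GE, CMP_EQ, CMP_NE };

/* Hypothetical two-source comparison instruction. */
struct cmp_instr {
    enum cmp_op op;
    int src0, src1;                     /* operand values or register indices */
    bool src0_is_const, src1_is_const;
};

/* Return the opcode that gives the same result once the operands are swapped. */
static enum cmp_op flip_op(enum cmp_op op)
{
    switch (op) {
    case CMP_LT: return CMP_GT;
    case CMP_LE: return CMP_GE;
    case CMP_GT: return CMP_LT;
    case CMP_GE: return CMP_LE;
    default:     return op;             /* EQ and NE are symmetric */
    }
}

/* Move a lone constant from the first slot to the second.
 * Returns true if the instruction was changed. */
static bool canonicalize_constant(struct cmp_instr *ins)
{
    if (!ins->src0_is_const || ins->src1_is_const)
        return false;

    int tmp = ins->src0;
    ins->src0 = ins->src1;
    ins->src1 = tmp;
    ins->src0_is_const = false;
    ins->src1_is_const = true;
    ins->op = flip_op(ins->op);
    return true;
}
```

With this sketch, `5 < r2` becomes `r2 > 5`, which leaves the constant in the second slot where a later pass could inline it.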
Alyssa Rosenzweig d066ca3575 pan/midgard: Add bitwise src/invert fusing
De Morgan's Laws and some special ops basically.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-02 09:57:15 -07:00
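For reference, the De Morgan identities that make this kind of fusing valid can be written out on 32-bit integers. The function names below are hypothetical and the code is not the pass itself; it only shows that an operation on two inverted sources equals a single inverted operation on the original sources.

```c
#include <stdint.h>

/* OR of two inverted sources equals an inverted AND (NAND). */
static uint32_t or_of_inverted(uint32_t a, uint32_t b) { return ~a | ~b; }
static uint32_t fused_nand(uint32_t a, uint32_t b)     { return ~(a & b); }

/* AND of two inverted sources equals an inverted OR (NOR). */
static uint32_t and_of_inverted(uint32_t a, uint32_t b) { return ~a & ~b; }
static uint32_t fused_nor(uint32_t a, uint32_t b)       { return ~(a | b); }
```

Each pair returns the same value for every input, so a compiler is free to pick whichever form needs fewer instructions on the target.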
Alyssa Rosenzweig 620c2717cf pan/midgard: Add .not propagation pass
Essentially .pos propagation but for bitwise.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-02 09:57:15 -07:00
Alyssa Rosenzweig b821e1b85e pan/midgard: Fuse invert into bitwise ops
We use the new invert flag to produce ops like inand.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-02 09:57:15 -07:00
Jonathan Marek d8584c5cf2 freedreno: a2xx: implement texture tiling
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-08-02 15:58:22 +00:00
Jonathan Marek fb5c3db0ab freedreno: a2xx: use nir_lower_alu_to_scalar instead of lowering pass
nir_lower_alu_to_scalar can now be used to only lower certain ops, so we
don't need the custom pass. And we can lower fall_equal/fany_nequal with
lower_vector_cmp instead.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-08-02 15:58:22 +00:00
Jonathan Marek e652ca4e0b freedreno: a2xx: fix HW binning for batches with >256K vertices
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-08-02 15:58:22 +00:00
Jonathan Marek 257957b026 freedreno: a2xx: fix fneg/fabs/fsat opcodes
Previously we would get a fmov with modifiers, but now that mov has no type
these opcodes need to be supported.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-08-02 15:58:22 +00:00
Jonathan Marek 43dbd7d603 freedreno: a2xx: fix order of NIR opts
int_to_float needs to come after bool_to_float, and lower_to_source_mods
needs to come after both, since they don't deal with source mods.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-08-02 15:58:22 +00:00
Jonathan Marek 57e980a4fb freedreno: a2xx: fix non-etc1 cubemaps
Not sure how this happened, but apparently all cubemaps need swapped XY.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-08-02 15:58:22 +00:00
Jonathan Marek 2e029acbe2 freedreno: a2xx: fix fast clear not being used for Z24X8 buffers
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-08-02 15:58:22 +00:00
Jonathan Marek e25388c97b freedreno: align renderonly scanout buffers
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-08-02 15:58:22 +00:00
Eric Engestrom 6125c93e00 gitlab-ci: just build all the tools
This line was mistakenly added while there is already a `-D tools=all`
a few lines below.

Fixes: f60defa72d ("gitlab-ci: Add a shader-db run using v3d on drm-shim.")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-02 16:41:19 +01:00
Sergii Romantsov a86eccfb78 i965/clear: clear_value better precision
Test-case with depth-clear 0.5 and format
MESA_FORMAT_Z24_UNORM_X8_UINT fails due to an inconsistent
clear-value of 0.4999997.
Maybe it's better to improve?

CC: Jason Ekstrand <jason.ekstrand@intel.com>
Fixes: 0ae9ce0f29 (i965/clear: Quantize the depth clear value based on the format)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111113
Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-02 14:25:34 +00:00
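How a depth of 0.5 can end up slightly below 0.5 after quantization to a 24-bit UNORM format can be sketched as follows. This is an illustration only: the function names are hypothetical, and the commit message does not say which conversion the i965 code uses before or after the change. The sketch simply contrasts a truncating conversion with a round-to-nearest one.

```c
#include <stdint.h>

#define Z24_MAX 16777215.0   /* 2^24 - 1, the largest 24-bit UNORM code */

/* Truncating conversion: drops the fractional part of the scaled value. */
static uint32_t z24_from_depth_trunc(double depth)
{
    return (uint32_t)(depth * Z24_MAX);
}

/* Round-to-nearest conversion: adds one half before truncating. */
static uint32_t z24_from_depth_round(double depth)
{
    return (uint32_t)(depth * Z24_MAX + 0.5);
}

/* Convert a 24-bit code back to a normalized depth value. */
static double depth_from_z24(uint32_t code)
{
    return (double)code / Z24_MAX;
}
```

Because 0.5 * 16777215 is 8388607.5, truncation yields code 8388607, which converts back to a value just under 0.5, while rounding yields 8388608, which converts back to a value just over 0.5. Two code paths that quantize differently will therefore disagree on the stored clear value.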
Samuel Pitoiset e8110e51c6 radv: fix image_has_{cmask,fmask}() helpers
The driver should now rely on cmask_offset because CMASK can be
disabled by the driver for some reason (e.g. mipmaps). Apply the
same change for FMASK, although it should be useless.

Fixes: ad1bc8621d ("radv: remove radv_get_image_fmask_info()")
Fixes: 10d08da52c ("radv/gfx10: add missing dcc_tile_swizzle tweak")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-08-02 14:00:50 +02:00
Samuel Pitoiset ad1bc8621d radv: remove radv_get_image_fmask_info()
It's unnecessary to duplicate fields in another struct.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-08-02 13:34:46 +02:00
Samuel Pitoiset 10d08da52c radv/gfx10: add missing dcc_tile_swizzle tweak
Fixes: c90f46700d ("radv/gfx10: mask DCC tile swizzle by alignment")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-08-02 13:34:43 +02:00
Samuel Pitoiset 9c9745e8dd radv: remove radv_get_image_cmask_info()
It's unnecessary to duplicate fields in another struct.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-08-02 13:34:41 +02:00
Samuel Pitoiset 856487a280 radv: only account for tile_swizzle for color surfaces with DCC
It's 0 for depth surfaces with TC compat HTILE enabled.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-08-02 13:34:39 +02:00
Bas Nieuwenhuizen e1c5d8a364 radv: Enable VK_KHR_shader_atomic_int64
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-08-02 12:26:32 +02:00
Bas Nieuwenhuizen a17f2206d3 ac/nir: Implement LLVM9 64-bit buffer compare & exchange.
LLVM 9 does not have a 64-bit buffer compswap intrinsic, so this
extracts the ptr, does a bound check and then uses a cmpxchg LLVM
instruction.

Not ideal, but the earliest release we're going to get a proper
intrinsic is LLVM 10.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-08-02 12:26:11 +02:00
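The fallback described above (extract the pointer, bounds-check it, then use a plain compare-exchange) can be sketched on the CPU with C11 atomics. This is a loose analogy, not the LLVM IR that ac/nir emits: `buffer_cmpxchg64` is a hypothetical helper, and returning 0 for an out-of-bounds access is an assumption made for the sketch.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: 64-bit compare-and-swap at a byte offset into a buffer.
 * Returns the value previously stored, or 0 when the access is misaligned or
 * out of bounds (that result is an assumption made for this sketch).
 */
static uint64_t buffer_cmpxchg64(_Atomic uint64_t *base, size_t size_bytes,
                                 size_t offset_bytes,
                                 uint64_t compare, uint64_t value)
{
    /* Bounds check: the whole 8-byte element must fit inside the buffer. */
    if (offset_bytes % sizeof(uint64_t) != 0 ||
        size_bytes < sizeof(uint64_t) ||
        offset_bytes > size_bytes - sizeof(uint64_t))
        return 0;

    /* Extract the pointer to the addressed element. */
    _Atomic uint64_t *ptr = base + offset_bytes / sizeof(uint64_t);

    /* On failure `expected` is overwritten with the current value, so it
     * holds the old contents whether or not the swap happened. */
    uint64_t expected = compare;
    atomic_compare_exchange_strong(ptr, &expected, value);
    return expected;
}
```

The explicit check is needed because a raw pointer operation, unlike a dedicated buffer intrinsic, has no notion of where the buffer ends.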
Connor Abbott 73274c9ec2 Revert "ac/nir: handle negate modifier"
This reverts commit bfea7e4d29.
2019-08-02 11:14:50 +02:00
Connor Abbott 4a382d66ee Revert "ac/nir: handle abs modifier"
This reverts commit d3c80733cd.

These were only appearing due to memory corruption.
2019-08-02 11:14:08 +02:00
Timothy Arceri 06ec14d692 iris: bump compat profile support to 4.6
All of the current piglit compat profile tests pass.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-02 18:56:53 +10:00
Timothy Arceri 74f96b06d6 egl: fix OpenGL 3.1 context creation
From the EGL_KHR_create_context spec:

   "* If OpenGL 3.1 is requested, the context returned may implement
       any of the following versions:

         * Version 3.1. The GL_ARB_compatibility extension may or may
           not be implemented, as determined by the implementation.
         * The core profile of version 3.2 or greater."

Fixes CTS tests:

    dEQP-EGL.functional.create_context_ext.gl_31.rgb888_depth_stencil
    dEQP-EGL.functional.create_context_ext.robust_gl_31.rgb888_depth_stencil
    dEQP-EGL.functional.create_context_ext.gl_31.rgb888_depth_no_stencil
    dEQP-EGL.functional.create_context_ext.robust_gl_31.rgb888_depth_no_stencil
    dEQP-EGL.functional.create_context_ext.gl_31.rgba8888_depth_no_stencil
    dEQP-EGL.functional.create_context_ext.gl_31.rgb888_no_depth_no_stencil
    dEQP-EGL.functional.create_context_ext.robust_gl_31.rgba8888_depth_no_stencil
    dEQP-EGL.functional.create_context_ext.robust_gl_31.rgb888_no_depth_no_stencil
    dEQP-EGL.functional.create_context_ext.gl_31.rgba8888_no_depth_no_stencil
    dEQP-EGL.functional.create_context_ext.robust_gl_31.rgba8888_no_depth_no_stencil
    dEQP-EGL.functional.create_context_ext.gl_31.rgba8888_depth_stencil
    dEQP-EGL.functional.create_context_ext.robust_gl_31.rgba8888_depth_stencil

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-02 18:56:53 +10:00
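The rule quoted from the spec can be restated as a small predicate. This is a sketch only: `gl_context_desc` and `satisfies_gl31_request` are hypothetical names and do not come from Mesa's EGL code. It encodes which returned contexts are acceptable when OpenGL 3.1 was requested.

```c
#include <stdbool.h>

/* Hypothetical description of a context returned by the implementation. */
struct gl_context_desc {
    int major, minor;
    bool core_profile;
};

/* True if `ctx` is an allowed result for a request of OpenGL 3.1. */
static bool satisfies_gl31_request(const struct gl_context_desc *ctx)
{
    /* Version 3.1 itself, with or without GL_ARB_compatibility. */
    if (ctx->major == 3 && ctx->minor == 1)
        return true;

    /* Otherwise only the core profile of version 3.2 or greater qualifies. */
    bool at_least_32 = ctx->major > 3 ||
                       (ctx->major == 3 && ctx->minor >= 2);
    return at_least_32 && ctx->core_profile;
}
```

Under this reading, a 4.5 core context is a valid answer to a 3.1 request, while a 4.5 compatibility context or a 3.0 context is not.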
Connor Abbott f41516bdb5 nir/find_array_copies: Reject copies with mismatched type
When we detect a scalar/vector copy through load_deref/store_deref, we
have to be careful since those can bitcast an int to a float and
vice-versa even though copy_deref can't.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111251
Fixes: 156306e5e6 ("nir/find_array_copies: Handle wildcards and overlapping copies")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-02 10:34:29 +02:00
Samuel Pitoiset 7368000868 radv: re-apply "Optimize rebinding the same descriptor set."
This makes it cheaper to just change the dynamic offsets with
the same descriptor sets.

This optimization was reverted a while back because of
random GPU hangs on GFX9; now it looks fine, at least CTS no longer
hangs on GFX9 and it doesn't hang on GFX10 either.

It fixes a performance problem with Wolfenstein Youngblood.

Suggested-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
2019-08-02 09:56:55 +02:00
Samuel Pitoiset 96a5445559 radv/gfx10: use the correct target machine for Wave32
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-08-02 09:37:38 +02:00
Samuel Pitoiset 8a86908e9a radv/gfx10: add Wave32 support for vertex, tessellation and geometry shaders
It can be enabled with RADV_PERFTEST=gewave32.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-08-02 09:37:36 +02:00
Samuel Pitoiset 953bbacc23 radv/gfx10: add Wave32 support for fragment shaders
It can be enabled with RADV_PERFTEST=pswave32.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-08-02 09:37:34 +02:00
Kenneth Graunke 18c2e09dc7 gallium: Implement GL_EXT_shader_samples_identical via a new capability
This exposes the textureSamplesIdenticalEXT function in GLSL.

We enable it for iris and radeonsi, because their compilers already
have support for this.  Tested on Intel Kabylake and AMD Vega 64.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-01 23:38:54 -07:00
Kenneth Graunke adcc0a8fdc intel/tools: Fix aubinator_viewer build.
This function was recently renamed and not all callers were updated.

Fixes: 086c486a75 ("intel/device: rename gen_get_device_info")
2019-08-01 23:36:41 -07:00
Francisco Jerez 54fbc625ea intel/ir: Fix CFG corruption in opt_predicated_break().
Specifically the optimization of a conditional BREAK + WHILE sequence
into a conditional WHILE seems pretty broken.  The list of successors
of "earlier_block" (where the conditional BREAK was found) is emptied
and then re-created with the same edges for no apparent reason.  On
top of that the list of predecessors of the block immediately after
the WHILE loop is emptied, but only one of the original edges will be
added back, which means that potentially several blocks that still
have it on their list of successors won't be on its list of
predecessors anymore, causing all sorts of hilarity due to the
inconsistency in the control flow graph.

The solution is to remove the code that's removing valid edges from
the CFG.  cfg_t::remove_block() will already clean up after itself.
The assert in bblock_t::combine_with() also needs to be removed since
we will be merging a block with multiple children into the first one
of them.

Found the issue on a hardware enabling branch originally, but
apparently somebody reproduced the same problem independently on
master in the meantime.

Fixes: d13bcdb3a9 ("i965/fs: Extend predicated break pass to predicate WHILE.")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111009
Cc: jiradet.jd@gmail.com
Cc: Sergii Romantsov <sergii.romantsov@globallogic.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Tested-by: Paul Chelombitko <qamonstergl@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-08-01 16:56:48 -07:00
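The inconsistency described above, where a block still lists a successor that no longer lists it as a predecessor, is the kind of invariant a small checker can catch. The following is a minimal sketch with hypothetical types; it does not use the `cfg_t` or `bblock_t` classes from the Intel compiler, and it stores edges as array indices purely for brevity.

```c
#include <stdbool.h>

#define MAX_EDGES 4

/* Hypothetical basic block: edges are stored as indices into a block array. */
struct block {
    int succs[MAX_EDGES], num_succs;
    int preds[MAX_EDGES], num_preds;
};

/* True if `value` appears in the first `count` entries of `list`. */
static bool contains(const int *list, int count, int value)
{
    for (int i = 0; i < count; i++)
        if (list[i] == value)
            return true;
    return false;
}

/* True if every successor edge has a matching predecessor edge and vice versa. */
static bool cfg_is_consistent(const struct block *blocks, int num_blocks)
{
    for (int b = 0; b < num_blocks; b++) {
        for (int i = 0; i < blocks[b].num_succs; i++) {
            const struct block *s = &blocks[blocks[b].succs[i]];
            if (!contains(s->preds, s->num_preds, b))
                return false;
        }
        for (int i = 0; i < blocks[b].num_preds; i++) {
            const struct block *p = &blocks[blocks[b].preds[i]];
            if (!contains(p->succs, p->num_succs, b))
                return false;
        }
    }
    return true;
}
```

Emptying a predecessor list and adding back only one of the original edges, as the message describes, would make a check like this fail for every other block that still points at the target.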
Mark Janes ddb59cd20e intel/device: make internal functions private
The device info initializer makes several functions internal:

  - handling of device override
  - updating topology from kernel information

The implementation file is slightly reordered due to the renamed
functions being static.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-01 16:40:03 -07:00
Mark Janes 086c486a75 intel/device: rename gen_get_device_info
Rename the original device info initialization routine so callers
don't mistakenly call the wrong one:

  gen_get_device_info_from_fd:

      Queries kernel for full device info, including topology
      details.

  gen_get_device_info_from_pci_id:

      Partially initializes device info based on PCI ID lookup, when
      the kernel is not available.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-01 16:39:56 -07:00
Mark Janes d594d2a052 intel/tools: use device info initializer
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-01 16:39:54 -07:00
Mark Janes e4a0070db4 anv: use initialization routine for gen_device_info
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-01 16:39:51 -07:00
Mark Janes 49465f1330 iris/screen: use initialization routine for gen_device_info
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-01 16:39:48 -07:00
Mark Janes 96e1c945f2 i965: Move device info initialization to common code
With perf queries, initializing the device info is much more complex
than just getting a PCI ID and calling gen_get_device_info.  This commit
adds a new gen_get_device_info_from_fd helper in common code which does
all of the requisite kernel queries to get device info including all of
the topology information.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-01 16:39:44 -07:00
Mark Janes 1186f6ea69 i965/perf: verify kernel support before registering OA metrics
When gen_device_info updates the topology in its initializer, the
kernel queries will fail silently.  Iris and anv have minimum
kernel requirements that support the queries.  i965 must verify kernel
support before reporting OA metrics.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-01 16:39:41 -07:00
Mark Janes 7852fe5415 intel/common: provide common ioctl routine
i965 links against libdrm for drmIoctl, but anv and iris both
re-implement this routine to avoid the dependency.

intel/dev also needs an ioctl wrapper, so let's share the same
implementation everywhere.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-01 16:38:40 -07:00
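A wrapper of the kind being shared here usually amounts to a retry loop around `ioctl`, which is what libdrm's `drmIoctl` does. The sketch below assumes a POSIX system and uses a hypothetical name, `retry_ioctl`; it is not the routine added by this commit.

```c
#include <errno.h>
#include <sys/ioctl.h>

/* Hypothetical wrapper name: reissues the call while it is interrupted
 * by a signal (EINTR) or asked to try again (EAGAIN). */
static int retry_ioctl(int fd, unsigned long request, void *arg)
{
    int ret;

    do {
        ret = ioctl(fd, request, arg);
    } while (ret == -1 && (errno == EINTR || errno == EAGAIN));

    return ret;
}
```

Any other failure is returned to the caller unchanged, with `errno` left as the kernel set it.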
Alyssa Rosenzweig b40ba2db6c panfrost: Remove unused argument
A relic from when we didn't have an online compiler, hah.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-01 16:23:03 -07:00
Alyssa Rosenzweig ff345d4a01 panfrost: Handle MESA_SHADER_COMPUTE in compile callback
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-01 16:23:03 -07:00
Alyssa Rosenzweig 73c40d6bbb pan/midgard: Use standard list traversal to find initial tag
Fixes a hang (and abort) on empty shaders, which you shouldn't have
anyway but better safe than sorry. DCE going on the fritz is no reason
to freeze the system.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-01 16:23:03 -07:00
Alyssa Rosenzweig 4647999327 panfrost: Use gl_shader_stage directly for compiles
No need to add a third set of enums to the mix.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-01 16:23:03 -07:00
Alyssa Rosenzweig d9eb65c60c panfrost: Emit "draw" info for compute jobs
Important fields relating to shader state and UBOs are filled out from
this (misnomer) function.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-01 16:23:03 -07:00
Alyssa Rosenzweig 22a8f6de61 panfrost: Feed compute shaders into the compiler
The path for compute shader compiles resembles the graphic shader
compile path, although it is substantially simpler as we don't need any
shader keying.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-01 16:23:03 -07:00
Alyssa Rosenzweig 1b284628ef panfrost: Expose compute shaders as panfrost_shader_variants
Whether variants are packed by graphics or compute is irrelevant.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-01 16:23:03 -07:00
Alyssa Rosenzweig 8b53230d47 panfrost: Remove shader state *base
It is now unused.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-01 16:23:03 -07:00