Commit Graph

113901 Commits

Author SHA1 Message Date
Rob Clark 44f3c1cf01 freedreno: update registers
Pull in some updates of VSC regs

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-02 10:24:14 -07:00
Rob Clark c179ded9cb freedreno/gmem: small cleanup
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-02 10:24:14 -07:00
Rob Clark e2bb3e84ab freedreno/drm: convert ring_pool to child_pool
Worth another couple percent at driver2

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-08-02 10:24:14 -07:00
Rob Clark 9ac23794c9 freedreno/drm: remove idx_lock
Since it ends up contended, it is a bit of a bottleneck for workloads
with high driver overhead.  Worth nearly +10% at gfxbench driver2.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-08-02 10:24:14 -07:00
Rob Clark e439f63467 freedreno/batch: always update last_fence
Not all flush paths come thru fd_context_flush(), so we should also set
last_fence in the batch flush path.  This avoids some no-op flushes just
to get a fence.  For example when pctx->flush_resource() triggers a
flush.

We should probably keep the last_fence update in fd_context_flush() as
well to handle deferred flush case.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-02 10:24:14 -07:00
Rob Clark c93eae7f10 freedreno: drop unused fd_fence_ref param
The pscreen param was just there to satisfy pipe_screen::fence_reference

But some of the internal uses passed NULL for screen.  Which is a bit
ugly.  Instead drop the param and add a shim function to plug into the
screen.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-02 10:24:14 -07:00
Alyssa Rosenzweig 1637a53890 pan/midgard: Print invert modifier
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-02 09:57:15 -07:00
Alyssa Rosenzweig 62a5ee3bb4 pan/midgard: Flip conditionals
We would like to flip ops to have a constant in the second place to
enable inlining of the constant.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-02 09:57:15 -07:00
Alyssa Rosenzweig d066ca3575 pan/midgard: Add bitwise src/invert fusing
De Morgan's Laws and some special ops basically.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-02 09:57:15 -07:00
Alyssa Rosenzweig 620c2717cf pan/midgard: Add .not propagation pass
Essentially .pos propagation but for bitwise.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-02 09:57:15 -07:00
Alyssa Rosenzweig b821e1b85e pan/midgard: Fuse invert into bitwise ops
We use the new invert flag to produce ops like inand.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-02 09:57:15 -07:00
Jonathan Marek d8584c5cf2 freedreno: a2xx: implement texture tiling
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-08-02 15:58:22 +00:00
Jonathan Marek fb5c3db0ab freedreno: a2xx: use nir_lower_alu_to_scalar instead of lowering pass
nir_lower_alu_to_scalar can now be used to only lower certain ops, so we
don't need the custom pass. And we can lower fall_equal/fany_nequal with
lower_vector_cmp instead.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-08-02 15:58:22 +00:00
Jonathan Marek e652ca4e0b freedreno: a2xx: fix HW binning for batches with >256K vertices
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-08-02 15:58:22 +00:00
Jonathan Marek 257957b026 freedreno: a2xx: fix fneg/fabs/fsat opcodes
Previously we would get a fmov with modifiers, but now that mov has no type
these opcodes need to be supported.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-08-02 15:58:22 +00:00
Jonathan Marek 43dbd7d603 freedreno: a2xx: fix order of NIR opts
int_to_float needs to come after bool_to_float, and lower_to_source_mods
needs to come after both, since they don't deal wih source mods.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-08-02 15:58:22 +00:00
Jonathan Marek 57e980a4fb freedreno: a2xx: fix non-etc1 cubemaps
Not sure how this happened, but apparently all cubemaps need swapped XY.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-08-02 15:58:22 +00:00
Jonathan Marek 2e029acbe2 freedreno: a2xx: fix fast clear not being used for Z24X8 buffers
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-08-02 15:58:22 +00:00
Jonathan Marek e25388c97b freedreno: align renderonly scanout buffers
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-08-02 15:58:22 +00:00
Eric Engestrom 6125c93e00 gitlab-ci: just build all the tools
This line was mistakenly added while there is already a `-D tools=all`
a few lines below.

Fixes: f60defa72d ("gitlab-ci: Add a shader-db run using v3d on drm-shim.")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-02 16:41:19 +01:00
Sergii Romantsov a86eccfb78 i965/clear: clear_value better precision
Test-case with depth-clear 0.5 and format
MESA_FORMAT_Z24_UNORM_X8_UINT fails due inconsistent
clear-value of 0.4999997.
Maybe its better to improve?

CC: Jason Ekstrand <jason.ekstrand@intel.com>
Fixes: 0ae9ce0f29 (i965/clear: Quantize the depth clear value based on the format)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111113
Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-02 14:25:34 +00:00
Samuel Pitoiset e8110e51c6 radv: fix image_has_{cmask,fmask}() helpers
The driver should now rely on cmask_offset because CMASK can be
disabled by the driver for some reasons (eg. mipmaps). Apply the
same change for FMASK, although it should be useless.

Fixes: ad1bc8621d ("radv: remove radv_get_image_fmask_info()")
Fixes: 10d08da52c ("radv/gfx10: add missing dcc_tile_swizzle tweak")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-08-02 14:00:50 +02:00
Samuel Pitoiset ad1bc8621d radv: remove radv_get_image_fmask_info()
It's unnecessary to duplicate fields in another struct.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-08-02 13:34:46 +02:00
Samuel Pitoiset 10d08da52c radv/gfx10: add missing dcc_tile_swizzle tweak
Fixes: c90f46700d ("radv/gfx10: mask DCC tile swizzle by alignment")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-08-02 13:34:43 +02:00
Samuel Pitoiset 9c9745e8dd radv: remove radv_get_image_cmask_info()
It's unnecessary to duplicate fields in another struct.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-08-02 13:34:41 +02:00
Samuel Pitoiset 856487a280 radv: only account for tile_swizzle for color surfaces with DCC
It's 0 for depth surfaces with TC compat HTILE enabled.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-08-02 13:34:39 +02:00
Bas Nieuwenhuizen e1c5d8a364 radv: Enable VK_KHR_shader_atomic_int64
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-08-02 12:26:32 +02:00
Bas Nieuwenhuizen a17f2206d3 ac/nir: Implement LLVM9 64-bit buffer compare & exchange.
LLVM 9 does not have a 64-bit buffer compswap intrinsic, so this
extracts the ptr, does a bound check and then uses a cmpxchg LLVM
instruction.

Not ideal, but the earliest release we're going to get a proper
intrinsic is LLVM 10.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-08-02 12:26:11 +02:00
Connor Abbott 73274c9ec2 Revert "ac/nir: handle negate modifier"
This reverts commit bfea7e4d29.
2019-08-02 11:14:50 +02:00
Connor Abbott 4a382d66ee Revert "ac/nir: handle abs modifier"
This reverts commit d3c80733cd.

These were only appearing due to memory corruption.
2019-08-02 11:14:08 +02:00
Timothy Arceri 06ec14d692 iris: bump compat profile support to 4.6
All of the current piglit compat profile tests pass.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-02 18:56:53 +10:00
Timothy Arceri 74f96b06d6 egl: fix OpenGL 3.1 context creation
>From the EGL_KHR_create_context spec:

   "* If OpenGL 3.1 is requested, the context returned may implement
       any of the following versions:

         * Version 3.1. The GL_ARB_compatibility extension may or may
           not be implemented, as determined by the implementation.
         * The core profile of version 3.2 or greater."

Fixes CTS tests:

    dEQP-EGL.functional.create_context_ext.gl_31.rgb888_depth_stencil
    dEQP-EGL.functional.create_context_ext.robust_gl_31.rgb888_depth_stencil
    dEQP-EGL.functional.create_context_ext.gl_31.rgb888_depth_no_stencil
    dEQP-EGL.functional.create_context_ext.robust_gl_31.rgb888_depth_no_stencil
    dEQP-EGL.functional.create_context_ext.gl_31.rgba8888_depth_no_stencil
    dEQP-EGL.functional.create_context_ext.gl_31.rgb888_no_depth_no_stencil
    dEQP-EGL.functional.create_context_ext.robust_gl_31.rgba8888_depth_no_stencil
    dEQP-EGL.functional.create_context_ext.robust_gl_31.rgb888_no_depth_no_stencil
    dEQP-EGL.functional.create_context_ext.gl_31.rgba8888_no_depth_no_stencil
    dEQP-EGL.functional.create_context_ext.robust_gl_31.rgba8888_no_depth_no_stencil
    dEQP-EGL.functional.create_context_ext.gl_31.rgba8888_depth_stencil
    dEQP-EGL.functional.create_context_ext.robust_gl_31.rgba8888_depth_stencil

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-02 18:56:53 +10:00
Connor Abbott f41516bdb5 nir/find_array_copies: Reject copies with mismatched type
When we detect a scalar/vector copy through load_deref/store_deref, we
have to be careful since those can bitcast an int to a float and
vice-versa even though copy_deref can't.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111251
Fixes: 156306e5e6 ("nir/find_array_copies: Handle wildcards and overlapping copies")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-02 10:34:29 +02:00
Samuel Pitoiset 7368000868 radv: re-apply "Optimize rebinding the same descriptor set."
This makes it cheaper to just change the dynamic offsets with
the same descriptor sets.

This optimization has been reverted a while back because of
random GPU hangs on GFX9, no it looks fine, at least CTS no longer
hangs on GFX9 and it doesn't hang on GFX10 as well.

It fixes a performance problem with Wolfenstein Youngblood.

Suggested-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
2019-08-02 09:56:55 +02:00
Samuel Pitoiset 96a5445559 radv/gfx10: use the correct target machine for Wave32
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-08-02 09:37:38 +02:00
Samuel Pitoiset 8a86908e9a radv/gfx10: add Wave32 support for vertex, tessellation and geometry shaders
It can be enabled with RADV_PERFTEST=gewave32.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-08-02 09:37:36 +02:00
Samuel Pitoiset 953bbacc23 radv/gfx10: add Wave32 support for fragment shaders
It can be enabled with RADV_PERFTEST=pswave32.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-08-02 09:37:34 +02:00
Kenneth Graunke 18c2e09dc7 gallium: Implement GL_EXT_shader_samples_identical via a new capability
This exposes the textureSamplesIdenticalEXT function in GLSL.

We enable it for iris and radeonsi, because their compilers already
have support for this.  Tested on Intel Kabylake and AMD Vega 64.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-01 23:38:54 -07:00
Kenneth Graunke adcc0a8fdc intel/tools: Fix aubinator_viewer build.
This functions was recently renamed and not all callers were updated.

Fixes: 086c486a75 ("intel/device: rename gen_get_device_info")
2019-08-01 23:36:41 -07:00
Francisco Jerez 54fbc625ea intel/ir: Fix CFG corruption in opt_predicated_break().
Specifically the optimization of a conditional BREAK + WHILE sequence
into a conditional WHILE seems pretty broken.  The list of successors
of "earlier_block" (where the conditional BREAK was found) is emptied
and then re-created with the same edges for no apparent reason.  On
top of that the list of predecessors of the block immediately after
the WHILE loop is emptied, but only one of the original edges will be
added back, which means that potentially several blocks that still
have it on their list of successors won't be on its list of
predecessors anymore, causing all sorts of hilarity due to the
inconsistency in the control flow graph.

The solution is to remove the code that's removing valid edges from
the CFG.  cfg_t::remove_block() will already clean up after itself.
The assert in bblock_t::combine_with() also needs to be removed since
we will be merging a block with multiple children into the first one
of them.

Found the issue on a hardware enabling branch originally, but
apparently somebody reproduced the same problem independently on
master in the meantime.

Fixes: d13bcdb3a9 ("i965/fs: Extend predicated break pass to predicate WHILE.")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111009
Cc: jiradet.jd@gmail.com
Cc: Sergii Romantsov <sergii.romantsov@globallogic.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Tested-by: Paul Chelombitko <qamonstergl@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-08-01 16:56:48 -07:00
Mark Janes ddb59cd20e intel/device: make internal functions private
The device info initializer makes several fuctions internal:

  - handling of device override
  - updating topology from kernel information

The implementation file is slightly reordered due to the renamed
functions being static.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-01 16:40:03 -07:00
Mark Janes 086c486a75 intel/device: rename gen_get_device_info
Rename the original device info initialization routine so callers
don't mistakenly call the wrong one:

  gen_get_device_info_from_fd:

      Queries kernel for full device info, including topology
      details.

  gen_get_device_info_from_pci_id:

      Partially initializes device info based on PCI ID lookup, when
      the kernel is not available.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-01 16:39:56 -07:00
Mark Janes d594d2a052 intel/tools: use device info initializer
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-01 16:39:54 -07:00
Mark Janes e4a0070db4 anv: use initialization routine for gen_device_info
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-01 16:39:51 -07:00
Mark Janes 49465f1330 iris/screen: use initialization routine for gen_device_info
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-01 16:39:48 -07:00
Mark Janes 96e1c945f2 i965: Move device info initialization to common code
With perf queries, initializing the device info is much more complex
than just getting a PCI ID and calling gen_get_device_info.  This commit
adds a new gen_get_device_info_from_fd helper in common code which does
all of the requisite kernel queries to get device info including all of
the topology information.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-01 16:39:44 -07:00
Mark Janes 1186f6ea69 i965/perf: verify kernel support before registering OA metrics
When gen_device_info updates the topology in it's initializer, the
kernel queries will fail silently.  Iris and anv have minimum
kernel requirements that support the queries.  i965 must verify kernel
support before reporting OA metrics.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-01 16:39:41 -07:00
Mark Janes 7852fe5415 intel/common: provide common ioctl routine
i965 links against libdrm for drmIoctl, but anv and iris both
re-implement this routine to avoid the dependency.

intel/dev also needs an ioctl wrapper, so lets share the same
implementation everywhere.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-01 16:38:40 -07:00
Alyssa Rosenzweig b40ba2db6c panfrost: Remove unused argument
A relic from when we didn't have an online compiler, hah.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-01 16:23:03 -07:00
Alyssa Rosenzweig ff345d4a01 panfrost: Handle MESA_SHADER_COMPUTE in compile callback
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-01 16:23:03 -07:00