111651 Commits

Author SHA1 Message Date
Caio Marcelo de Oliveira Filho b7c9fc72fd glsl: Make interlock builtins follow same compiler rules as barriers
Generalize the barrier code to provide correct error messages for
other builtins.

Fixes most of the piglit compilation tests for
ARB_fragment_shader_interlock.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
2019-06-10 14:29:26 -07:00
Eduardo Lima Mitev fb2169040a nir/opt_algebraic: Fix rules for imadsh_mix16
The rules added in patch 3addd7c are inverted:

It should be:

(al * bh) << 16 + c

instead of:

(ah * bl) << 16 + c

Fixes a number of regressions under
dEQP-GLES31.functional.draw_indirect.compute_interop.large.*
on Freedreno.
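
To make the difference between the two forms concrete, here is a minimal Python sketch (not the actual NIR algebraic rule). It assumes `al`/`ah` are the low/high 16 bits of the first operand and `bl`/`bh` those of the second, reads the expression as `((al * bh) << 16) + c`, and wraps the result at 32 bits. The helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def lo16(x):
    # Low 16 bits of a 32-bit value
    return x & 0xFFFF

def hi16(x):
    # High 16 bits of a 32-bit value
    return (x >> 16) & 0xFFFF

def imadsh_mix16(a, b, c):
    # Form the commit message describes as correct: ((al * bh) << 16) + c
    return (((lo16(a) * hi16(b)) << 16) + c) & MASK32

def imadsh_mix16_inverted(a, b, c):
    # Inverted form: ((ah * bl) << 16) + c
    return (((hi16(a) * lo16(b)) << 16) + c) & MASK32
```

The inverted form is the same as the correct form with the first two operands swapped, so it goes unnoticed only when both products happen to agree.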

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-06-10 22:27:46 +02:00
Alyssa Rosenzweig e9703fb416 panfrost: Ignore discards in dead branch analysis
Fixes regressions in
dEQP-GLES2.functional.shaders.discard.dynamic_loop_*

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-10 08:23:08 -07:00
Samuel Pitoiset e9316fdfd4 radv: fix setting CB_SHADER_MASK for dual source blending
CB_SHADER_MASK was computed without the second color buffer
format which looks totally wrong to me.

While we are at it, copy a comment from RadeonSI.

Cc: 19.0 19.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-10 17:21:56 +02:00
Alyssa Rosenzweig 50ffaaff3b panfrost/midgard: Disambiguate register mode
We postfix instructions by their size if a destination override is in
place (a la AT&T assembly), disambiguating instruction sizes.
Previously, "16-bit instruction, 16-bit dest, 16-bit sources"
disassembled identically to "32-bit instruction, 16-bit dest, 16-bit
sources", which is semantically distinct due to the lessened opportunity
for parallelism but (potentially) greater precision. Adding a postfix
removes the ambiguity and relieves mental gymnastics reading weird
disassemblies even in some cases that are not ambiguous.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-10 06:50:12 -07:00
Alyssa Rosenzweig 8027cc9975 panfrost/midgard: Expose vec8/vec16 modes
Midgard ALUs can operate in one of four modes: vec2 64-bit, vec4 32-bit,
vec8 16-bit, or vec16 8-bit. Our compiler (and indeed, any OpenGL ES
shader) only uses 32-bit (and eventually vec4 16-bit) modes in normal
circumstances. Nevertheless, the other modes do exist and are easily
accessible through OpenCL; they also come up in cases like blend
shaders.
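
As a quick illustration of the four modes, the following Python sketch derives the lane count from the component width. The 128-bit register width is an assumption inferred from the list above (2 x 64 = 4 x 32 = 8 x 16 = 16 x 8), and the names are hypothetical.

```python
REGISTER_BITS = 128  # assumed: every listed mode multiplies out to 128 bits

def lanes(component_bits):
    # Number of vector lanes available at a given component width
    return REGISTER_BITS // component_bits

MODES = {bits: "vec%d" % lanes(bits) for bits in (64, 32, 16, 8)}
```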

While we have had minimal support for decoding 8-bit/64-bit modes, we
did so pretending they were vec4 in each case; 16-bit registers had a
synthetically duplicated register file to separate lo/hi halves, etc.
This works for GL, but it doesn't map to what the hardware is -actually-
doing, which can cause some headscratchingly bizarre disassemblies from
OpenCL. So, we dive in the deep end and support these other modes
natively in the disassembler, using absurdly long masks/swizzles, since
the hardware is considerably more flexible than what was exposed before.

Outside of some fixed routines for blending, none of the above is
supported in the compiler yet. But it's better to have it in the ISA
definitions and disassembler than not, for future use if nothing else.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-10 06:50:11 -07:00
Alyssa Rosenzweig 2d0bda0885 panfrost/midgard: Add shifting int modifiers
As a source modifier, shift allows shifting a value left by the bit
size, useful in conjunction with a greater register mode, for instance
to implement `upsample`. As a concrete example, the following OpenCL:

   ushort hr0 = /* ... */;
   uint r1 = /* ... */;
   uint r2 = (convert_uint(hr0) << 16) ^ r1;

compiles to the following Midgard assembly:

   ixor r2, (hr0) << 16, r1

In reverse, the ".hi" output modifier shifts the value right by the bit
size, leaving just the carry/overflow at the bottom. To implement *_hi
functions in OpenCL (for <64-bit), we do arithmetic in the 2x higher
mode with the .hi modifier. (For 64-bit, things are hairier, since there
is no 128-bit int mode.)
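
The two modifiers can be sketched arithmetically in Python. This models only the described behaviour for unsigned 16-bit inputs in a 32-bit mode; it is not the hardware encoding, and the function names are hypothetical.

```python
def upsample16(hi, lo):
    # Shift source modifier: move the 16-bit `hi` value left by the bit size,
    # then combine it with the 16-bit `lo` value in the wider 32-bit mode.
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)

def mul_hi16(a, b):
    # ".hi" output modifier: do the multiply in the 2x wider mode, then
    # shift right by the bit size so only the overflow half remains.
    wide = (a & 0xFFFF) * (b & 0xFFFF)
    return (wide >> 16) & 0xFFFF
```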

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-10 06:50:11 -07:00
Alyssa Rosenzweig 6780481a3f panfrost/midgard: Add integer outmods
For floats, output modifiers determine clamping behaviour. For integers,
they determine wrapping/saturation behaviour (or shifting -- see next
commit). These are very different; they are conceptually two unrelated
enums union'ed together; the distinction is responsible for many-a-bug.
While clamping behaviour for floats was clear from GL, the int behaviour
is only known from OpenCL contortions with convert_*_sat() functions.
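
The difference between wrapping and saturating can be shown with a small Python sketch for an unsigned 8-bit destination. It only illustrates the two semantics and does not model the Midgard outmod encodings; the function names are hypothetical.

```python
def wrap_u8(x):
    # Wrapping: keep the low 8 bits, discarding anything that overflows
    return x & 0xFF

def sat_u8(x):
    # Saturation: clamp to the representable range, as convert_uchar_sat() would
    return max(0, min(255, x))
```

For example, 300 wraps to 44 but saturates to 255, which is why applying the wrong kind of modifier gives silently wrong results.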

With the underlying functions known, clean up the codebase, likely
fixing outmod type related bugs in the process.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-10 06:50:11 -07:00
Alyssa Rosenzweig 215b8844ee panfrost/midgard: Note floating compares type convert
OP_TYPE_CONVERTS denotes an opcode that returns a different type than its
source (going from int-domain to float-domain or vice versa), named
after the f2i/i2f family of opcodes it covers. We care because source
mods are determined by the source type (i/f) but output modifiers are
determined by the output type (equals the source type, unless the op
type converts, in which case it's the opposite).

The upshot is that floating-point compares (feq/fne/etc) actually do
type-convert. That is, they take in floating-point values and output in
integer space (a boolean), so we mark them off this way to ensure the
correct output modifiers are used.
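
The rule for picking the output type can be sketched in Python as below. The string type tags and the function name are hypothetical; only the rule itself comes from the description above.

```python
def output_type(source_type, op_type_converts):
    # source_type is "int" or "float". Source modifiers follow source_type;
    # output modifiers follow the value returned here.
    if not op_type_converts:
        return source_type
    return "int" if source_type == "float" else "float"
```

Under this rule a float compare such as feq has float sources but an integer output, so it takes float source modifiers and integer output modifiers.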

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-10 06:50:11 -07:00
Alyssa Rosenzweig d48d991ce2 panfrost: Align linear renderable resources
It's just -easier- to render to aligned framebuffers. For winsys
targets, we already align, but even for an internal linear FBO we ought
to align everything nicely.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-10 06:48:07 -07:00
Alyssa Rosenzweig d89e0716a1 panfrost: Fix stride check when mipmapping
Now that we support custom strides on mipmapped textures (theoretically,
at least), extend the stride check to support mipmaps.  Fixes incorrect
strides of linear windows in Weston.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
2019-06-10 06:47:18 -07:00
Alyssa Rosenzweig 416fc3b5ef panfrost: Refactor texture/sampler upload
We move some code packing the texture/sampler descriptors into
dedicated functions (out of the terrifyingly long emit_for_draw
monolith), cleaning them up as we go.

The discovery triggering the cleanup is the format for including manual
strides in the presence of mipmaps/cubemaps. Rather than being placed at
the end, as previously assumed, they are interleaved after each address.
This difference is relevant when handling NPOT linear mipmaps.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-10 06:45:33 -07:00
Alyssa Rosenzweig a35069a7b5 panfrost: Refactor blitting code
We refactor the wallpaper rendering code to separate the
wallpaper-specific bits from the general blitting capabilities. In the
(hopefully near) future, we'll turn this on to implement real Gallium
blits, e.g. for automatic mipmap generation.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-10 06:45:25 -07:00
Alyssa Rosenzweig d878753efa panfrost: Refactor AFBC code
This patch does a substantial cleanup of the code for handling AFBC,
moving various disparate misplaced functions into a new central
pan_afbc.c file.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-10 06:45:14 -07:00
Alyssa Rosenzweig b4763984ac panfrost: Move pan_screen() to pan_screen.h
Trivial.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-10 06:45:05 -07:00
Alyssa Rosenzweig a38583e352 panfrost: Always align strides to cache line (64)
(Performance tweak.)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-10 06:44:56 -07:00
Emil Velikov 0534fcf57d docs: fixup 19.0.5 <> 19.0.6 confusion
The title of the release notes says 19.0.5 while the rest of the file
(correctly) says 19.0.6

Fixes: fe79d75ccf ("docs: Add relnotes for 19.0.6")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan at pnwbakers.com>
2019-06-10 14:04:39 +01:00
Emil Velikov a379b1c0ee mapi: correctly handle the full offset table
Earlier commit converted ES1 and ES2 to a new, much simpler, dispatch
generator. At the same time, GL/glapi and the driver side are still
using the old code.

There is a hidden ABI between GL*.so and glapi.so, former referencing
entry-points by offset in the _glapi_table. Hence earlier commit added
the full table of entry-points, alongside a marker for other cases like
indirect GL(X) and driver-side remapping.

Yet the patches did not handle things fully, thus it was possible to
get different interpretations of the dispatch table after the marker.

This commit fixes that, adding an indicative error message to catch
future bugs.

While here correct the marker (MAX_OFFSETS) comment.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110302
Fixes: cf317bf093 ("mapi: add all _glapi_table entrypoints to static_data.py")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-06-10 14:04:30 +01:00
Emil Velikov 497de977bd mapi: add static_data offset to EXT_dsa
As elaborated in the next patch, there is some hidden ABI that
effectively requires most entrypoints to be listed in the file.

Cc: Marek Olšák <marek.olsak@amd.com>
Fixes: d2906293c4 ("mesa: EXT_dsa add selectorless matrix stack functions")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-06-10 14:04:25 +01:00
Emil Velikov 61960547df mapi: add static_data offset to MaxShaderCompilerThreadsKHR
As elaborated in the next patch, there is some hidden ABI that
effectively requires most entrypoints to be listed in the file.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110302
Cc: Marek Olšák <maraeo@gmail.com>
Fixes: c5c38e831e ("mesa: implement ARB/KHR_parallel_shader_compile")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-06-10 14:04:18 +01:00
Mathias Fröhlich a7ecf78b90 egl: Let the caller of dri2_create_drawable decide about loaderPrivate.
In the call arguments to dri2_create_drawable decouple loaderPrivate
from dri2_surf. For all callers of dri2_create_drawable the two
pointers are the same with the exception of the gbm backed platform.
Let the calling code of dri2_create_drawable decide what
loaderPrivate shall be.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2019-06-10 11:06:48 +02:00
Samuel Pitoiset 91aa25f462 radv: fix alpha-to-coverage when there is unused color attachments
When alphaToCoverage is enabled, we should always write the alpha
channel of MRT0 if it's unused. This now matches RadeonSI.

This fixes the new CTS:
dEQP-VK.pipeline.multisample.alpha_to_coverage_unused_attachment.samples_*.alpha_invisible

Cc: 19.0 19.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-10 09:23:41 +02:00
Tomeu Vizoso 2fe7f9f2ae panfrost: ci: Switch from direct Docker use to buildah
Use the infrastructure in wayland/ci-templates to build the container
images.

This prevents us from getting into situations in which the images
wouldn't be rebuilt, and allows us to share some infrastructure with
other projects in freedesktop.org.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Suggested-by: Michel Dänzer <michel@daenzer.net>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-10 08:09:23 +02:00
Kenneth Graunke 81582e9366 gallium/u_transfer_helper: Free the staging buffer on unmap.
u_transfer_helper sometimes mallocs a staging buffer, which was then leaked.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-09 15:16:10 -07:00
Lionel Landwerlin 17898a9b7e intel/gpu_dump: fix argument passing
We were dropping the "/' quotes around arguments grouped together.
This was triggering failures with:

   $ ./framemetrics -g "Memory Writes Distribution Gen9" -o /tmp/output.csv -f ./my.trace 10 11
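
The failure mode can be reproduced with a short Python sketch. This is not the script's actual fix; it only shows how re-joining arguments without quoting splits a grouped value, using the standard `shlex` module. The argument list is shortened from the command above.

```python
import shlex

args = ["./framemetrics", "-g", "Memory Writes Distribution Gen9",
        "-o", "/tmp/output.csv"]

# Naive joining drops the grouping: the -g value becomes four separate words.
naive = " ".join(args)

# Quoting each argument preserves the grouping across a re-parse.
quoted = " ".join(shlex.quote(a) for a in args)

reparsed_naive = shlex.split(naive)
reparsed_quoted = shlex.split(quoted)
```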

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2019-06-09 19:45:13 +00:00
Eric Engestrom 93349d7118 util/os_file: suppress sign comparison warning
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-09 13:14:13 +00:00
Eric Engestrom fd5c18de88 util/os_file: fix error being sign-cast back and forth
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-09 13:14:13 +00:00
Eric Engestrom 341ba406fd util/os_file: avoid shadowing read() with a local variable
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-09 13:14:13 +00:00
Eric Engestrom 7e35f20d44 util/os_file: actually return the error read() gave us
Fixes: 316964709e "util: add os_read_file() helper"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-09 13:14:13 +00:00
Alexandros Frantzis f8f222ea36 virgl: Work around possible memory exhaustion
Since we don't normally flush before performing copy transfers, it's
possible in some scenarios to use too much memory for staging resources
and start failing. This can happen either because we exhaust the total
available memory (including system memory virtio-gpu swaps out to), or,
more commonly, because the total size of resources in a command buffer
doesn't fit in virtio-gpu video memory.

To reduce the chances of this happening, force a flush before a copy
transfer if the total size of queued staging resources exceeds a certain
limit. Since after a flush any queued staging resources will be
eventually released, this ensures both that each command buffer doesn't
require too much video memory, and that we don't end up consuming too
much memory for staging resources in total.
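
The flush policy described above can be sketched in Python as follows. The class, its fields, and the limit value are hypothetical and stand in for the driver's real bookkeeping. The sketch also simplifies by treating a flush as releasing queued staging memory immediately rather than eventually.

```python
class StagingTracker:
    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes  # hypothetical threshold
        self.queued_bytes = 0           # staging bytes queued since last flush
        self.flushes = 0

    def flush(self):
        # Simplification: count the queued staging memory as released at once.
        self.queued_bytes = 0
        self.flushes += 1

    def copy_transfer(self, size):
        # Force a flush first if the queued staging total exceeds the limit.
        if self.queued_bytes > self.limit_bytes:
            self.flush()
        self.queued_bytes += size
```

Checking before each copy transfer bounds both the per-command-buffer footprint and the total staging memory in flight.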

Fixes kernel errors reported when running texture_upload tests in glbench.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-07 21:45:45 -07:00
Alexandros Frantzis e34f79c918 virgl: Remove incorrect resource wait condition
Now that we have copy transfers in place, we can remove the incorrect
resource wait condition. Copy transfers and other optimizations minimize
the performance impact of this removal, while providing the correct
behavior.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-07 21:45:43 -07:00
Alexandros Frantzis 236c55f650 virgl: Use copy transfers for textures
Extend copy transfers to also be used for busy textures.

Performance results:
Unigine Valley, qemu before: 22.7 FPS after: 23.1 FPS

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-07 21:45:42 -07:00
Alexandros Frantzis a22c5df079 virgl: Use buffer copy transfers to avoid waiting when mapping
We typically need to wait for a buffer to become ready before mapping,
so that we don't write new contents while the host is still using the
old contents. However, if we are allowed to discard the contents of the
mapped buffer range, then we can avoid waiting by using a staging buffer
range which we guarantee to never be busy, copying from the staging
buffer range to the target buffer in the host.

This commit implements this optimization by utilizing a dedicated
u_upload_mgr for the staging buffer.
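
The decision described above reduces to a small choice at map time, sketched here in Python. The function and the returned labels are hypothetical and do not correspond to virgl identifiers.

```python
def choose_map_strategy(is_busy, discard_range):
    # An idle buffer can be mapped directly with no wait.
    if not is_busy:
        return "map_directly"
    # A busy buffer whose mapped range may be discarded can be written via a
    # never-busy staging range and copied to the target on the host.
    if discard_range:
        return "staging_copy"
    # Otherwise the old contents matter, so we must wait for the host.
    return "wait_then_map"
```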

Performance results:
Twilight Struggle (Steam/Proton), qemu before: 7 FPS after: 25 FPS
glmark2 ubo, qemu before: 38 FPS after: 331 FPS

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Suggested-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-07 21:45:39 -07:00
Alexandros Frantzis 6e7726e50c virgl: Support copy transfers
Support transfers that use a different resource as the source of data to
transfer. This will be used in upcoming commits to send data to host
buffers through a transfer upload buffer, in order to avoid waiting
when the buffer resource is busy.

Note that we don't support queueing copy transfers in the transfer
queue. Copy transfers should be emitted directly in the command queue,
which allows us to avoid flushes before them and leads to better
performance.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-07 21:45:36 -07:00
Alexandros Frantzis 199d95f29e virgl: Add copy_transfer3d definitions
Introduce definitions for the copy_transfer3d protocol command and virgl
capability. This command transfers data to the host by copying through
another resource, and will be used in upcoming commits to avoid waiting
when transferring data for busy resources.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-07 21:45:34 -07:00
Alexandros Frantzis ccec1555c1 virgl: Make VIRGL_BIND_STAGING resources cacheable
This could help performance when trying to recreate such resources for
copy transfers.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-07 21:45:33 -07:00
Alexandros Frantzis 636345f496 virgl: Support VIRGL_BIND_STAGING
Support a new virgl bind type for staging buffers which don't require
dedicated host-side storage. These will be used to implement copy
transfers.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-07 21:45:31 -07:00
Alexandros Frantzis f38cdaebac virgl: Avoid unfinished transfer_get with PIPE_TRANSFER_DONTBLOCK
If we are not allowed to block, and we know that we will have to wait,
either because the resource is busy, or because it will become busy due
to a readback, return early to avoid performing an incomplete
transfer_get. Such an incomplete transfer_get may finish at any time,
during which another unsynchronized map could write to the resource
contents, leaving the contents in an undefined state.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Suggested-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-07 21:45:22 -07:00
Alexandros Frantzis 8eb8222c10 virgl: Deduplicate checks for resource caching
Also fixes a missed check for VIRGL_BIND_CUSTOM in one of the duplicate
code snippets.

Note that legacy fences also use VIRGL_BIND_CUSTOM, but we ensured they
don't go through the cache in the previous commit.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-07 21:45:20 -07:00
Alexandros Frantzis e0ffcdf16a virgl: Don't try to use cached resources for legacy fences
Resources for fences should not be from the cache, since we are basing
the fence status on the resource creation busy status.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-07 21:45:16 -07:00
Alexandros Frantzis 8089d3658a virgl: More info about chosen alignment value
Add more info about why the value of VIRGL_MAP_BUFFER_ALIGNMENT was chosen.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-07 21:44:53 -07:00
Chia-I Wu 371743157e virgl: store all info about atomic buffers
We will need the full info.  This also speeds up
virgl_attach_res_atomic_buffers and fixes resource leaks when the
context is destroyed.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-06-07 22:47:07 +00:00
Chia-I Wu 98fd742d7e virgl: add shader images to virgl_shader_binding_state
It replaces virgl_context::images.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-06-07 22:47:07 +00:00
Chia-I Wu f965efb3c8 virgl: add SSBOs to virgl_shader_binding_state
It replaces virgl_context::ssbos.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-06-07 22:47:07 +00:00
Chia-I Wu 920c4143f0 virgl: add UBOs to virgl_shader_binding_state
It replaces virgl_context::ubos.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-06-07 22:47:07 +00:00
Chia-I Wu 2e21d66d7a virgl: add virgl_shader_binding_state
virgl_shader_binding_state will be used to manage all per-stage
shader bindings.  For now, it manages only sampler views.

This replaces virgl_textures_info and fixes some issues

 - start_slot is now honored
 - views outside of [start_slot, start_slot+count) are unmodified
 - views are released when the context is destroyed
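
The first two fixes amount to the slot-range behaviour sketched below in Python. The list-based state and the function name are hypothetical; the real code tracks per-stage bindings in virgl_shader_binding_state.

```python
def set_sampler_views(bound_views, start_slot, new_views):
    # Write only the slots in [start_slot, start_slot + count), where count
    # is len(new_views). Every other slot keeps its current binding.
    for i, view in enumerate(new_views):
        bound_views[start_slot + i] = view
    return bound_views
```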

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-06-07 22:47:07 +00:00
Kenneth Graunke 30314270d4 iris: Zero shs->cbuf0 when binding a passthrough TCS
Fixes valgrind errors when running two CTS tests back to back:
- KHR-GL45.shader_image_load_store.basic-allTargets-loadStoreT*
(The first test has an actual TCS, the second uses passthrough.)
2019-06-07 15:13:42 -07:00
Jason Ekstrand 1e6b32d08c intel/blorp: Only double the fast-clear rect alignment on HSW
This restriction was accidentally added to the BSpec/PRM as an
unrestricted restriction starting with the HSW docs and it was never
removed.  However, it only ever applied to HSW and actually potentially
causes problems on BDW and above where we have mipmapped fast-clears.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2019-06-07 22:00:55 +00:00
Rob Clark 3c456cf583 freedreno/a6xx: re-arrange program stageobj/group
Split out a separate program config state group to run early before the
other groups.

This seems to help w/ intermittent "missed tiles" (although I had
assumed that was a mem2gmem issue), or at least I can't reproduce that
issue with this patch, but can without.

It has the benefit of HLSQ_VS_CNTL.CONSTLEN matching for VS and BS.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-07 12:07:29 -07:00
Rob Clark 958f6ffb60 freedreno/a6xx: fix hangs with newer sqe fw
With the newer (v1.76) fw, we were getting hangs (compared to older
v1.66 fw).  Re-work the GMEM code to structure things a bit closer to
the blob.  This moves some PKT7 packets from IB2 to IB1, which I think
is what was confusing SQE and causing it to get stuck in an infinite
loop.  But in general structuring things at least closer to the same way
blob does makes it easier to compare cmdstream.

Note: this is a bit on the large side for what I'd normally consider for
stable.. but right now it is looking like it is the newer fw that is
headed for linux-firmware.  This should definitely have some soak time on
master, but probably a good idea for this patch to end up in distro mesa
builds by the time a630_sqe.fw hits linux-firmware.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-07 12:07:29 -07:00