Commit Graph

112599 Commits

Author SHA1 Message Date
Marek Olšák 13a5e9d685 gallium/util: rewrite depth-stencil blit shaders
- merge all 3 functions (Z, S, ZS)
- don't write the color output
- read the value from texel.x, then write it to position.z or stencil.y
  (don't use the value from texel.y or texel.z)

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2019-07-03 15:51:12 -04:00
Marek Olšák 131d40cfc9 st/mesa: accelerate glCopyPixels(STENCIL)
Tested-by: Dieter Nützel
2019-07-03 15:50:04 -04:00
Yevhenii Kolesnikov 65dc4db08e glsl/standalone: meson test for --dump-builder
Added meson test for standalone compiler with --dump-builder option
on builtin texture* functions.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107767
Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-07-03 12:13:37 -07:00
Sergii Romantsov 9f85b4940c glsl/standalone: exit on unsupported texture functions
glsl/standalone with --dump-builder will exit when unsupported texture
functions are encountered.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107767
Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-07-03 12:13:37 -07:00
Pierre-Eric Pelloux-Prayer ea5b7de138 radeonsi: make gl_SampleMaskIn = 0x1 when MSAA is disabled
gl_SampleMaskIn is 1 when R_028BE0_PA_SC_AA_CONFIG is 0, so this commit rework the conditions
controlling this register.

Before it was set if the sctx->framebuffer had a sample count > 1.

Now we still require this condition, but we also need either:
  - GL_MULTISAMPLE to be enabled
  - to be executing an operation that doesn't depends on GL state using u_blitter.

This fixes the arb_sample_shading/sample_mask piglit tests on radeonsi.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-07-03 14:59:21 -04:00
Brian Paul 7bb3d6acec gallium/u_blitter: enable MSAA when blitting to MSAA surfaces
If we're doing a Z -> Z MSAA blit (for example) we need to enable
msaa rasterization when drawing the quads so that we can properly
write the per-sample values.

This fixes a number of Piglit ext_framebuffer_multisample blit tests
such as ext_framebuffer_multisample/no-color 2 depth combined with
the VMware driver.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-07-03 14:59:15 -04:00
Alexandros Frantzis e5be4351c2 virgl: Clear the valid buffer range when possible
If we are discarding the whole resource, we don't care about previous contents,
and the resource storage is now unused, either because we have created new
resource storage, or because we have waited for the existing resource storage
to become unused, or because the transfer is unsynchronized.

In the last two cases this commit marks the storage as uninitialized, but only
if the resource is not host writable (in which case we can't clear the valid
range, since that would result in missed readbacks in future transfers).

In the first case, when the whole resource discard involves a reallocation, the
reallocation and subsequent rebinding already update the valid buffer range
appropriately.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-07-03 09:59:55 -07:00
Jan Zielinski 243db4980c swr/swr: Enable ARB_viewport_array
The rasterizer core supported ARB_viewport_array,
but the swr layer connecting core to Gallium state
tracker only allowed one viewport.

We add support for multiple viewports to swr layer.

Reviewed-by: Alok Hota <alok.hota@intel.com>
2019-07-03 14:43:28 +02:00
Bas Nieuwenhuizen c6cb9b197d radv: Support VK_EXT_queue_family_foreign.
Basically same as external for now.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Only case we might need to handle differently in the near future
is Raven's case of displayable DCC which is not renderable. But
we don't support that yet.
2019-07-03 10:56:21 +00:00
Bas Nieuwenhuizen 8a053254b8 radv: Fix interactions between variable descriptor count and inline uniform blocks.
Fixes: d7e6541cc7 "radv: Only allocate supplied number of descriptors when variable."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-03 10:43:35 +00:00
Michel Dänzer 11a3679e3a winsys/amdgpu: Make KMS handles valid for original DRM file descriptor
Getting a DMA-buf fd and converting that to a handle using our duplicate
of that file descriptor (getting at which requires passing a
radeon_winsys pointer to the buffer_get_handle hook) makes sure of this,
since duplicated file descriptors reference the same file description
and therefore the same GEM handle namespace.

This is necessary because libdrm_amdgpu may use a different DRM file
descriptor with a separate handle namespace internally, e.g. because it
always reuses any existing amdgpu_device_handle for the same device.
amdgpu_bo_export returns a handle which is valid for that internal
file descriptor.

Bugzilla: https://bugs.freedesktop.org/110903
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-03 09:19:07 +00:00
Michel Dänzer cb446dc0fa winsys/amdgpu: Add amdgpu_screen_winsys
It extends pipe_screen / radeon_winsys and references amdgpu_winsys.
Multiple amdgpu_screen_winsys instances may reference the same
amdgpu_winsys instance, which corresponds to an amdgpu_device_handle.

The purpose of amdgpu_screen_winsys is to keep a duplicate of the DRM
file descriptor passed to amdgpu_winsys_create, which will be needed
in the next change.

v2:
* Add comment in amdgpu_winsys_unref explaining why it always returns
  true (Marek Olšák)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-03 09:19:07 +00:00
Michel Dänzer 6fce296400 winsys/amdgpu: Use amdgpu_winsys helper instead of open-coded casts
Cleanup to prevent breakage with the next change, no functional change
intended in this one.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-03 09:19:07 +00:00
Juan A. Suarez Romero e06bc0b166 intel: fix wrong format usage
Do not use the view format when filling the surface state.

Fixes dEQP-VK.image.texel_view_compatible.compute.extended.texture.*

Fixes: fb1350c76f ("intel: Add and use helpers for level0 extent")

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-07-03 10:14:54 +02:00
Samuel Pitoiset a7b6a869a7 radv: only allocate a 32-bit value for the TC-compat range metadata
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 08:52:01 +02:00
Samuel Pitoiset 6baa453dd5 radv: remove unused code in radv_update_tc_compat_zrange_metadata()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 08:51:58 +02:00
Samuel Pitoiset a21f23c811 radv: add radv_get_depth_pipeline() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 08:51:42 +02:00
Mike Blumenkrantz e005470466 iris: assert isl_surf_init success in resource_from_handle
this can fail unexpectedly due to bugs, so it's good to provide feedback
when this occurs

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-07-02 15:39:44 -07:00
Jason Ekstrand e708261cb7 anv: Advertise a more accurate minTexelBufferOffsetAlignment
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-02 22:28:44 +00:00
Jason Ekstrand 0bc657f2db anv: Implement VK_EXT_texel_buffer_alignment
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-02 22:28:44 +00:00
Jason Ekstrand 465ec0b145 vulkan: Update the XML and headers to 1.1.113
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-02 22:28:44 +00:00
Caio Marcelo de Oliveira Filho 050eb6389a spirv: Ignore ArrayStride in OpPtrAccessChain for Workgroup
From OpPtrAccessChain description in the SPIR-V spec (1.4 rev 1):

    For objects in the Uniform, StorageBuffer, or PushConstant storage
    classes, the element’s address or location is calculated using a
    stride, which will be the Base-type’s Array Stride when the Base
    type is decorated with ArrayStride. For all other objects, the
    implementation will calculate the element’s address or location.

For non-CL shaders the driver should layout the Workgroup storage
class, so override any explicitly set ArrayStride in the shader.  This
currently fixes only the lower_workgroup_access_to_offsets case, which
is used by anv.

Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
2019-07-02 12:15:01 -07:00
Karol Herbst 95a7fd0f10 nouveau: handle new CAPS
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2019-07-02 20:09:44 +02:00
Jason Ekstrand fa869f45c8 intel/fs: Use nir_lower_interpolation on gen11+
On gen11, the removed the PLN instruction so we have to emit a pile of
MAD to emulate it.  We may as well do that in NIR so we can optimize and
later schedule it.

Shader-db results on Ice Lake:

    total instructions in shared programs: 17145644 -> 16556440 (-3.44%)
    instructions in affected programs: 11507454 -> 10918250 (-5.12%)
    helped: 35763
    HURT: 42085
    helped stats (abs) min: 1 max: 140 x̄: 19.09 x̃: 18
    helped stats (rel) min: 0.04% max: 37.93% x̄: 15.40% x̃: 14.49%
    HURT stats (abs)   min: 1 max: 248 x̄: 2.22 x̃: 2
    HURT stats (rel)   min: 0.05% max: 50.00% x̄: 5.00% x̃: 2.47%
    95% mean confidence interval for instructions value: -7.67 -7.47
    95% mean confidence interval for instructions %-change: -4.46% -4.29%
    Instructions are helped.

    total loops in shared programs: 4370 -> 4370 (0.00%)
    loops in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total cycles in shared programs: 360624645 -> 368220857 (2.11%)
    cycles in affected programs: 269631244 -> 277227456 (2.82%)
    helped: 15583
    HURT: 65874
    helped stats (abs) min: 1 max: 28561 x̄: 78.45 x̃: 32
    helped stats (rel) min: <.01% max: 67.81% x̄: 5.38% x̃: 2.44%
    HURT stats (abs)   min: 1 max: 238638 x̄: 133.87 x̃: 20
    HURT stats (rel)   min: <.01% max: 306.25% x̄: 5.81% x̃: 3.97%
    95% mean confidence interval for cycles value: 67.42 119.09
    95% mean confidence interval for cycles %-change: 3.61% 3.73%
    Cycles are HURT.

    total spills in shared programs: 8943 -> 8981 (0.42%)
    spills in affected programs: 1925 -> 1963 (1.97%)
    helped: 44
    HURT: 14

    total fills in shared programs: 21815 -> 21925 (0.50%)
    fills in affected programs: 3511 -> 3621 (3.13%)
    helped: 41
    HURT: 18

    LOST:   70
    GAINED: 14

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-02 16:15:25 +00:00
Jason Ekstrand 2b79a9e5a5 intel/fs: Implement nir_intrinsic_load_fs_input_interp_deltas
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-02 16:15:25 +00:00
Jason Ekstrand 8e7d066682 intel/fs: Actually implement the load_barycentric intrinsics
If they never get used, dead code should clean them up.  Also, we rework
the at_offset and at_sample intrinsics so they return a proper vec2
instead of returning things in PLN layout.  Fortunately, copy-prop is
pretty good at cleaning this up and it doesn't result in any actual
extra MOVs.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-02 16:15:25 +00:00
Rob Clark 5787a2dfe3 nir: add pass to lower load_interpolated_input
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-02 16:15:25 +00:00
Boris Brezillon 5375d009be panfrost: Pass referenced BOs to the SUBMIT ioctls
Instead of manually adding the BOs from the various SLAB pools plus
the one backing the color FB, we insert them in the BO set attached
to the job and let panfrost_drm_submit_job() pass all BOs from this set
to the SUBMIT ioctl.
This means we are now passing all referenced BOs and let the scheduler
wait on referenced BO fences if needed.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-07-02 15:00:21 +02:00
Boris Brezillon 3557746e0d panfrost: Make SLAB pool creation rely on BO helpers
There's no point duplicating the code, and it will help us simplify
the bo_handles[] filling logic in panfrost_drm_submit_job().

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-07-02 14:59:28 +02:00
Boris Brezillon c684a79669 panfrost: Add the panfrost_drm_{create,release}_bo() helpers
To avoid the panfrost_memory <-> panfrost_bo dance done in
panfrost_resource_create_bo() and panfrost_bo_unreference().

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-07-02 14:58:51 +02:00
Boris Brezillon 948fddfc42 panfrost: Move the mmap BO logic out of panfrost_drm_import_bo()
So we can re-use it for the panfrost_drm_create_bo() function we are
about to introduce.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-07-02 14:58:51 +02:00
Boris Brezillon 8d4afcdacc panfrost: Avoid passing winsys handles to import/export BO funcs
Let's keep a clear split between ioctl wrappers and the rest of the
driver. All the import BO function need is a dmabuf FD and the screen
object, and the export one should only take care of generating a dmabuf
FD out of a BO object. Winsys handle manipulation should stay in the
resource.c file.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-07-02 14:58:51 +02:00
Boris Brezillon aa5bc35f31 panfrost: Move BO meta-data out of panfrost_bo
That's what most (all?) implementation seem to do, and my understanding
is that a BO is just a bunch of memory that can be used for anything GPU
related, not only texture/FB resources.

Let's move those meta data in panfrost_resource so we can use
panfrost_bo for all kind of memory allocation and make BO allocation
more consistent.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-07-02 14:58:51 +02:00
Boris Brezillon c4f4193ad4 panfrost: Stop exposing internal panfrost_drm_*() functions
panfrost_drm_submit_job() and panfrost_fence_create() are not used
outside of pan_drm.c.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-07-02 14:58:51 +02:00
Boris Brezillon 6ba61324f0 panfrost: Get rid of the "free imported BO" logic
bo->imported was never set to true which means this path was never taken.
Moreover, panfrost_drm_free_imported_bo() is doing missing the munmap()
call which seems wrong because the import BO function calls mmap().

Let's just kill this function along with the ->imported field.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-07-02 14:57:35 +02:00
Boris Brezillon 079aaa9c6d panfrost: Get rid of the panfrost_driver abstraction leftovers
Commit 5f81669d88 ("panfrost: Remove the panfrost_driver abstraction")
left a few things behind, remove them now.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-07-02 14:57:35 +02:00
Boris Brezillon 6608642d21 panfrost: Move scanout res creation out of panfrost_resource_create()
Which improves readability and help us avoid a memory leak.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-07-02 14:57:35 +02:00
Boris Brezillon 873b7b93e8 panfrost: Add the sampled texture BO to the job
Otherwise we get random use-after-{free,unmap} errors.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
Changes in v2:
- Move the panfrost_job_add_bo() call out of the loop
2019-07-02 14:57:35 +02:00
Samuel Pitoiset 6cc213b3c1 radv: enable DCC for layers on GFX8
It's currently only enabled if dcc_slice_size is equal to
dcc_slice_fast_clear_size because the driver assumes that
portions of multiple layers are contiguous but it's not
always true.

Still not supported on GFX9.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-02 09:38:02 +02:00
Samuel Pitoiset 233224c7f7 radv: do not enable DCC for mipmapped arrays because performance is worse
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-02 09:38:00 +02:00
Samuel Pitoiset e41e575e24 radv: implement clearing DCC layers on GFX8
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-02 09:37:56 +02:00
Samuel Pitoiset e47c68b7b0 radv: merge radv_dcc_clear_level() into radv_clear_dcc()
This will help for clearing DCC arrays because we need to know
the subresource range.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-02 09:37:51 +02:00
Samuel Pitoiset f772fe6a11 radv: add support for decompressing DCC layers with compute
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-02 09:37:49 +02:00
Samuel Pitoiset 83297baf2d ac: compute the DCC fast clear size per slice on GFX8
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-02 09:37:44 +02:00
Samuel Pitoiset 6517d226ac ac: compute the size of one DCC slice on GFX8
Addrlib doesn't provide this info. Because DCC is linear, at least
on GFX8, it's easy to compute the size of one slice.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-02 09:37:41 +02:00
Kenneth Graunke 457a55716e iris: Defer closing and freeing VMA until buffers are idle.
There will unfortunately be circumstances where we cannot re-use a
virtual memory address until it's no longer active on the GPU.  To
facilitate this, we instead move BOs to a "dead" list, and defer
closing them and returning their VMA until they are idle.  We
periodically sweep these away in cleanup_bo_cache, which triggers
every time a new object's refcount hits zero.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
2019-07-02 07:23:55 +00:00
Kenneth Graunke 07f3455664 iris: Add an explicit alignment parameter to iris_bo_alloc_tiled().
In the future, some images will need to be aligned to a larger value
than 4096.  Most buffers, however, don't have any such requirement,
so for now we only add the parameter to iris_bo_alloc_tiled() and
leave the others with the simpler interface.

v2: Fix missing alignment in vma_alloc, caught by Caio!

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
2019-07-02 07:23:55 +00:00
Iago Toral Quiroga 042aeffd5b v3d: do not flush jobs that are synced with 'Wait for transform feedback'
Generally, we achieve this by skipping the flush on calls to
v3d_flush_jobs_writing_resource() when we detect that the resource is written
in the current job from a transform feedback write.

The exception to this is the case where the caller is about to map the
resource, in which case we need to flush immediately since we can only emit
'Wait for transform feedback' commands on rendering jobs. We add a parameter
to the function so the caller can identify that scenario.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-02 08:57:20 +02:00
Iago Toral Quiroga 88cbc4f7f6 v3d: emit 'Wait for transform feedback' commands when needed
The hardware can flush transform feedback writes before reads in the same
job by inserting this command.

This patch detects when the rendering state for the current draw call reads
resources that had been previously written by transform feedback in the
same job and inserts the 'Wait for transform feedback' command before
emitting the new draw.

v2 (Eric):
  - this was intended to look at job->tf_write_prscs for TF jobs.
  - clear job->tf_write_prscs after we emit the TF flush.
  - can skip flushes for fragment shader reads from TF.

v3 (Eric):
  - all resources in job->tf_write_prscs are resources written by TF so
   we don't need to check if they are bound to PIPE_BIND_STREAM_OUTPUT.
  - documented optimization opportunity for geometry stages.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-02 08:57:20 +02:00
Iago Toral Quiroga c7dff0e614 v3d: keep track of resources written by transform feedback
The hardware provides a feature to sync reads from previous transform feedback
writes in the same job so if we use this mechanism we no longer have to flush
the job.

In order to identify this scenario we need a mechanism to identify resources
that are written by transform feedback.

v2: use _mesa_pointer_set_create (Eric)

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-02 08:57:20 +02:00