Commit Graph

103701 Commits

Author SHA1 Message Date
Daniel Schürmann 7e7ee82698 ac: add support for 16bit buffer loads
v2: Fixed dvec3 loads (bas)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-07-23 23:16:25 +02:00
Daniel Schürmann a6a21e651d ac: add support for 16bit UBO loads
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-07-23 23:16:25 +02:00
Daniel Schürmann 3109c5257b ac: add support for 16bit ssbo stores
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-07-23 23:16:25 +02:00
Daniel Schürmann f582367d49 ac: add 16bit conversion operations
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-07-23 23:16:25 +02:00
Dave Airlie d73f1026b4 r600: enable tess_input_info for TES
There might be a nicer way to do this, but this is at least correct.

This fixes:
KHR-GL44.tessellation_shader.single.max_patch_vertices
KHR-GL44.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_PatchVerticesIn

Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Cc: mesa-stable@lists.freedesktop.org
2018-07-23 21:11:35 +01:00
Dave Airlie 760622c328 docs/features: fix virgl gles3.1 entries 2018-07-24 06:10:46 +10:00
Roland Scheidegger 09828feab0 draw: force draw pipeline if there's more than 65535 vertices
The pt emit path can only handle 65535 - the number of vertices is
truncated to a ushort, resulting in a too small buffer allocation, which
will crash.

Forcing the pipeline path looks suboptimal, then again this bug is
probably there ever since GS is supported, so it seems it's not
happening often. (Note that the vertex_id in the vertex header is 16
bit too, however this is only used by the draw pipeline, and it denotes
the emit vertex nr, and that uses vbuf code, which will only emit smaller
chunks, so should be fine I think.)
Other solutions would be to simply allow 32bit counts for vertex
allocation, however 65535 is already larger than this was intended for
(the idea being it should be more cache friendly). Or could try to teach
the pt emit path to split the emit in smaller chunks (only the non-index
path can be affected, since gs output is always linear), but it's a bit
tricky (we don't know the primitive boundaries up-front).

Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=107295

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2018-07-23 22:07:07 +02:00
Dave Airlie 51f67eeb21 docs/features: note ARB_copy_image is working on virgl 2018-07-24 06:06:15 +10:00
Dave Airlie 83332618c1 Revert "virgl: remove unused stride-arguments"
This reverts commit dc938b8398.

This adds warnings in vtest, and possibly breaks it.
2018-07-24 06:03:20 +10:00
Dave Airlie 69c2cd0b14 docs/features: note ssbo and atomic counters done for virgl 2018-07-24 05:56:35 +10:00
Dave Airlie 958b57ac82 virgl: add initial shader_storage_buffer_object support. (v2)
This adds the guest side support for ARB_shader_storage_buffer_object.

Co-authors: Gurchetan Singh <gurchetansingh@chromium.org>

v2: move to using separate maximums
(fixup macros)

Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
2018-07-24 05:54:21 +10:00
Jason Ekstrand e4d346c86d nir: Add a couple trivial abs optimizations
Spotted in a shader in Batman: Arkham City.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-07-23 10:48:21 -07:00
Caio Marcelo de Oliveira Filho 52d831ff83 glsl: remove delegating constructors to allow build with C++98
Delegating constructors is a C++11 feature, so this was breaking when
compiling with C++98. Change the copy_propagation_state() calls that
used the convenience constructor to use a static member function
instead.

Since copy_propagation_state is expected to be heap allocated, this
change is a good fit.

Tested-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107305
2018-07-23 10:34:43 -07:00
Eric Anholt 6b73a97f84 v3d: Implement a small immediates optimization, based on VC4's.
We can do one per instruction, and we have to be careful not to overwrite
raddr_b, but this greatly reduces the pressure on uniform loads
(particularly around ldvpm/stvpm instructions).

total instructions in shared programs: 90768 -> 88220 (-2.81%)
instructions in affected programs:     82711 -> 80163 (-3.08%)
2018-07-23 10:21:43 -07:00
Eric Anholt 79e0f042bc v3d: Return an invalid src number if asked for a missing implicit uniform.
Sometimes when iterating over sources, we might want to check if it's the
implicit one.  We wouldn't want to match on a non-implicit src using this
function.
2018-07-23 10:21:43 -07:00
Eric Anholt f2ea936f48 v3d: Skip emitting texture config parameter 2 if it's just the defaults.
shader-db:
total instructions in shared programs: 91275 -> 90768 (-0.56%)
instructions in affected programs:     20702 -> 20195 (-2.45%)
2018-07-23 10:21:43 -07:00
Eric Anholt 421e99d777 v3d: Update an XXX comment for a path we handled in HW on V3D 4.x. 2018-07-23 10:21:43 -07:00
Eric Anholt e7ae900341 v3d: Switch to using the new SFU instructions on V3D 4.x.
These instructions let us write directly to the phys regfile, instead of
just R4.  That lets us avoid moving out of R4 to avoid conflicting with
other SFU results, and to avoid conflicting with thread switches.

There is still an extra instruction of latency, which is not represented
in the scheduler at the moment.  If you use the result before it's ready,
the QPU will just stall, unlike the magic R4 mode where you'd read the
previous value.  That means that the following shader-db results aren't
quite representative (since we now cause some stalls instead of emitting
nops), but they're impressive enough that I'm happy with the change.

total instructions in shared programs: 95669 -> 91275 (-4.59%)
instructions in affected programs:     82590 -> 78196 (-5.32%)
2018-07-23 10:21:43 -07:00
Eric Anholt 58c1d3860f v3d: Add QPU pack/unpack for the new SFU instructions.
These instructions allow writing the result to any register, instead of a
special writeback to r4.
2018-07-23 10:21:43 -07:00
Eric Anholt cdfa99657d v3d: Fix the name of the "flpop" operation.
Noticed while trying to sort a new op into the appropriate place to match
the documentation.
2018-07-23 10:21:43 -07:00
Eric Anholt 91e24e5718 v3d: Print the instruction we're testing in the QPU disasm/pack round-trip.
If we fail initial disassembly, it's good to know what instruction it was
that failed.
2018-07-23 10:21:42 -07:00
Eric Anholt a1beb333d8 v3d: Drop unused vir_SAT() operation.
We lower saturates in NIR.
2018-07-23 10:21:42 -07:00
Eric Anholt 8dfc6ee317 v3d: Rotate through registers to improve post-RA scheduling options.
Similarly to VC4's implementation, by not picking r0 immediately upon
freeing it, we give the scheduler more of a chance to fit later writes in
earlier.  I'm not clear on whether there's any real cost to picking phys
over accumulators, so keep that behavior for now.

shader-db:
total instructions in shared programs: 96831 -> 95669 (-1.20%)
instructions in affected programs:     77254 -> 76092 (-1.50%)
2018-07-23 10:21:42 -07:00
Eric Anholt 1fb31819ae v3d: Allow reading from physical regs written in the previous instruction.
This restriction existed in V3D 2.x, but lifting it was a major change in
3.x.

shader-db results:
total instructions in shared programs: 98117 -> 96831 (-1.31%)
instructions in affected programs:     48520 -> 47234 (-2.65%)
2018-07-23 10:21:23 -07:00
Eric Engestrom e6e22e4207 anv: remove unnecessary runtime copy of static string
It's actually also a bit safer, since now the compiler will warn if
the string is larger than the `.name` array.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-07-23 17:56:08 +01:00
Alex Smith 54f8f1545f anv: Pay attention to VK_ACCESS_MEMORY_(READ|WRITE)_BIT
According to the spec, these should apply to all read/write access
types (so would be equivalent to specifying all other access types
individually). Currently, they were doing nothing.

v2: Handle VK_ACCESS_MEMORY_WRITE_BIT in dstAccessMask.

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-07-23 15:29:43 +01:00
Erik Faye-Lund dc938b8398 virgl: remove unused stride-arguments
The IOCTLs doesn't pass this along, so computing them in the first
place is kinda pointless.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2018-07-23 11:21:09 +01:00
Samuel Pitoiset 6c58bc8d9c radv: print a big warning when RADV_TRACE_FILE is set
Users shouldn't use this debugging option except when we
ask them to do!

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-07-23 11:34:42 +02:00
Samuel Pitoiset 6e32d9e7b0 radv: fix a memleak for merged shaders on GFX9
modules[i] can be NULL for merged shaders but we have to
free the NIR code. radv_can_dump_shader_stats() already handles
if modules[i] is NULL, no need to check it twice.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-07-23 11:34:39 +02:00
Jason Ekstrand d0ee0a0a5d intel/blorp: Fix blits to R8G8B8_UNORM_SRGB sRGB harder
The first fix attempt contained a nasty typo which somehow didn't get
caught in review.  It also didn't work as intended because the sRGB
conversion was happening but then throwing away all but the red channel
because it dind't know it was RGB.  Really, it's my fault for trying to
fix a bug without first writing tests.  I've now written tests and they
pass with this change. :)

Fixes: 11712b9ca1 "intel/blorp: Fix blits to R8G8B8_UNORM_SRGB"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-07-23 00:36:39 -07:00
Jason Ekstrand abd629eb3d anv: Stop setting 3DSTATE_PS_EXTRA::PixelShaderHasUAV
We've had several broadwell hangs that have come down to this bit just
not working correctly.  Most recently, we've had a pile of hangs
reported with apps running under DXVK:

https://github.com/doitsujin/dxvk/issues/469

Instead, use the bit that doesn't try to imply weird D3D coherency
things and just force-enables the PS like we want.

cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-22 23:43:19 -07:00
Jason Ekstrand b99493c628 anv: Properly handle GetImageSubresourceLayout on complex images
We support mipmapped and arrayed linear images so we need to support
vkGetImageSubresourceLayout on them.  Fortunately, it's just a trivial
call into ISL.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-07-22 23:24:10 -07:00
Timothy Arceri 78f391d343 radeonsi/nir: make use of nir_lower_load_const_to_scalar()
This allows NIR to CSE more operations. LLVM does this also so the
impact is limited, however doing this in NIR allows other opts to
make progress. For example some loops in Civilization Beyond Earth
shaders are unrolled.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-07-23 09:48:51 +10:00
Ilia Mirkin 257128079c anv/gen9: expose VK_EXT_post_depth_coverage
Note that the use of ICMS_INNER_CONSERVATIVE disagrees with the GL driver.
Perhaps it's more performant than ICMS_NORMAL and is otherwise permitted?
Not sure, so I left it as-is.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-07-22 14:56:44 -07:00
Ilia Mirkin 768f143667 spirv: add support for SPV_KHR_post_depth_coverage
Allow the capability to be exposed, and convert the new execution mode
into fs state.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-07-22 14:56:36 -07:00
Mauro Rossi 6cbbd5b4f8 android: util/disk_cache: fix building errors in gallium drivers
This patch applies the necessary changes in Android.common.mk
as per automake rules, to avoid following building error:

external/mesa/src/gallium/drivers/nouveau/nouveau_screen.c:159:8:
error: implicit declaration of function 'disk_cache_get_function_timestamp'
is invalid in C99 [-Werror,-Wimplicit-function-declaration]
   if (disk_cache_get_function_timestamp(nouveau_disk_cache_create,
       ^
1 error generated.

(v2) -DENABLE_SHADER_CACHE Android cflag is kept, to leave the AS-IS capability enabled

Fixes: cc10b34 ("util/disk_cache: Fix disk_cache_get_function_timestamp with disabled cache.")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-07-21 12:06:38 +02:00
Chih-Wei Huang e7ffd3fb08 Android: fix a missing nir_intrinsics.h error
The commit 76dfed8ae2 changed nir_intrinsics.h to be a generated
header, but the corresponding dependency was not updated for Android.
It causes the error:

[  0% 19/4336] target  C: libmesa_pipe_radeonsi <= external/mesa/src/gallium/drivers/radeonsi/si_debug.c
...
In file included from external/mesa/src/gallium/drivers/radeonsi/si_debug.c:25:
In file included from external/mesa/src/gallium/drivers/radeonsi/si_pipe.h:28:
In file included from external/mesa/src/gallium/drivers/radeonsi/si_shader.h:140:
In file included from external/mesa/src/amd/common/ac_llvm_build.h:30:
external/mesa/src/compiler/nir/nir.h:966:10: fatal error: 'nir_intrinsics.h' file not found
         ^~~~~~~~~~~~~~~~~~
1 error generated.

Fixes: 76dfed8ae2 ("nir: mako all the intrinsics")
Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Mauro Rossi <issor.oruam@gmail.com>
2018-07-21 08:50:23 +02:00
Bas Nieuwenhuizen e1febbefe8 nir: Fix end of function without return warning/error.
There always is a continue block, so let us just do unreachable.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes: 8cacf38f52 "nir: Do not use continue block after removing it."
CC: 18.1 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107312
2018-07-20 22:27:39 +02:00
Danylo Piliaiev d24c35c3fb st: Sweep NIR after linking phase to free held memory
After optimization passes and many trasfromations most of memory
NIR holds is a garbage which was being freed only after shader deletion.
Freeing it at the end of linking will save memory which would be useful
in case there are a lot of complex shaders being compiled.
The common case for this issue is 32bit game running under Wine.

The cost of the optimization is around ~3-5% of compilation speed
with complex shaders.

Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2018-07-20 11:26:12 -07:00
Eric Anholt 945524ba0e st/dri: Don't require a dri_format for image creation.
Nothing in EGL_KHR_gl_image.txt seems to let us deny creation based on
formats, and doing so causes many failures in
dEQP-EGL.functional.image.api.*

The NONE value we were protecting from only gets looked at in the
__DRI_IMAGE_ATTRIB_FORMAT and __DRI_IMAGE_ATTRIB_FOURCC queries, which are
used from wayland and gbm (which throw an error cleanly on unknown format)
and DMABUF export.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-07-20 11:26:12 -07:00
Eric Anholt f6750456c5 egl: Refuse EGL_MESA_image_dma_buf_export if we don't have a DRM fourcc.
The EGL CTS expects that you can make images from all sorts of things,
including things like z16 and s8, which we don't have DRM fourccs for.
Just return an error when trying to export one of those.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-07-20 11:26:12 -07:00
Eric Anholt a221f9709e v3d: Fix incorrect handling of two fences created back-to-back.
Recreating our context's syncobj with ALREADY_SIGNALED meant that if you
created two fences in a row, then waiting on the second would succeed
immediately.  Instead, export a sync file in the gallium fence (since we
don't have a syncobj clone ioctl), and just create a new syncobj to wait
on whenever we need to.

Noticed while debugging
dEQP-GLES3.functional.fence_sync.client_wait_sync_finish
2018-07-20 11:11:29 -07:00
Eric Anholt fc28692a5a v3d: Fix the timeout value passed to drmSyncobjWait().
The API wants an absolute time, so we need to go add gallium's argument to
CLOCK_MONOTONIC.
2018-07-20 11:11:29 -07:00
Eric Anholt 4f04bd68cf v3d: Fix drmSyncobjWait() return value checking even more.
It tends to return >0 in the success case (I think the value is something
like "how much of the timeout remained").  Fixes
dEQP-GLES3.functional.fence_sync.client_wait_sync_finish
2018-07-20 11:11:29 -07:00
Eric Anholt 2f90879a34 v3d: Use the list_first_entry/list_last_entry macros. 2018-07-20 11:11:29 -07:00
Eric Anholt d0e53373e5 v3d: Move BO cache counting to dump time instead of cache management.
This is one less way to get the dump stats wrong.
2018-07-20 11:11:29 -07:00
Eric Anholt 7d6aef6fa5 v3d: Reduce the stale BO reclamation spam with dump_stats set.
This was obviously meant to be when we were actually freeing a BO, not
just when there was at least one BO in the list.
2018-07-20 11:11:29 -07:00
Eric Anholt 5d11094db1 v3d: Respect a sampler view's first_layer field.
Fixes texturing from EGL images created from cubemap faces, as in
dEQP-EGL.functional.image.create.gles2_cubemap_negative_x_rgba_texture
2018-07-20 11:11:29 -07:00
Sonny Jiang c6737756ad radeonsi: emit_spi_map packets optimization
v2: marek: remove an empty line before break;
    rename reg_val_seq -> spi_ps_input_cntl
    "type * x" -> "type *x"

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2018-07-20 13:50:26 -04:00
Gert Wollny 4d094993c3 virgl: Expose GL_ARB_copy_image if host supports it
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2018-07-20 19:15:12 +02:00