131291 Commits

Author SHA1 Message Date
Chad Versace 2b3ec91326 anv/image: Rename get_wsi_format_modifier_properties_list()
Rename it to get_drm_format_modifier_properties_list() because it is now
independent of WSI.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2020-11-17 10:36:45 -08:00
Chad Versace 57d416d423 anv/image: Fix VkExternalMemoryProperties for images (v5)
In vkGetPhysicalDeviceImageFormatProperties2, we advertised support for
VK_IMAGE_TILING_LINEAR and VK_IMAGE_TILING_OPTIMAL for all memory
handles.

However, when importing or exporting an image, there must exist a method
that enables the app and driver to agree on the image's memory layout.
If no method exists, then we should reject image creation.

v2:
  - Reduce copy-paste for Lionel.
v3:
  - Treat tiling LINEAR and DRM_FORMAT_MODIFIER as identical when
    determining compatible memory handles.
  - Improve comments.
v4:
  - Remove DMA_BUF from opaque_fd_only_props.
v5:
  - Minor changes to code style for `if`. (for jekstrand)

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v4)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v4)
2020-11-17 10:36:45 -08:00
Chad Versace a2aa56905c anv/image: Delete the list of modifier-compatible formats
The code asserted that we supported no more than 4 formats with
modifiers: /VK_FORMAT_B8G8R8(A8)?_(SRGB|UNORM)/.
Strangely, 2 of the 4 were non-power-of-two formats, which were rejected
elsewhere.

The assertion's comment suggested that we use a hard-coded list of
formats because the driver was not yet able to determine if a given
format was compatible with a given modifier.  Therefore, the list only
contained formats that were compatible with *all* modifiers. That code
deficiency no longer exists: anv_get_image_format_features() can check
format/modifier compatibility.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2020-11-17 10:36:45 -08:00
Chad Versace 146f41e608 anv/image: Refactor iteration over modifiers
Refactor in get_wsi_format_modifier_properties_list().

Instead of iterating over a function-local hard-coded list, iterate over
all modifiers in isl_drm.c.

This will improve agreement in behavior between
VkDrmFormatModifierPropertiesListEXT and
VkPhysicalDeviceImageDrmFormatModifierInfoEXT.

The future disagreement this patch attempts to prevent is the
combination of:
    a. VkDrmFormatModifierPropertiesListEXT neglects to return a valid
       modifier because its hard-coded list of modifiers drifts
       out-of-sync with hard-coded lists elsewhere in the code. (Already
       today, the list in get_wsi_format_modifier_properties_list() does
       not match the list in isl_drm.c; though, this has produced no bug
       yet).
    b. vkGetPhysicalDeviceImageFormatProperties2 accepts, via
       VkPhysicalDeviceImageDrmFormatModifierInfoEXT, the modifier
       overlooked in (a), because it does not use the same hard-coded
       list in get_wsi_format_modifier_properties_list(). (Recall that
       the spec requires vkGetPhysicalDeviceImageFormatProperties2 to
       correctly accept/reject any int that the app provides, even when
       the int is an invalid modifier).
    c. The Bug. The driver told the app in (b) that it can legally
       create an image with format+modifier, but the app cannot query
       the VkFormatFeatureFlags of the format+modifier due to (a).

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2020-11-17 10:36:45 -08:00
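
The hazard laid out in (a) through (c) of 146f41e608 comes from two query paths consulting different lists. A minimal sketch of the remedy, assuming hypothetical names (`mod_info`, `all_mods`, `enumerate_mods`, `is_mod_supported`) and placeholder modifier values that are not taken from isl_drm.c, keeps one table and has both the enumeration path and the accept/reject path walk it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a driver's modifier table. The values below
 * are placeholders, not real DRM format modifiers. */
struct mod_info {
   uint64_t modifier;
   bool supported;
};

static const struct mod_info all_mods[] = {
   { 0x1, true },
   { 0x2, true },
   { 0x3, false },
};

#define NUM_MODS (sizeof(all_mods) / sizeof(all_mods[0]))

/* Enumeration path: report every supported modifier in the shared table.
 * Returns the number of supported modifiers and writes at most `cap`. */
static size_t
enumerate_mods(uint64_t *out, size_t cap)
{
   assert(out != NULL || cap == 0);

   size_t n = 0;
   for (size_t i = 0; i < NUM_MODS; i++) {
      if (!all_mods[i].supported)
         continue;
      if (n < cap)
         out[n] = all_mods[i].modifier;
      n++;
   }
   return n;
}

/* Accept/reject path: consult the same table, so the app is never told
 * "yes" for a modifier that the enumeration path did not list. Any
 * integer is a legal input, including values that are not modifiers. */
static bool
is_mod_supported(uint64_t modifier)
{
   for (size_t i = 0; i < NUM_MODS; i++) {
      if (all_mods[i].modifier == modifier)
         return all_mods[i].supported;
   }
   return false;
}
```

Because both functions read `all_mods`, adding a modifier in one place updates both answers together, which is the property the commit message is after.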
Chad Versace 6835cb7f86 isl: Make public the list of modifiers
This allows Vulkan and GL to iterate over the full list of modifiers
instead of hard-coding in various places the "same" list as isl.

(Anvil's list has already diverged from isl's list. It omits Gen12
modifiers).

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2020-11-17 10:36:45 -08:00
Chad Versace 51eefbaae6 anv/image: Fill drmFormatModifierTilingFeatures (v2)
Fill VkDrmFormatModifierPropertiesEXT::drmFormatModifierTilingFeatures
with anv_get_image_format_features().

anv_formats.c:get_wsi_format_modifier_properties_list() incorrectly left
it uninitialized.

v2: Increment drmFormatModifierPlaneCount if the modifier supports aux.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v2)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
2020-11-17 10:36:45 -08:00
Chad Versace 1f39b3e48d anv/image: Teach anv_get_image_format_features() about modifiers (v3)
Because anv_get_image_format_features() now understands modifiers, also
relocate most of the modifier compatibility checks from
anv_get_format_plane() into anv_get_image_format_features() in order to
avoid duplication.

The new signature forces some code movement in
anv_get_image_format_properties().

v2:
  - Reject VK_FORMAT_B4G4R4A4_UNORM_PACK16 with modifiers on HSW.
v3:
  - Revert the v2 change.
  - Query isl_format_layout instead of pipe_format. (for jekstrand)
  - Drop misguided comments. (for jekstrand)

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v2)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v3)
2020-11-17 10:36:44 -08:00
Chad Versace 486ae7c655 isl: Add isl_format_layout::uniform_channel_type
If each format channel has the same base type (such as unorm), then that
is the format's "uniform channel type".

Calculating the field at buildtime is probably better than looping over
all channels at runtime each time we wish to query it.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2020-11-17 10:36:44 -08:00
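
For reference, the runtime loop that 486ae7c655 avoids by precomputing the field could look roughly like the sketch below. The enum and function names are hypothetical and are not the isl definitions; treating an absent channel as "void" and returning "void" for mixed formats are assumptions made for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical channel base types; illustrative names only. */
enum chan_type {
   CHAN_VOID = 0,   /* channel absent from the format */
   CHAN_UNORM,
   CHAN_SNORM,
   CHAN_SFLOAT,
};

/* Returns the base type shared by all present channels, or CHAN_VOID when
 * the present channels disagree (or when no channel is present). */
static enum chan_type
uniform_channel_type(const enum chan_type *chans, size_t count)
{
   assert(chans != NULL || count == 0);

   enum chan_type uniform = CHAN_VOID;
   for (size_t i = 0; i < count; i++) {
      if (chans[i] == CHAN_VOID)
         continue;               /* skip absent channels */
      if (uniform == CHAN_VOID)
         uniform = chans[i];     /* first present channel sets the type */
      else if (uniform != chans[i])
         return CHAN_VOID;       /* mixed types: no uniform type */
   }
   return uniform;
}
```

Storing the result in the format layout at build time turns every later query into a single field read.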
Chad Versace f665bae4eb anv/image: Use isl_drm_modifier_get_score()
It replaces anv_drm_format_mod_score().

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2020-11-17 10:36:44 -08:00
Chad Versace 01bad67a94 isl: Define isl_drm_modifier_get_score() [v3]
Return the modifier's score, which indicates the driver's preference for the
modifier relative to others. A higher score is better. Zero means
unsupported.

Intended to assist selection of a modifier from an externally provided list,
such as VkImageDrmFormatModifierListCreateInfoEXT.

v2:
  - Rename anv_drm_format_mod_score to isl_drm_modifier_get_score.
  - Squash all incremental changes to anv_drm_format_mod_score.
v3:
  - Drop redundant 'unlikely'. (for nchery)

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v2)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v3)
2020-11-17 10:36:44 -08:00
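
The scoring contract in 01bad67a94 (higher is better, zero means unsupported) lends itself to a simple selection loop over an app-provided list. The sketch below assumes a hypothetical `example_score()` with made-up modifier values; it does not reproduce the scores or the signature of isl_drm_modifier_get_score().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scoring function standing in for a driver's preference
 * table: higher is better, zero means unsupported. The modifier values
 * are placeholders. */
static uint32_t
example_score(uint64_t modifier)
{
   switch (modifier) {
   case 0x10: return 1;
   case 0x20: return 3;
   case 0x30: return 2;
   default:   return 0;
   }
}

/* Pick the highest-scoring modifier from an externally provided list.
 * Returns false if the list contains no supported modifier, in which
 * case *best is left untouched. */
static bool
pick_best_modifier(const uint64_t *mods, size_t count, uint64_t *best)
{
   assert(best != NULL);
   assert(mods != NULL || count == 0);

   uint32_t best_score = 0;
   for (size_t i = 0; i < count; i++) {
      uint32_t score = example_score(mods[i]);
      if (score > best_score) {
         best_score = score;
         *best = mods[i];
      }
   }
   return best_score > 0;
}
```

Reserving zero for "unsupported" lets one loop both filter and rank, with no separate validity check.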
Chad Versace b50275a4b6 anv/image: Fix isl_surf_usage_flags for stencil images
Respect VkImageStencilUsageCreateInfoEXT.

CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2020-11-17 10:36:44 -08:00
Chad Versace 51a19c83b0 anv/image: Check DISJOINT in vkGetPhysicalDeviceImageFormatProperties2 (v2)
The code did not return an error when VK_IMAGE_CREATE_DISJOINT_BIT was
incompatible with the other input params.

If the Vulkan spec forbids a set of input params for vkCreateImage,
but permits them for vkGetPhysicalDeviceImageFormatProperties2,
then vkGetPhysicalDeviceImageFormatProperties2 must reject those input
params with failure.

- v2: Clearer commit message.

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v2)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2020-11-17 10:36:44 -08:00
Erik Faye-Lund 19906022e2 zink: more accurately track supported blits
We don't care if blits need to respect render-conditions if there's no
active one. So let's hit the potentially faster native blit-paths
instead.

Fixes: 5743fa6e70 ("zink: enable conditional rendering if available")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3792
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7606>
2020-11-17 16:46:40 +01:00
Erik Faye-Lund 465a48a048 zink: always insert barriers for general-layout
We need to always have barriers between individual uses of resources
in the general-layout, because otherwise a write-cache might not be
flushed before the resource is used.

Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7652>
2020-11-17 15:05:46 +00:00
Erik Faye-Lund 11ebe2a572 zink: mark general layout as transfer-read/write
The general layout can be used for transfers, so we need to make sure
the vulkan driver knows. This will help the driver know when it needs to
flush caches.

While we're at it, also add shader-read, which is another access we use.
We should stop using that one ASAP, but for now this seems like the
right thing to do.

Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7652>
2020-11-17 15:05:46 +00:00
Iago Toral Quiroga 249aed1ff0 v3dv: rename playout and dslayout fields to use underscores.
Following a suggestion from Alejandro, since playout is a word on its own
and can be confusing. It also makes it more consistent with other
variable names that use an underscore.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7651>
2020-11-17 12:12:45 +01:00
Iago Toral Quiroga ba2e979b5c v3dv: blit shader clean-ups
This avoids redundant per-layer operations that are the same across
layers or that only need to be done once. Namely:

- The sampler for the blit source is the same for all layers.
- The decision about whether we need to load TLB contents or not only
  needs to be done once.
- Some command buffer state such as the pipeline, the viewport and the
  scissor is the same for all layers and should only be bound once.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7651>
2020-11-17 12:07:15 +01:00
Iago Toral Quiroga 840ba2513a v3dv: initialize pipeline layouts for meta operations at driver initialization
This removes the need to lock just to check if we have created them
due to the lazy allocation strategy we had in place.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7651>
2020-11-17 12:07:15 +01:00
Iago Toral Quiroga ba69c36ada v3dv: add a buffer to image copy path using a texel buffer
This is much faster than the blit fallback (which requires uploading
the linear buffer to a tiled image) and the CPU path.

A simple stress test involving 100 buffer to image copies of a
single layer image with 10 mipmap levels provides the following
results:

Path         | Recording Time | Execution Time
-------------|----------------|---------------
Texel Buffer |         2.954s |         0.137s
Blit         |        10.732s |         0.148s
CPU          |         0.002s |         1.453s

So generally speaking, this texel buffer copy path is the fastest
of the paths that can do partial copies. However, the CPU path might
provide better results in cases where command buffer recording is
important to overall performance. This is probably the reason why
the CPU path seems to provide slightly better results for vkQuake2.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7651>
2020-11-17 12:04:42 +01:00
Iago Toral Quiroga 6304c08818 v3dv: fix width for buffer view texture state
This is in units of texels, not bytes.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7651>
2020-11-17 12:03:56 +01:00
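
The unit mix-up fixed in 6304c08818 can be illustrated with a one-line conversion. This is a sketch only: the function and parameter names are hypothetical, and it assumes the view's byte range and the format's element size are both known.

```c
#include <assert.h>
#include <stdint.h>

/* Number of texels addressable through a buffer view that covers
 * `range_bytes` bytes, for a format that occupies `bytes_per_texel`
 * bytes per element. Both parameter names are hypothetical. */
static uint32_t
buffer_view_num_texels(uint32_t range_bytes, uint32_t bytes_per_texel)
{
   assert(bytes_per_texel > 0);
   return range_bytes / bytes_per_texel;
}
```

For a 1024-byte view of a 4-byte-per-texel format, the width to program is 256 texels, not 1024.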
Samuel Pitoiset d25d097d3d radv: don't subtract max_verts_per_prim from hw_max_esverts on gfx10.3
Ported from RadeonSI.

GFX10.3 does it properly.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7566>
2020-11-17 10:34:28 +00:00
Samuel Pitoiset f777d00a75 radv: don't count unusable vertices to the NGG LDS size
Ported from RadeonSI.

To get optimal LDS usage since the previous change.

Cc: 20.2
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7566>
2020-11-17 10:34:28 +00:00
Samuel Pitoiset c5e8f6700b radv: fix applying the NGG minimum vertex count requirement
Ported from RadeonSI.

The restriction was applied too late.

Cc: 20.2
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7566>
2020-11-17 10:34:28 +00:00
Samuel Pitoiset 0790105f2f radv: do VGT_FLUSH when switching NGG -> legacy on Sienna Cichlid
Ported from RadeonSI.

Cc: 20.2
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7566>
2020-11-17 10:34:28 +00:00
Pierre-Eric Pelloux-Prayer 68f152cb9a mesa/gallium: add MESA_MAP_ONCE / PIPE_MAP_ONCE
If set, this bit tells the driver that the buffer will only be
mapped once.

radeonsi uses it to disable its "never unmap buffers" optimisations.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3660
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7428>
2020-11-17 10:53:06 +01:00
James Park addfe49fdd radv: Fix radv_queue_init failure handling
Do not destroy pending_mutex or thread_mutex if uninitialized.

Do not use or destroy thread_cond if uninitialized.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7599>
2020-11-17 09:40:54 +00:00
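
One common way to get the behavior addfe49fdd describes is to record which objects were initialized and to destroy only those on the failure path. The sketch below uses POSIX threads and hypothetical names (`example_queue`, `example_queue_init`, `example_queue_finish`); the `simulate_failure` parameter exists only to exercise the error path and has no counterpart in radv.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical queue with two sync objects; the flags record which ones
 * were successfully initialized. */
struct example_queue {
   pthread_mutex_t mutex;
   pthread_cond_t cond;
   bool mutex_initialized;
   bool cond_initialized;
};

/* Destroy only what was initialized, so this is safe to call on a
 * partially initialized queue. */
static void
example_queue_finish(struct example_queue *q)
{
   if (q->cond_initialized) {
      pthread_cond_destroy(&q->cond);
      q->cond_initialized = false;
   }
   if (q->mutex_initialized) {
      pthread_mutex_destroy(&q->mutex);
      q->mutex_initialized = false;
   }
}

/* Returns 0 on success, -1 on failure. `simulate_failure` stands in for
 * a later init step failing after the mutex was created. */
static int
example_queue_init(struct example_queue *q, bool simulate_failure)
{
   assert(q != NULL);
   q->mutex_initialized = false;
   q->cond_initialized = false;

   if (pthread_mutex_init(&q->mutex, NULL) != 0)
      goto fail;
   q->mutex_initialized = true;

   if (simulate_failure || pthread_cond_init(&q->cond, NULL) != 0)
      goto fail;
   q->cond_initialized = true;
   return 0;

fail:
   example_queue_finish(q);
   return -1;
}
```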
Boris Brezillon aaecb65b89 panfrost: Don't expose fp16 support on Bifrost unless explicitly requested
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7636>
2020-11-17 08:41:05 +01:00
Boris Brezillon fee4e991fe pan/bi: Stop extracting the immediate attribute index from src0
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7636>
2020-11-17 08:41:05 +01:00
Boris Brezillon 549a59f66e pan/bi: Add a varying_index field to bi_texture
So we can get rid of the offset adjustment done in pack_variant().

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7636>
2020-11-17 08:41:05 +01:00
Boris Brezillon fb01328d30 pan/bi: Fix LD_VAR with non-constant index
src0 and src1 were mixed leading to invalid varying indices. In order to
fix that properly, we first extend load_vary to pass the immediate index
through a dedicated field and add a special boolean. This way, we don't
have to make sure src0 always contains the index, and can instead match
the src numbering defined in ISA.xml.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7636>
2020-11-17 08:41:05 +01:00
Boris Brezillon d86973d92a pan/bi: Stop passing special varying names through src0
It's just clearer to have dedicated fields encoding the fact that the
LD_VAR should be SPECIAL, and another field storing the special var id.

With this change, the source index now matches the ISA.xml definition.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7636>
2020-11-17 08:41:05 +01:00
Boris Brezillon 1176cc1297 pan/bi: Pass LD_VAR update mode explicitly
Let the compiler pass the update mode instead of inferring from the
constant value.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7636>
2020-11-17 08:41:05 +01:00
Boris Brezillon 4321b4fc93 pan/bi: Move LD_VAR packing out of bi_pack_add()
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7636>
2020-11-17 08:41:05 +01:00
Boris Brezillon 058bcf4406 pan/bi: Set roundmode to RTZ for f2u operations
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7636>
2020-11-17 08:41:05 +01:00
Boris Brezillon 00a6a9bdf8 pan/bi: Let the GPU pick the right format based on the varying descriptor
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7636>
2020-11-17 08:41:05 +01:00
Boris Brezillon aa2156f949 pan/bi: Support automatic register format
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7636>
2020-11-17 08:41:05 +01:00
Boris Brezillon d0cd8bf2a5 pan/bi: Support txs operations
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7636>
2020-11-17 08:41:05 +01:00
Boris Brezillon 045ae54343 pan/bi: Don't use TEXS for tex operations with a src that's not lod or coord
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7636>
2020-11-17 08:41:05 +01:00
Icecream95 5ad9f95f24 pan/mdg: Try demoting uniforms instead of spilling to TLS
mir_estimate_pressure often underestimates the register pressure,
letting too many registers be used for uniforms, causing RA to fail.

Mitigate this by demoting some uniforms back to explicit loads to free
up work registers if register allocation fails.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7616>
2020-11-17 03:33:51 +00:00
Vinson Lee 69cad1f96e turnip: Close sync_fd only if it is a valid file descriptor.
Fix defects reported by Coverity Scan.

Argument cannot be negative (NEGATIVE_RETURNS)
negative_returns: sync_fd is passed to a parameter that cannot be negative.

Fixes: cec0bc73e5 ("turnip: rework fences to use syncobjs")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7647>
2020-11-17 01:05:44 +00:00
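
The NEGATIVE_RETURNS defect addressed in 69cad1f96e is typically avoided with a guard before close(). A minimal sketch, assuming a hypothetical helper name and the POSIX convention that a negative descriptor means "no descriptor":

```c
#include <assert.h>
#include <stdbool.h>
#include <unistd.h>

/* Close *fd only if it holds a valid descriptor, then mark it closed so
 * that a second call is a no-op. Returns true if close() was called.
 * The helper name is hypothetical. */
static bool
close_if_valid(int *fd)
{
   assert(fd != NULL);
   if (*fd < 0)
      return false;   /* e.g. -1 from a failed export: nothing to close */
   close(*fd);
   *fd = -1;
   return true;
}
```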
Vinson Lee 71ee4e2853 clover/spirv: Add missing break for SpvOpExecutionMode case.
Fix defect reported by Coverity Scan.

Missing break in switch (MISSING_BREAK)
unterminated_case: The case for value SpvOpExecutionMode is not
terminated by a 'break' statement.

Fixes: ee5b46fcfd ("clover/spirv: support CL_KERNEL_COMPILE_WORK_GROUP_SIZE")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7519>
2020-11-17 00:15:51 +00:00
Vinson Lee 7820c8c13f frontends/va: Fix *num_entrypoints check.
Fix defect reported by Coverity Scan.

Dereference before null check (REVERSE_INULL)
check_after_deref: Null-checking num_entrypoints suggests that it
may be null, but it has already been dereferenced on all paths
leading to the check.

Fixes: 5bcaa1b9e9 ("st/va: add encode entrypoint v2")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7360>
2020-11-16 16:04:28 -08:00
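
REVERSE_INULL reports like the one in 7820c8c13f usually mean that a check tests the pointer where it should test the pointed-to value, or that the pointer check comes too late. The sketch below illustrates the defect class with hypothetical names and return codes; it is not the frontends/va code. It validates the pointer before any dereference and then tests the value rather than the pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical query: writes the number of entry points through
 * `num_entrypoints`. Returns 0 on success, -1 for an invalid argument,
 * -2 when nothing is supported. */
static int
query_entrypoints(int supported_count, int *num_entrypoints)
{
   /* Validate the pointer before the first dereference. */
   if (num_entrypoints == NULL)
      return -1;

   *num_entrypoints = supported_count;

   /* Test the value, not the pointer: `!num_entrypoints` here would be
    * always false and would trigger the Coverity warning. */
   if (*num_entrypoints == 0)
      return -2;

   return 0;
}
```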
Eric Anholt 1beb477908 freedreno: Disable PIPE_CAP_PREFER_IMM_ARRAYS_AS_CONSTBUF.
We now have NIR opt_large_constants support in place, so we can flip the
switch and get better optimization before lowering to a constant buffer,
but also avoid having constant data mixed in with the shader's uniforms,
which should lower CPU overhead on affected shaders.

Only a few shaders are affected (<.01% impact across shader-db), but for
those the impact is pretty big:

instructions in affected programs: 748 -> 639 (-14.57%)
nops in affected programs: 364 -> 284 (-21.98%)
non-nops in affected programs: 384 -> 355 (-7.55%)
mov in affected programs: 47 -> 27 (-42.55%)
cov in affected programs: 9 -> 6 (-33.33%)
dwords in affected programs: 932 -> 836 (-10.30%)
full in affected programs: 13 -> 14 (7.69%)
constlen in affected programs: 140 -> 64 (-54.29%)
(ss) in affected programs: 14 -> 15 (7.14%)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5810>
2020-11-16 13:55:41 -08:00
Eric Anholt 1f44053301 freedreno+turnip: Upload large shader constants as a UBO.
Right now if the shader indirects on some large constant array, we see NIR
load_consts (usually from the const file) of its contents into general
registers, then indirection on the GPRs.  This often results in register
allocation failures, as it's easy to go beyond the ~256 dwords of
registers per invocation.

By moving the large constants to a UBO, we can load an arbitrary number of
them.  They also can be theoretically moved to the constant reg file (~2k
dwords), though you're unlikely to hit this path without an indirect load
on your large constant, and we don't yet let UBO indirect loads get moved
to constant regs.

This possibly won't work out right if we have 16-bit load_constants, but
without other MRs in flight we won't see 16-bit temps to be lowered to
this.

This allows 2 kerbal-space-program shaders to compile that previously
would fail, and fixes the new dEQP-VK and -GLES2 tests I wrote that
dynamically index a 40-element temporary array of float/vec2/vec3/vec4
with constant element initializers.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2789
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5810>
2020-11-16 13:55:41 -08:00
Eric Anholt 17db969f7a freedreno/ir3: Fix incorrect optimization of usage of 16-bit constbuf vals.
If you're loading a 32b word from the const file and doing a cov.u32u16
split to two 16bit values, we can't turn that into a reference of a 16-bit
float value directly from the constbuf, because the
CONSTANT_DEMOTION_ENABLE results in a f2f16 operation on the 32-bit value
that we didn't want.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5810>
2020-11-16 13:54:22 -08:00
Eric Anholt 386998cfbf freedreno/ir3: Switch emit_const_ptrs() to take BOs instead of prscs.
Just indirect in the caller, which means that I'll be able to pass a
non-resource BO in the large-constants case.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5810>
2020-11-16 13:54:22 -08:00
Eric Anholt a9b37e5dad freedreno/ir3: Include at least 4 NOPs so that cffdump doesn't disasm junk.
cffdump looks at the following 4 instructions to decide if the shader has
*really* ended, so if we pack data after that (such as turnip's next
stage's shader), it might decode instructions that aren't really part of
the shader.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5810>
2020-11-16 13:54:22 -08:00
Eric Anholt 51f2b11b04 nir: Add a size_align helper function for aligning elements to 16 bytes.
This is useful for freedreno's intrinsic opt_large_constant lowering,
where we want arrays and struct elements aligned to 16 to avoid generating
lots of extra instructions to extract from the right component.

Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5810>
2020-11-16 13:54:22 -08:00
Eric Anholt 433841d9eb freedreno: Fix leak of shader binary on disk cache hits.
It's supposed to be ralloced -- there's not even a shader variant destroy
function for freeing, just ralloc_free() on the ir3_shader_variant or the
parent ir3_shader when you're done!

Fixes: f97acb4bb4 ("freedreno/ir3: disk-cache support")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5810>
2020-11-16 13:54:22 -08:00
Caio Marcelo de Oliveira Filho b3daf341d4 intel/fs: Add assert on the brw_STAGE_prog_data downcasts
Motivation is to detect earlier certain bugs that can occur when
missing a check for the stage before using the downcast.

Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7540>
2020-11-16 12:40:59 -09:00