Commit Graph

1043 Commits

Author SHA1 Message Date
Hyunjun Ko 4b8f4bae01 tu: allow dynamic primitive topology with tessellation
This allows to set VK_PRIMITIVE_TOPOLOGY_PATCH_LIST dynamically when
tessellation used.

If other values are set via vkCmdSetPrimitiveTopologyEXT for the case,
the validation layer can detect the issue.

Fixes dEQP-VK.pipeline.extended_dynamic_state.*.topology_patch*

Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12299>
2021-08-12 01:37:01 +00:00
Danylo Piliaiev 4f9ac2f737 tu: add "flushall" and "syncdraw" debug options
They will be useful to check whether some issue is due to the lack
of flushing or waiting.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12283>
2021-08-10 20:08:58 +00:00
Connor Abbott 380d4904ea tu: Read some input attachments directly
It can happen that the user reads an input attachment as the first use
of that attachment. In that case there are no subpass dependencies
required at all, because there could be a pipeline barrier before the
renderpass instead, and in any case we assume that dependencies with the
first subpass as a destination can be executed only once outside the
renderpass. The result is that we only do a CACHE_INVALIDATE once
before the entire renderpass, but it's actually required after each GMEM
load, because input attachments read GMEM through UCHE and those writes
to GMEM invalidate UCHE.

While we could add the missing CACHE_INVALIDATE "by hand" somehow, it
turns out it's actually just as easy to do an optimization the blob
does, where it simply doesn't patch those input attachments and reads
them directly instead. This means we can skip allocating memory in GMEM
for them entirely in some circumstances.

This fixes e.g.
dEQP-VK.api.copy_and_blit.core.resolve_image.whole_array_image.4_bit
with TU_DEBUG=forcebin.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12213>
2021-08-10 16:45:53 +02:00
Rob Clark 4e28dfe58e freedreno: Device matching based on chip_id
Add support for device matching based on chip_id instead of gpu_id, to
handle newer GPUs

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12159>
2021-08-06 18:51:50 +00:00
Rob Clark 7806843866 freedreno/all: Introduce fd_dev_id
Move away from using gpu_id as the primary means to identify which
adreno we are running on, as future GPUs (starting with 7c3) stop
providing a gpu_id as a new naming scheme is introduced.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12159>
2021-08-06 18:51:50 +00:00
Matt Turner 0165fde82c tu: Raise maxDescriptorSetUpdateAfterBindUniformBuffersDynamic to 16
... and reduce maxDescriptorSetUpdateAfterBindStorageBuffersDynamic from 12 to
8.

MAX_DYNAMIC_BUFFERS is MAX_DYNAMIC_UNIFORM_BUFFERS +
MAX_DYNAMIC_STORAGE_BUFFERS. We set

maxDescriptorSetUniformBuffersDynamic = MAX_DYNAMIC_UNIFORM_BUFFERS
maxDescriptorSetStorageBuffersDynamic = MAX_DYNAMIC_STORAGE_BUFFERS
maxDescriptorSetUpdateAfterBindUniformBuffersDynamic = MAX_DYNAMIC_BUFFERS / 2
maxDescriptorSetUpdateAfterBindStorageBuffersDynamic = MAX_DYNAMIC_BUFFERS / 2

The CTS test checks that

maxDescriptorSetUpdateAfterBindUniformBuffersDynamic
	- is at least 8; and
	- is at least maxDescriptorSetUniformBuffersDynamic
maxDescriptorSetUpdateAfterBindStorageBuffersDynamic
	- is at least 4; and
	- and is at least maxDescriptorSetStorageBuffersDynamic

Prior to this patch maxDescriptorSetUpdateAfterBindUniformBuffersDynamic was 12
but maxDescriptorSetUniformBuffersDynamic was 16, thus causing the CTS failure
in
  dEQP-VK.api.info.vulkan1p2_limits_validation.ext_descriptor_indexing

By raising maxDescriptorSetUpdateAfterBindUniformBuffersDynamic to the same
value as maxDescriptorSetUniformBuffersDynamic, we bring the limits into the
appropriate ranges. We do the same thing for
maxDescriptorSetUpdateAfterBindStorageBuffersDynamic by assigning it the same
value as maxDescriptorSetStorageBuffersDynamic.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12193>
2021-08-06 16:46:55 +00:00
Connor Abbott 2e2e6865b4 tu, freedreno/a6xx: Fix setting PC_XS_OUT_CNTL::PRIMITVE_ID
This is supposed to be set when that stage needs the PrimID sysval
preloaded, except for the VS which doesn't have this bit and instead
infers it from the HS or GS bit (depending on whether tess/GS is
enabled). Therefore for HS, GS, and DS we should set it whenever the
corresponding sysval is there. This includes adding a missing
PC_HS_OUT_CNTL, which I confirmed is set when the HS reads PrimID from
the VS. Note that the DS sysval is currently always enabled whenever
there's a GS, if we were to fix that then we should also change the
logic here.

This doesn't fix anything that I know of, but aligns us more with what
the blob does.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12166>
2021-08-05 16:35:41 +00:00
Connor Abbott 8115cde3ba tu, freedreno/a6xx, ir3: Rewrite tess PrimID handling
The previous handling conflated RelPatchID and PrimID, which would
result in incorrect gl_PrimitiveID when doing draw splitting and didn't
work with PrimID passthrough which fills the VPC slot with the "correct"
PrimID value from the tess factor BO which we left 0. Replace PrimID in
the tess lowering pass with a new RelPatchID sysval, and relace PrimID
with RelPatchID in the VS input code in turnip/freedreno at the same
time so that there is no net change in the tess lowering code. However,
now we have to add new mechanisms for getting the user-level PrimID:

- In the TCS it comes from the VS, just like gl_PrimitiveIDIn in the GS.
  This means we have to add another register to our VS->TCS ABI. I
  decided to put PrimID in r0.z, after the TCS header and RelPatchID,
  because it might not be read in the TCS.
- If any stage after the TCS uses PrimID, the TCS stores it in the first
  dword of the tess factor BO, and it is read by the fixed-function
  tessellator and accessed in the TES via the newly-uncovered DSPRIMID
  field. If we have tess and GS, the TES passes this value through to
  the GS in the same way as the VS does. PrimID passthrough for reading
  it in the FS when there's tess but no GS also "just works" once we
  start storing it in the TCS. In particular this fixes
  dEQP-VK.pipeline.misc.primitive_id_from_tess which tests exactly that.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12166>
2021-08-05 16:35:41 +00:00
Connor Abbott cd687c4e3b freedreno: Rename and document tess primid-related sysvals
DSPATCHID and HSPATCHID, which we mapped gl_PrimitiveID to, are actually
relative to the current subdraw. Subdraws aren't supported yet by turnip
but they are by freedreno for indirect draws.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12166>
2021-08-05 16:35:40 +00:00
Danylo Piliaiev 97b0981ed9 tu: disable gmem in primary cmdbuffer if secondary has it disabled
If secondary command buffer is emitted within a subpass it may have
barriers which forces us to disable gmem for current renderpass.

Fixes: 20547a110e "tu: delay decision of forcing sysmem due to subpass self-dependencies"

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12219>
2021-08-05 18:51:51 +03:00
Emma Anholt c2a6143755 turnip: Fix assertions on checking mutable combined samplers support.
We would determine that it was unsupported, then ask for the size and
triggered the assertion checking that we never ask for the size of a
combined sampler.

Fixes: ee3495e465 ("turnip: Add support for VK_VALVE_mutable_descriptor_type")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12148>
2021-08-02 17:41:54 +00:00
Danylo Piliaiev ea7a42775b turnip: reduce maxComputeWorkGroupSize
Blob advertises { 1024, 1024, 64 }, but from tests they all
could be 1024.

Fixes tests:
 dEQP-VK.compute.basic.max_local_size_x
 dEQP-VK.compute.basic.max_local_size_y
 dEQP-VK.compute.basic.max_local_size_z

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9409>
2021-08-02 20:19:18 +03:00
Connor Abbott b157a5d0d6 tu: Implement non-aligned multisample GMEM STORE_OP_STORE
We have to a bit careful here when disabling draw states. This also
necessitates moving the actual recording of the stores to the end so
that we set the dirty flag correctly.

Closes: #4462
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12102>
2021-08-02 11:20:25 +00:00
Connor Abbott 7948c4b0b4 tu: Make tile stores use a dedicated CS
We were trying to calculate how much space they need, That was already
difficult and one of the most opaque and hard-to-verify uses of sub_cs,
but it will become even more difficult with the 3D path. What's worse is
that sometimes we have to touch that path when we start touching
registers that would affect rasterization, and there's no indication
that you have to then recalculate the size etc. Just rip this out and
start keeping a separate CS for it instead. Note that this adds a small
amount of memory wastage and extra buffers (at worst one buffer per
command buffer).

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12102>
2021-08-02 11:20:25 +00:00
Connor Abbott d9a4a0aebd tu: Handle multisample vkCmdCopyColorImage()
There was a bit of code already to select the 3d path, but we actually
need another shader variant for it.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12080>
2021-07-29 23:54:29 +00:00
Connor Abbott fc0c0e9d45 tu: Use NIR for clear/blit shaders
This is much more maintainable, extensible, and easy to read than
hand-rolled structs approximating assembly. This also removes the last
use of the old hand-written packing structs. There are a few minor
differences:

- The shaders are larger because ir3 currently doesn't support (rpt),
  which means that some shaders are larger than one instrlen and the
  current logic has to be extended to allow for that. This seems a small
  price to pay, ir3 will gain support for (rpt) eventually, and we
  shouldn't have limitations like this baked in anyway. For example some
  GL blob r8g8 <-> r16 copy shaders are apparently quite large.
- Due to the inability to switch inputs/outputs on the fly, we need to
  split the VS into two variants. I made the layer-writing variant also
  used for other clears, because the old method of overloading c0.z/c1.z
  to mean both "src x coordinate" and "z clear value" in the same shader
  seemed too clever and I didn't want to add yet another variant. This
  means that non-layered clears will also write the layer (to 0), but
  that shouldn't be a big deal performance-wise.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12079>
2021-07-29 23:20:18 +00:00
Alejandro Piñeiro 476dc3c050 vulkan: add vk_spec_info_to_nir_spirv util method
All vulkan drivers have been copying anv's code to convert
VkSpecializationInfo into nir_spirv_specialization.

Recently there was a Vulkan spec change on allowed values for
VkSpecializationInfo, and all drivers got affected.

This commits creates a new helper, and uses it on all Vulkan Mesa
drivers.

v2: use (uint8_t*) castings, instead of void*, to avoid C2036 with
    MSVC (detected by the CI, inspired on what radv was doing)

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12047>
2021-07-29 03:28:52 +00:00
Danylo Piliaiev 20547a110e tu: delay decision of forcing sysmem due to subpass self-dependencies
DXVK always inserts vertex stage subpass self-dependency for every
subpass regardless of whether there actually would be a barrier.
This effectively disabled gmem rendering with DXVK.

Thus we delay the decision to disable gmem rendering until we
see a barrier with vertex stages.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12038>
2021-07-28 15:35:44 +00:00
Eduardo Lima Mitev ee3495e465 turnip: Add support for VK_VALVE_mutable_descriptor_type
v1.  Hyunjun Ko <zzoon@igalia.com>
- Add to hanlde VK_DESCRIPTOR_POOL_CREATE_HOST_ONLY_BIT_VALVE
- Don't support VK_DESCRIPTOR_TYPE_INPUT_ATTACHMENT and
VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER

v2.  Hyunjun Ko <zzoon@igalia.com>
- Fix some indentations and nitpicks.
- Add the extension to features.txt

v3.  Hyunjun Ko <zzoon@igalia.com>
- Remove unnecessary asserts.

Signed-off-by: Eduardo Lima Mitev <elima@igalia.com>
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9322>
2021-07-27 04:53:02 +00:00
Danylo Piliaiev b45cddda18 tu: handle half-reg fs outputs
This would allow to enable translation of RelaxedPrecision spirv
variable decorator into mediump which for us means fp16.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12067>
2021-07-26 16:25:14 +00:00
Jason Ekstrand 17f7b4b83e turnip: Replace tu_lower_image_size with nir_lower_image
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12005>
2021-07-22 14:22:35 -05:00
Rob Clark 7c7722304b freedreno+turnip: Get device name from device-info table
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11790>
2021-07-14 01:58:00 +00:00
Rob Clark a4559c9550 freedreno+turnip: Add has_8bpp_ubwc
Newer a6xx devices seem to drop 8b/pixel UBWC support.

The turnip part was adapted from Jonathans patch on !10892

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11790>
2021-07-14 01:58:00 +00:00
Rob Clark e552784e68 freedreno+turnip: Add has_cp_reg_write
Newer a6xx devices drop this packet from the sqe firmware, and use
direct (pkt4) register writes instead for the few cases that previously
used CP_REG_WRITE.

The turnip part was adapted from Jonathans patch on !10892

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11790>
2021-07-14 01:58:00 +00:00
Rob Clark f74d0bf05e turnip: Get has_sample_locations from fd_dev_info
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11790>
2021-07-14 01:58:00 +00:00
Rob Clark 3f1c4a86bb turnip: Get has_tex_filter_cubic from fd_dev_info
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11790>
2021-07-14 01:58:00 +00:00
Rob Clark 64af60cfb3 turnip: Get indirect_draw_wfm_quirk from fd_dev_info
At some point we might want to change this to minimum fw version, but
for now it can be a bool.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11790>
2021-07-14 01:58:00 +00:00
Rob Clark 06000f42ed turnip: Get storage_16bit from fd_dev_info
Removing more gpu_id checks that will become bogus as we add more a6xx.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11790>
2021-07-14 01:58:00 +00:00
Rob Clark 948d87dab8 turnip: Drop unused vshs_workgroup param
Unused since d968995c67, and this gets rid
of one more gpu_id check.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11790>
2021-07-14 01:58:00 +00:00
Rob Clark c9bcd835fa turnip: Convert fd_dev_info to const pointer
Split out from earlier patch to reduce churn.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11790>
2021-07-14 01:58:00 +00:00
Rob Clark 78c8a8af80 freedreno: Generate device-info tables at build time
This way we can make the tables const.  At the same time, for a6xx, this
introduces a "sub-generation template" to reduce the copy/paste for
parameters which are keyed to the sub-generation.  It also explicitly
lists every supported GPU, to get rid of duplicate lists of supported
gpus between the device-info and drivers.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11790>
2021-07-14 01:58:00 +00:00
Rob Clark 0eda0188aa freedreno: Rename *_dev_info
Everywhere else symbols/types/etc are shortend to "fd_*", so lets do the
same here for consistency.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11790>
2021-07-14 01:58:00 +00:00
Jonathan Marek 1a6dd7f9b1 freedreno/common: unhardcode CCU color cache offset
Replace it with a calculation which works for all current GPUs.

Duplicated the calculation in both drivers because freedreno_dev_info isn't
meant for derived parameters (and drivers might want to just calculate on
the fly instead).

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11790>
2021-07-14 01:58:00 +00:00
Jonathan Marek d34b18a6ce tu: remove workaround for conditional rendering + hw binning
- It hurts users with newer firmware who don't need the workaround
- Kernel now rejects older firmware due to security issues, so this will
  prevent users from using older firmware anyway.
- Only whitelisting 650 enables the workaround by default for any new GPUs

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11790>
2021-07-14 01:58:00 +00:00
Rob Clark 4e802538e7 turnip: Split tu6_emit_xs()
Emit all the state layout config (such as push-const CONSTLEN) first,
before emitting anything that depends on that state.  This fixes an
issue that was showing up when FLUT is enabled in ir3 (which results
in higher probability of not having any immediats lowered to push-
consts).

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8705>
2021-07-13 14:40:30 +00:00
Rob Clark 71003e3c84 turnip: avoid some UB
Reduce a bit of extra noise that makes diffing cmdstream traces more
annoying.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8705>
2021-07-13 14:40:30 +00:00
Emma Anholt a7e753cb96 turnip: Fix allocation size for vkCmdUpdateBuffer.
tu_cs_alloc() takes a size in dwords, not bytes.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11743>
2021-07-12 17:15:56 +00:00
Michel Dänzer 55caa3abb1 turnip: Mark local variable ASSERTED
It's only used in assert. Avoids compiler warning/error with assertions disabled:

../src/freedreno/vulkan/tu_cs.h: In function 'tu_cs_reserve':
../src/freedreno/vulkan/tu_cs.h:208:13: error: unused variable 'result' [-Werror=unused-variable]
  208 |    VkResult result = tu_cs_reserve_space(cs, reserved_size);
      |             ^~~~~~

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11412>
2021-07-09 10:24:41 +00:00
Jason Ekstrand 624e799cc3 nir: Drop nir_ssa_def::name and nir_register::name
We say that they're for debug only but we don't really have a good
policy around when to set them and when not to.  In particular,
nir_lower_system_values and nir_lower_vars_to_ssa which are the chief
producers of SSA values which might reasonably have a name do not bother
to set one.  We have some names set from things like BLORP and RADV's
meta shaders but AFAICT, they're setting a name more because it's there
than because they actually care.

Also, most things other than nir_clone and nir_serialize don't bother to
try and preserve them.  You can see in the diffstat of this commit
exactly what passes attempt to preserve names.  Notably missing from the
list is opt_algebraic which is the single largest source of SSA def
churn and it happily throws names away.

These observations lead me to question whether or not names are actually
useful at all or if they're just taking up space (8B per instruction)
and wasting CPU cycles (to ralloc_strdup on the off chance we do have
one).  I don't think I can think of a single time in recent history
where I've been debugging a shader issue and a SSA value name has been
there and been useful.  If anything, the few times they are there, they
just throw me off because they mess up the indentation in nir_print.

iris shader-db on my system gets runtime -2.07734% +/- 1.26933% (n=5)

Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5439>
2021-07-08 17:34:41 +00:00
Connor Abbott 266d3d5814 tu: Update subgroup properties
Everything should be in place for this to actually work. Support a size
of 128, unlike the blob. I've also plumbed through ballot support, so
enable that.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>
2021-07-08 16:02:41 +00:00
Connor Abbott 68b8b9e9e1 tu, ir3: Plumb through support for CS subgroup size/id
The way that the blob obtains the subgroup id on compute shaders is by
just and'ing gl_LocalInvocationIndex with 63, since it advertizes a
subgroupSize of 64. In order to support VK_EXT_subgroup_size_control and
expose a subgroupSize of 128, we'll have to do something a little more
flexible. Sometimes we have to fall back to a subgroup size of 64 due to
various constraints, and in that case we have to fake a subgroup size of
128 while actually using 64 under the hood, by just pretending that the
upper 64 invocations are all disabled. However when computing the
subgroup id we need to use the "real" subgroup size. For this purpose we
plumb through a driver param which exposes the real subgroup size. If
the user forces a particular subgroup size then we lower
load_subgroup_size in nir_lower_subgroups, otherwise we let it through,
and we assume when translating to ir3 that load_subgroup_size means
"give me the *actual* subgroup size that you decided in RA" and give you
the driver param.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>
2021-07-08 16:02:41 +00:00
Hyunjun Ko 9507705693 turnip/kgsl: new flag TU_USE_KGSL
There are some cases using kgsl backend on linux that is still not usual
setup though, we need to consider too.

Regarding the timeline semaphore feature, we could implement it for
the kgsl backend in the future, and probalby it should be using the
existing code in tu_drm.

See #4738, #4907

Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11488>
2021-07-01 04:22:55 +00:00
Emma Anholt fd5293cc43 turnip: Short-circuit if ladder generation for constant index SSBO/UBOs.
The compiler *can* eventually chew through all the copy prop, constant
folding, and dead_cf necessary to use just our constant index, but we can
save a whole lot of hassle by chasing the MOVs up front and finding the
constant.

dEQP-VK.ubo.3_level_array.scalar.row_major_mat4.both goes from 2.0s to
1.6s on a release build (3.1s to 2.1s for a debug build like we use in CI).

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11613>
2021-06-28 16:26:24 +00:00
Rob Clark e74366b18a turnip: Add CrOS Gralloc support
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11612>
2021-06-26 18:44:12 +00:00
Rob Clark f875b61060 turnip: Fix AcquireImageANDROID() handle type
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11612>
2021-06-26 18:44:12 +00:00
Rob Clark 7ca79b7639 turnip: Use drmIoctl()
Replace open-coded ioctl with drmIoctl() to get restart on interrupted
system calls.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11612>
2021-06-26 18:44:12 +00:00
Matt Turner ed77bf3c4e ci: Unify on MESA_VK_IGNORE_CONFORMANCE_WARNING
Move and rename warn_non_conformant_implementation() to common location
of src/vulkan/util/vk_util.c as vk_warn_non_conformant_implementation().

In freedreno/ci,  move MESA_VK_IGNORE_CONFORMANCE_WARNING to common
location of .baremetal-deqp-test-freedreno-vk.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11563>
2021-06-25 19:45:38 +00:00
Connor Abbott d01e7b50b8 freedreno, tu: Set SP_XS_PVT_MEM_HW_STACK_OFFSET
Theoretically this register should only be used when function calls in
the shader are used, which we don't support. But with the default value
of 0 it seems like pvtmem doesn't work on a650. Just set it to the total
per-SP size, effectively leaving no space for the return-address stack,
like the blob does.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4949
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11581>
2021-06-25 15:57:54 +00:00
Danylo Piliaiev a9fd4fa26c turnip: early exit in tu6_draw_common to save cpu cycles
Improves Zink + drawoverhead perf up to 4%

Before:
  1, DrawElements ( 1 VBO| 0 UBO|  0    ) w/ no state change, 3981
  1, DrawElements ( 1 VBO| 0 UBO|  0    ) w/ no state change, 3977

After:
  1, DrawElements ( 1 VBO| 0 UBO|  0    ) w/ no state change, 4136
  1, DrawElements ( 1 VBO| 0 UBO|  0    ) w/ no state change, 4163

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11556>
2021-06-25 13:37:32 +00:00
Danylo Piliaiev 815a85dd7c turnip: do not re-emit same vs params
Improves drawoverhead perf through Zink up to 260%

Before:
  1, DrawElements ( 1 VBO| 0 UBO|  0    ) w/ no state change, 1518
After:
  1, DrawElements ( 1 VBO| 0 UBO|  0    ) w/ no state change, 3981

This brings it close to Freedreno, which has around 4300.

In vkQuake vs params re-emission now occurs in 0.23% of draw calls.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11556>
2021-06-25 13:37:32 +00:00