Can result in different shaders.
Fixes: 8a86908e9a "radv/gfx10: add Wave32 support for vertex, tessellation and geometry shaders"
Reviewed-by: Dave Airlie <airlied@redhat.com>
Previously we relied on stores not using DCC but that is going to
change, so disable compression explicitly.
Reviewed-by: Dave Airlie <airlied@redhat.com>
For extra args. Unlike image creation, I'm not embedding the vk
struct in there, so all the inline structs can be kept.
Reviewed-by: Dave Airlie <airlied@redhat.com>
VK spec 7.3:
"Applications must ensure that all accesses to memory that backs
image subresources used as attachments in a given renderpass instance
either happen-before the load operations for those attachments, or
happen-after the store operations for those attachments."
So the only renderloops we can have are with input attachments. Detect
these.
Reviewed-by: Dave Airlie <airlied@redhat.com>
When the application does not ask for robust buffer access.
Only implemented the check in radv.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The driver should now rely on cmask_offset because CMASK can be
disabled by the driver for some reason (e.g. mipmaps). Apply the
same change for FMASK, although it should be useless.
Fixes: ad1bc8621d ("radv: remove radv_get_image_fmask_info()")
Fixes: 10d08da52c ("radv/gfx10: add missing dcc_tile_swizzle tweak")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It's unnecessary to duplicate fields in another struct.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It's unnecessary to duplicate fields in another struct.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It can be enabled with RADV_PERFTEST=gewave32.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It can be enabled with RADV_PERFTEST=pswave32.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It can be enabled with RADV_PERFTEST=cswave32.
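The three RADV_PERFTEST options above are tokens in a comma-separated
environment variable. A minimal sketch of how such a string could be
turned into a bitmask; the flag names, the table and parse_perftest()
are hypothetical and only illustrate the idea, they are not the
driver's actual parser:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical flag bits, for illustration only. */
enum {
   PERFTEST_CS_WAVE32 = 1u << 0,
   PERFTEST_PS_WAVE32 = 1u << 1,
   PERFTEST_GE_WAVE32 = 1u << 2,
};

struct perftest_option {
   const char *name;
   uint32_t flag;
};

static const struct perftest_option perftest_options[] = {
   {"cswave32", PERFTEST_CS_WAVE32},
   {"pswave32", PERFTEST_PS_WAVE32},
   {"gewave32", PERFTEST_GE_WAVE32},
};

/* Parse a comma-separated option string such as "cswave32,gewave32".
 * Unknown tokens are ignored; a NULL or empty string yields 0. */
static uint32_t
parse_perftest(const char *str)
{
   const size_t num_options =
      sizeof(perftest_options) / sizeof(perftest_options[0]);
   uint32_t flags = 0;

   while (str && *str) {
      /* Length of the current token, up to the next comma or the end. */
      size_t len = strcspn(str, ",");

      for (size_t i = 0; i < num_options; i++) {
         const struct perftest_option *opt = &perftest_options[i];
         if (strlen(opt->name) == len && !strncmp(str, opt->name, len))
            flags |= opt->flag;
      }

      str += len;
      if (*str == ',')
         str++;
   }
   return flags;
}
```

A caller would typically pass getenv("RADV_PERFTEST") (from stdlib.h)
and test the returned bits per shader stage.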
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This patch decouples radv_shader.h from any LLVM dependency.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Enabling tracing and then hitting a vmfault can lead to a segfault
before we print out the traces, which happens if a meta shader is
executing and we don't have the NIR for it.
Just pass the stage and give back a default.
Fixes: 9b9ccee4d6 ("radv: take LDS into account for compute shader occupancy stats")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Will be useful for testing the legacy path.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It should be possible to build it on-demand too but it requires
more work. On GFX10, the GS copy shader is required when tess
is enabled with extreme geometry.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This needs to be cleaned up a bit, and it probably contains
missing stuff and/or bugs.
This doesn't fix the "half of the triangles" issue.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This simplifies a bunch of stuff by
(1) Keeping all the things in a single allocation, making things easier
for the cache.
(2) creating a shader_variant creation helper.
This is immediately put to use by creating rtld shader binaries. This
is the main reason for the binaries, as we need to do the linking at
upload time, i.e. post caching. We do not enable rtld yet.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
SMEM and VMEM caches are L0 on gfx10. Ported from RadeonSI.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Ported from RadeonSI, will be emitted for GFX10 too.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This reduces the size of fill operations needed to clear CMASK
for layered color textures.
GFX9 unsupported for now.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This reduces the size of fill operations needed to clear FMASK
for layered color textures.
GFX9 unsupported for now.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
HTILE decompressions need the user sample locations if specified
in the current subpass.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This path supports layers but it requires decompressing HTILE
before resolving. The driver also needs to fixup HTILE after
the resolve. This path is probably slower than the graphics one.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
When using graphics, the driver doesn't need to decompress HTILE
before resolving. This path currently doesn't support layers
so we have to fallback to the compute path.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Only supported with vkCreateRenderPass2().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This allows us to disable the FMASK decompress pass when
transitioning from CB writes to shader reads.
This will likely be improved and enabled by default in the future.
No CTS regressions on GFX8 but a small number of multisample CTS
failures on GFX9 (they look related to the small hint).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Otherwise the buffer loads/stores in the bufimage meta operations fail.
If we decompress DCC then we can use the "canonical" format compatible
with the not-supported format.
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
From the Vulkan spec 1.1.109:
"Some implementations may need to evaluate depth image values
while performing image layout transitions. To accommodate this,
instances of the VkSampleLocationsInfoEXT structure can be
specified for each situation where an explicit or automatic
layout transition has to take place. [...] and
VkRenderPassSampleLocationsBeginInfoEXT can be chained from
VkRenderPassBeginInfo to provide sample locations for layout
transitions performed implicitly by a render pass instance."
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Basically, this extension allows applications to use custom
sample locations. It doesn't support variable sample locations
during subpass. Note that we don't have to upload the user
sample locations because the spec doesn't allow this.
The extension is currently disabled because the driver needs to
support variable sample locations during layout transitions. The
depth decompress needs to know them and that's a bit invasive.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Make sure to sync all previous work if the given command buffer
has pending active queries. Otherwise the GPU might write queries
data after the reset operation.
This fixes a bunch of new dEQP-VK.query_pool.* CTS failures.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The driver only supports up to 8 samples.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
We already use GFX9 and I don't want us to have confusing naming
in the driver. GFXn naming is better from the driver perspective,
because it's the real version of the gfx portion of the hw. Also,
CIK means Bonaire-Kaveri-Kabini, it doesn't mean CI.
It shouldn't confuse our SDMA, UVD, VCE etc. code much. Those have
nothing to do with GFXn and they have their own version numbers.
Other types like syncobj do not need it, so let's make things a bit more uniform.
Also reduce confusion about what signalled/submitted referred to (especially with
imported fences).
Reviewed-by: Dave Airlie <airlied@redhat.com>
According to RadeonSI, this seems to be required by the hardware
to avoid GPU hangs. I think I just forgot to set that bit when I
implemented VK_EXT_transform_feedback.
This fixes a GPU hang with Space Engineers and DXVK.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110291
Fixes: b4eb029062 ("radv: implement VK_EXT_transform_feedback")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
No functional changes. This temporarily uses plane 0 for
everything.
Long term plan is that only single plane images get to use
metadata like htile/dcc/cmask/fmask.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This includes 0 options.
The cache parsing is located at a position where we can easily add
config filtering by VkApplicationInfo.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Basically just reserve the memory in the descriptor sets.
On the shader side we construct a buffer descriptor, since
AFAIU VGPR indexing on 32-bit pointers in LLVM is still broken.
This fully supports update after bind and variable descriptor set
sizes. However, the limits are somewhat arbitrary and are mostly
about finding a reasonable division of a 2 GiB max memory size over
the set.
v2: - rebased on top of master (Samuel)
- remove the loading resources rework (Samuel)
- only load UBO descriptors if it's a pointer (Samuel)
- use LLVMBuildPtrToInt to avoid IR failures (Samuel)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (v2)
Does what it says on the tin.
The per stage time is only an approximation due to linking and
the Vega merged stages.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
They are required for using typed buffer loads.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The formats will be used for reducing the number of loaded channels.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This fixes a critical issue.
Cc: <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109575
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This is needed in order to inline some push constants when possible.
This also adds a new helper for initializing the pass.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This reworks how the depth stencil attachment is used for
simplicity. This also introduces radv_render_pass_compile()
helper that will be used for further optimizations.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Instead of doing them in radv_cmd_buffer_set_subpass().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Instead of doing it every time we emit cache flushes.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This parameter is actually useless as the immediate value
can always be zero.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It's common in some applications to bind a new graphics pipeline without
ending up changing any context registers.
This makes a pipeline have two command buffers: one for setting context
registers and one for everything else. The context register command buffer
is only emitted if it differs from the previous pipeline's.
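The "emit only if it differs" check can be sketched as a plain
comparison of the pre-built context register packets. struct ctx_cs,
ctx_cs_equal() and needs_context_emit() are hypothetical names used
for illustration; the real driver may compare a hash or other state
instead:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a pipeline's pre-built context register
 * command buffer: a list of packet dwords. */
struct ctx_cs {
   const uint32_t *dwords;
   size_t count;
};

/* Two context command buffers are equal if they have the same length
 * and the same contents. */
static bool
ctx_cs_equal(const struct ctx_cs *a, const struct ctx_cs *b)
{
   return a->count == b->count &&
          memcmp(a->dwords, b->dwords, a->count * sizeof(uint32_t)) == 0;
}

/* Returns true when the context registers must be emitted: either no
 * pipeline was bound before, or the new packets differ from the old. */
static bool
needs_context_emit(const struct ctx_cs *prev, const struct ctx_cs *next)
{
   return !prev || !ctx_cs_equal(prev, next);
}
```

Skipping the emission when nothing changed is what avoids the
unnecessary context roll on a pipeline bind.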
v2: ensure late scissor emission is done when radv_emit_rbplus_state() is
called
v2: make use of cmd_buffer->state.workaround_scissor_bug
v3: rename "workaround_scissor_bug" to
"context_roll_without_scissor_emitted"
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
v2: rename "workaround_scissor_bug" to
"context_roll_without_scissor_emitted"
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
unused and gcc complains about strncpy. (From what I can see, this is because
strncpy does not leave a 0 byte on truncation. That said, we don't use
it, so this does not fix a real bug.)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The value depends on the number of samples.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It's somehow similar to the FCE predicate.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Users are encouraged to switch to LLVM 7.0 released in September 2018.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
After investigating this, it appears that COND_WRITE doesn't
work correctly in some situations. I don't know exactly why it
fails to update DB_Z_INFO.ZRANGE_PRECISION, but as AMDVLK
also uses COND_EXEC I think there is a reason.
Now the driver stores a new metadata value in order to reflect
the last fast depth clear state. If a TC-compat HTILE is fast cleared
with 0.0f, we have to update ZRANGE_PRECISION to 0 in order to
work around that hardware bug.
This fixes rendering issues with The Forest and DXVK and doesn't
seem to introduce any regressions.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108914
Fixes: 68dead112e ("radv: update the ZRANGE_PRECISION value for the TC-compat bug")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This allows fast clearing the depth part (or the stencil part)
of a depth+stencil surface when HTILE is enabled. I didn't test
on GFX8, so it's disabled currently.
This gives a very nice boost, for example when clearing the depth
aspect of a 4096x4096 D32_SFLOAT_S8_UINT image (18x faster).
BEFORE: 235 us
AFTER: 13 us
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This implementation should work and potential bugs can be
fixed during the release candidates window anyway.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This should address the remaining failures in Batman Arkham City.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107765
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This should fix rendering issues with Batman Arkham City.
We will probably need to implement itob and itoi at some
point, but currently nothing hits these paths.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107765
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This fixes crashes for some CTS:
dEQP-VK.api.copy_and_blit.core.blit_image.all_formats.color.*.linear_*_*
dEQP-VK.api.copy_and_blit.core.blit_image.all_formats.color.*.*_linear_*
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108113
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
According to my benchmark results, it appears that we should
reduce the threshold to 1024.
BEFORE:
1 KB: 68.656000 ms
2 KB: 118.368000 ms
AFTER:
1 KB: 31.760000 ms
2 KB: 29.840000 ms
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It shouldn't be needed to emit the initial graphics or compute
state when beginning a new command buffer. Emitting them in
the preamble should be enough and this will reduce IB sizes.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
In environments where we cannot cache, e.g. Android (no homedir),
ChromeOS (readonly rootfs) or sandboxes (cannot open cache), the
startup cost of creating a device in radv is rather high, due
to compiling all possible built-in pipelines up front. This meant
depending on the CPU a 1-4 sec cost of creating a Device.
For CTS this cost is unacceptable, and likely for starting random
apps too.
So if there is no cache, with this patch radv will compile shaders
on demand. Once there is a cache from the first run, even if
incomplete, the driver knows that it can likely write the cache
and precompiles everything.
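A minimal sketch of the on-demand scheme, assuming a hypothetical
struct meta_pipeline with its own mutex (the names and the counter are
made up for illustration; the real compile step is replaced by a
counter increment):

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for one built-in (meta) pipeline. */
struct meta_pipeline {
   pthread_mutex_t mtx;
   bool compiled;
   int compile_count; /* only here so the sketch can be observed */
};

/* Compile the pipeline if nobody did so yet. Safe to call from
 * several threads; only the first caller pays the compile cost. */
static void
meta_pipeline_ensure(struct meta_pipeline *p)
{
   pthread_mutex_lock(&p->mtx);
   if (!p->compiled) {
      p->compile_count++; /* the expensive shader compile would go here */
      p->compiled = true;
   }
   pthread_mutex_unlock(&p->mtx);
}

/* At device creation: precompile only when a cache is available,
 * otherwise defer the work to the first use. */
static void
meta_pipeline_init(struct meta_pipeline *p, bool cache_available)
{
   pthread_mutex_init(&p->mtx, NULL);
   p->compiled = false;
   p->compile_count = 0;

   if (cache_available)
      meta_pipeline_ensure(p);
}
```

Each meta operation would call meta_pipeline_ensure() before using
the pipeline, so the no-cache case only compiles what is actually hit.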
Note that I did not switch the buffer and itob/btoi compute pipelines
to on-demand, since you cannot really do anything in Vulkan without
them and there are only a few.
This reduces the CTS runtime for the no caches scenario on my
threadripper from 32 minutes to 8 minutes.
Reviewed-by: Dave Airlie <airlied@redhat.com>
The goal is to use radv_barrier()/radv_subpass_barrier() as
much as possible for further optimizations.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Inherited command buffers are not supported.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
By default, our internal rendering commands are discarded
only if the predicate is non-zero (ie. DRAW_VISIBLE). But
VK_EXT_conditional_rendering also allows discarding commands
when the predicate is zero, which means we have to use a
different flag.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
I don't want to waste CPU cycles for nothing.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
A ZPASS_DONE or PIXEL_STAT_DUMP_EVENT (of the DB occlusion
counters) must immediately precede every timestamp event to
prevent a GPU hang on GFX9.
Cc: 18.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Needed for VK_KHR_create_renderpass2.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This might fix some synchronization issues. I don't know if
that will affect performance but it's required for correctness.
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
If the given image doesn't enable CMASK, FMASK or DCC, it's
useless to flush CB metadata.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This ports radv to the shared code. However, due to a bug in LLVM
versions prior to 7, radv cannot add target info at this stage,
as it would leak one for every shader compile. I'd prefer
to keep this LLVM damage in the shared code, since it isn't the
driver at fault here. We just add a flag to denote if the driver
can support leaking the target info or not, and the common code
does the right thing depending on the LLVM version.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is prep work for moving this to a per-thread struct
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The Vulkan spec says:
"pipelineBindPoint is a VkPipelineBindPoint indicating whether
the descriptors will be used by graphics pipelines or compute
pipelines. There is a separate set of bind points for each of
graphics and compute, so binding one does not disturb the other."
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Ported from RadeonSI.
This appears to fix some random fails with:
dEQP-VK.query_pool.statistics_query.*
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This extension provides fences and frame count information to direct
display contexts. It uses new kernel ioctls to provide 64-bits of
vblank sequence and nanosecond resolution.
v2:
Rework fence integration into the driver so that waiting for
any of a mixture of fence types (wsi, driver or syncobjs)
causes the driver to poll, while a list of just syncobjs or
just driver fences will block. When we get syncobjs for wsi
fences, we'll adapt to use them.
v3: Adopt Jason Ekstrand's coding conventions
Declare variables at first use, eliminate extra whitespace between
types and names. Wrap lines to 80 columns.
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
v4: Adapt to WSI fence API change. It now returns VkResult and
no longer has an option for relative timeouts.
v5: wsi_register_display_event and wsi_register_device_event now
use the default allocator when NULL is provided, so remove the
computation of 'alloc' here.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Having random data in there is probably not the best.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This adds support for the KHR_display extension to the radv Vulkan
driver. The driver now attempts to open the master DRM node when the
KHR_display extension is requested so that the common winsys code can
perform the necessary operations.
v2:
* Simplify addition of VK_USE_PLATFORM_DISPLAY_KHR to
vulkan_wsi_args
Suggested-by: Eric Engestrom <eric.engestrom@imgtec.com>
v3:
Adapt to new wsi_device_init API (added display_fd)
v4:
Adopt Jason Ekstrand's coding conventions
Declare variables at first use, eliminate extra whitespace
between types and names. Wrap lines to 80 columns.
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
v5:
Add vkCreateDisplayModeKHR. This doesn't actually create
new modes, it only looks to see if the requested parameters
match an existing mode and returns that.
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This adds support for the KHR_display extension to the Vulkan
WSI layer. Driver support will be added separately.
v2:
* fix double ;; in wsi_common_display.c
* Move mode list from wsi_display to wsi_display_connector
* Fix scope for wsi_display_mode and wsi_display_connector
allocs
* Switch all allocations to vk_zalloc instead of vk_alloc.
* Fix DRM failure in
wsi_display_get_physical_device_display_properties
When DRM fails, or when we don't have a master fd
(presumably due to application errors), just return 0
properties from this function, which is at least a valid
response.
* Use vk_outarray for all property queries
This is a bit less error-prone than open-coding the same
stuff.
* Remove VK_COMPOSITE_ALPHA_INHERIT_BIT_KHR from surface caps
Until we have multi-plane support, we shouldn't pretend to
have any multi-plane semantics, even if undefined.
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
* Simplify addition of VK_USE_PLATFORM_DISPLAY_KHR to
vulkan_wsi_args
Suggested-by: Eric Engestrom <eric.engestrom@imgtec.com>
v3:
Add separate 'display_fd' and 'render_fd' arguments to
wsi_device_init API. This allows drivers to use different FDs
for the different aspects of the device.
Use largest mode as display size when no preferred mode.
If the display doesn't provide a preferred mode, we'll assume
that the largest supported mode is the "physical size" of the
device and report that.
v4:
Make wsi_image_state enumeration values uppercase.
Follow more common mesa conventions.
Remove 'render_fd' from wsi_device_init API. The
wsi_common_display code doesn't use this fd at all, so stop
passing it in. This avoids any potential confusion over which
fd to use when creating display-relative object handles.
Remove call to wsi_create_prime_image which would never have
been reached as the necessary condition (use_prime_blit) is
never set.
whitespace cleanups in wsi_common_display.c
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Add depth/bpp info to available surface formats. Instead of
hard-coding depth 24 bpp 32 in the drmModeAddFB call, use the
requested format to find suitable values.
Destroy kernel buffers and FBs when swapchain is destroyed. We
were leaking both of these kernel objects across swapchain
destruction.
Note that wsi_display_wait_for_event waits for anything to
happen. wsi_display_wait_for_event is simply a yield so that
the caller can then check to see if the desired state change
has occurred.
Record swapchain failures in chain for later return. If some
asynchronous swapchain activity fails, we need to tell the
application eventually. Record the failure in the swapchain
and report it at the next acquire_next_image or queue_present
call.
Fix error returns from wsi_display_setup_connector. If a
malloc failed, then the result should be
VK_ERROR_OUT_OF_HOST_MEMORY. Otherwise, the associated ioctl
failed and we're either VT switched away, or our lease has
been revoked, in which case we should return
VK_ERROR_OUT_OF_DATE_KHR.
Make sure both sides of if/else brace use matches
Note that we assume drmModeSetCrtc is synchronous. Add a
comment explaining why we can idle any previous displayed
image as soon as the mode set returns.
Note that EACCES from drmModePageFlip means VT inactive. When
VT-switched away, drmModePageFlip returns EACCES. Poll once a
second waiting until we get some other return value back.
Clean up after alloc failure in
wsi_display_surface_create_swapchain. Destroy any created
images, free the swapchain.
Remove physical_device from wsi_display_init_wsi. We never
need this value, so remove it from the API and from the
internal wsi_display structure.
Use drmModeAddFB2 in wsi_display_image_init. This takes a drm
format instead of depth/bpp, which provides more control over
the format of the data.
v5:
Set the 'currentStackIndex' member of the
VkDisplayPlanePropertiesKHR record to zero, instead of
indexing across all displays. This value is the stack depth of
the plane within an individual display, and as the current
code supports only a single plane per display, should be set
to zero for all elements
Discovered-by: David Mao <David.Mao@amd.com>
v6:
Remove 'platform_display' bits from the build and use the
existing 'platform_drm' instead.
v7:
Ensure VK_ICD_WSI_PLATFORM_MAX is large enough by
setting to VK_ICD_WSI_PLATFORM_DISPLAY + 1
v8:
Simplify wsi_device_init failure from wsi_display_init_wsi
by using the same pattern as the other wsi layers.
Adopt Jason Ekstrand's white space and variable declaration
suggestions. Declare variables at first use, eliminate extra
whitespace between types and names, add list iterator helpers,
switch to lower-case list_ macros.
Respond to Jason's April 8 review:
* Create a function to convert relative to absolute timeouts
to catch overflow issues in one place
* use VK_NULL_HANDLE to clear prop->currentDisplay
* Get rid of available_present_modes array.
* return OUT_OF_DATE_KHR when display_queue_next called after
display has been released.
* Make errors from mode setting fatal in display_queue_next
* Remove duplicate pthread_mutex_init call
* Add wsi_init_pthread_cond_monotonic helper function to
isolate pthread error handling from wsi_display_init_wsi
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
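The relative-to-absolute timeout conversion from the review list above
can be sketched as a saturating add. rel_to_abs_timeout() is a
hypothetical name, and the current time is passed in as a parameter
here; a real helper would presumably read a monotonic clock itself:

```c
#include <stdint.h>

/* Convert a relative timeout to an absolute deadline, both in
 * nanoseconds. If the sum would overflow 64 bits, saturate to
 * UINT64_MAX, which callers can treat as "wait forever". */
static uint64_t
rel_to_abs_timeout(uint64_t now_ns, uint64_t rel_ns)
{
   if (rel_ns > UINT64_MAX - now_ns)
      return UINT64_MAX;

   return now_ns + rel_ns;
}
```

Keeping the overflow check in one function means every wait path gets
the same behaviour for very large timeouts.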
v9:
Fix vscan handling by using MAX2(vscan, 1) everywhere. Vscan
can be zero anywhere, which is treated the same as 1.
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
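To illustrate why MAX2(vscan, 1) matters, here is a sketch of a
refresh rate computation. struct mode_timing and mode_refresh_hz() are
hypothetical, and the pixel clock is assumed to be in kHz; without the
clamp, a vscan of zero would divide by zero:

```c
#include <stdint.h>

#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical subset of a display mode's timing fields. */
struct mode_timing {
   uint32_t clock_khz; /* pixel clock, assumed to be in kHz */
   uint16_t htotal;
   uint16_t vtotal;
   uint16_t vscan;     /* 0 is treated the same as 1 */
};

/* Refresh rate in Hz: pixel clock divided by the number of pixel
 * periods per frame, with each line scanned MAX2(vscan, 1) times. */
static double
mode_refresh_hz(const struct mode_timing *m)
{
   return (double)m->clock_khz * 1000.0 /
          ((double)m->htotal * (double)m->vtotal *
           (double)MAX2(m->vscan, 1));
}
```

With this clamp, modes that report vscan as 0 and modes that report 1
produce the same refresh rate.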
v10:
Respond to Vulkan CTS failures.
1. Initialize planeReorderPossible in display_properties code
2. Only report connected displays in
get_display_plane_supported_displays
3. Return VK_ERROR_OUT_OF_HOST_MEMORY when pthread cond
initialization fails.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
4. Add vkCreateDisplayModeKHR. This doesn't actually create
new modes, it only looks to see if the requested parameters
match an existing mode and returns that.
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Keith Packard <keithp@keithp.com>
And replace _regs by _metadata because it makes more sense.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
And replace _regs by _metadata because it makes more sense.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This was being handled in a few different places, consolidate it into a
single radv_get_shader() function.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: "18.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This adds a RADV_DEBUG=startup option to dump more info about
instance creation and device enumeration.
A common question end users have is why the driver is not loading
for them, and this has two common reasons:
1) They did not install the driver.
2) AMDGPU is not used for the card in the kernel.
This adds some info messages so we can easily get some useful
output from end users.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Errors are not that common of a case so we can eat a slight perf
hit in having to call a function and do a runtime check.
In turn this makes debugging random errors happening for end users
easier, because they don't have to have a debug build on hand.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This will allow emitting consecutive shader pointers to
reduce the number of emitted SET_SH_REG packets, which
is recommended.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
For future work (support for 32-bit GPU pointers).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Pre-Vega HW always interprets the alpha for this format as unsigned,
so we have to implement a fixup to do the sign correctly for signed
formats.
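A sketch of what such a sign fixup can look like, assuming the alpha
channel is 2 bits wide (an assumption for illustration; the commit
text does not name the format). sext2() and alpha_snorm_fixup() are
hypothetical helpers written on the CPU side, whereas the real fixup
is done in the shader:

```c
#include <stdint.h>

/* Reinterpret an unsigned 2-bit value (0..3) as a two's complement
 * signed value (-2..1). */
static int32_t
sext2(uint32_t a)
{
   a &= 3u;
   return (int32_t)(a ^ 2u) - 2;
}

/* Signed normalized interpretation: the 2-bit range maps to
 * [-1.0, 1.0], with -2 clamped to -1.0. */
static float
alpha_snorm_fixup(uint32_t a)
{
   int32_t s = sext2(a);
   return s < -1 ? -1.0f : (float)s;
}
```

For a signed integer format only the sign extension step would be
needed; the clamp applies to the normalized case.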
v2: Improve indexing mess.
CC: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106480
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
When VK_PIPELINE_CREATE_DISABLE_OPTIMIZATION_BIT is set we skip NIR
linking optimisations and only run over the NIR optimisation loop
once similar to the GLSLOptimizeConservatively constant used by
some GL drivers.
We need to run over the opts at least once to avoid errors in LLVM
(e.g. dead vars it can't handle) and also to reduce the time spent
compiling the IR in LLVM.
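The "run the loop once" behaviour can be sketched with a toy
optimisation loop. toy_pass() and run_opt_loop() are hypothetical and
operate on a counter instead of real NIR; only the loop structure is
the point:

```c
#include <stdbool.h>

/* Toy "IR": a counter that one pass can shrink by one per run.
 * Returns true when the pass made progress. */
static bool
toy_pass(int *ir)
{
   if (*ir > 0) {
      (*ir)--;
      return true;
   }
   return false;
}

/* Run the optimisation loop until no pass makes progress, or exactly
 * once when optimisations are disabled. Returns the iteration count. */
static unsigned
run_opt_loop(int *ir, bool disable_optimization)
{
   unsigned iterations = 0;
   bool progress;

   do {
      progress = toy_pass(ir);
      iterations++;
   } while (progress && !disable_optimization);

   return iterations;
}
```

The loop body always executes at least once, which matches the need
to clean up the IR before handing it to LLVM.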
With this change the Blacksmith Unity demos compilation times
go from 329760 ms -> 299881 ms when using Wine and DXVK.
V2: add bit to radv_pipeline_key
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106246
Before fb077b0728, the LOD parameter was being used in place of the
sample index, which would only copy the first sample to all samples in the
destination image. After that commit, multisample image copies wouldn't copy
anything, from my observations.
This fixes some copy_and_blit CTS tests.
v3.1: - set lod to 0 for nir_txf_ms (Samuel)
v2: - use GLSL_SAMPLER_DIM_MS instead of 2D (Samuel)
- updated commit description (Samuel)
Fix this properly by copying each sample in a separate radv_CmdDraw and using a
pipeline with the correct rasterizationSamples for the destination image.
Cc: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Maintaining two different paths is annoying but this gets
rid of the performance regression introduced by the global
BO list.
We might find a better solution in the future, but for now
just keep two paths.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
In order to reduce a performance regression introduced by
4b13fe55a4 ("radv: Keep a global BO list for VkMemory."),
we are going to maintain two different paths.
One when VK_EXT_descriptor_indexing is enabled by the
application because we need to have a global BO list, and
one (the old one) when it's not enabled.
With Talos on Polaris, the global BO list reduces performance
by 10% which is too much for me.
This reverts commit ab6cadd3ec.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This can be enabled with RADV_PERFTEST=dccmsaa.
DCC for MSAA textures is actually not that easy to implement. It
looks like there are some corner cases. I will improve support
incrementally.
Vega support, as well as Polaris improvements, will be added later.
No CTS changes on Polaris using RADV_DEBUG=zerovram and
RADV_PERFTEST=dccmsaa.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
With update after bind we can't attach bo's to the command buffer
from the descriptor set anymore, so we have to have a global BO
list.
I am somewhat surprised this works really well even though we have
implicit synchronization in the WSI based on the bo list associations
and with the new behavior every command buffer is associated with
every swapchain image. But I could not find slowdowns in games because
of it.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
- remove mtypes.h from most header files
- add main/menums.h for often used definitions
- remove main/core.h
v2: fix radv build
Reviewed-by: Brian Paul <brianp@vmware.com>
Pretty straightforward, just pass the divisors through the shader
key and then do an LLVM divide.
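As a rough illustration of what the divide computes, here is a sketch in plain
C. The function name is hypothetical, instance_id is assumed to be zero-based
within the draw, and treating a divisor of 0 as "every instance reads the same
element" is an assumption rather than something this patch states.

```c
#include <stdint.h>

/* Hypothetical helper: element index fetched for an instance-rate
 * vertex attribute with the given divisor. */
static uint32_t instance_rate_fetch_index(uint32_t first_instance,
                                          uint32_t instance_id,
                                          uint32_t divisor)
{
    if (divisor == 0)
        return first_instance; /* assumed: no per-instance advance */
    return first_instance + instance_id / divisor;
}
```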
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
According to Marek, not enabling it on Stoney has a significant
negative performance impact. (And I guess this might impact
performance on Raven as well)
The register settings are pretty much copied from radeonsi. I did
not put this in the pipeline as that would make the pipeline more
dependent on the format, which means we would have to have more
pipelines for the meta shaders.
v2: Don't clear RB+ regs if not enabled as the CLEAR_STATE packet
does already.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
And rename to radv_dcc_enabled() to be consistent.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This is unnecessary when the precise bit flag is not set, and it
might hurt performance. The Vulkan spec explains that not setting
VK_QUERY_CONTROL_PRECISE_BIT might be more efficient on some
implementations.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Disabled by default for now, it can be enabled with
RADV_PERFTEST=outoforder.
No CTS regressions on Polaris, and all Vulkan games I tested
look good as well.
Expect small performance improvements for applications where
out-of-order rasterization can be enabled by the driver.
Loosely based on RadeonSI.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This fixes CTS:
dEQP-VK.api.device_init.create_device_queue2_unmatched_flags
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@gmail.com>
Now the "ac/nir" prefix will really be the shared code between
RadeonSI and RADV, which might avoid confusion in the future.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The Vulkan API is not ideal as it does not allow us to have a
shared limit.
Feral needs 15+6 for one of their games, and I'm not a fan
of overcommitting the limits, so increase the number of
dynamic uniform buffers to 16.
CC: <mesa-stable@lists.freedesktop.org>
CC: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This is an optimization which reduces the number of flushes for
small pool buffers.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
If the query pool has been previously reset using the compute
shader path.
Fixes: a41e2e9cf5 ("radv: allow to use a compute shader for resetting the query pool")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105292
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This is never used.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This implements strict checking for the entrypoint ProcAddr
functions.
- InstanceProcAddr with instance = NULL, only returns the 3 allowed
entrypoints.
- DeviceProcAddr does not return any instance entrypoints.
- InstanceProcAddr does not return non-supported or disabled
instance entrypoints.
- DeviceProcAddr does not return non-supported or disabled device
entrypoints.
- InstanceProcAddr still returns non-supported device entrypoints.
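The first rule in the list above could be sketched as follows. This assumes
the three allowed entrypoints are the Vulkan 1.0 global commands; the function
names are hypothetical and the per-extension checks from the other rules are
left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed list of entrypoints that may be queried with instance == NULL. */
static bool is_global_entrypoint(const char *name)
{
    static const char *const globals[] = {
        "vkEnumerateInstanceExtensionProperties",
        "vkEnumerateInstanceLayerProperties",
        "vkCreateInstance",
    };
    for (size_t i = 0; i < sizeof(globals) / sizeof(globals[0]); i++) {
        if (strcmp(name, globals[i]) == 0)
            return true;
    }
    return false;
}

/* Hypothetical filter run before the real entrypoint table lookup. */
static bool instance_lookup_allowed(const void *instance, const char *name)
{
    if (instance == NULL)
        return is_global_entrypoint(name);
    return true; /* supported/enabled checks omitted in this sketch */
}
```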
Reviewed-by: Dave Airlie <airlied@redhat.com>
Ported from the radeonsi GL_AMD_pinned_memory implementation.
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The Vulkan spec says:
"pipelineBindPoint is a VkPipelineBindPoint indicating whether
the descriptors will be used by graphics pipelines or compute
pipelines. There is a separate set of bind points for each of
graphics and compute, so binding one does not disturb the other."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104732
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This can lead to a situation where cache flushes could get conditionally
disabled while still clearing the flush_bits, and thus flushes due to
application pipeline barriers may never get executed.
Fixes: a6c2001ace (radv: add support for cmd predication.)
Signed-off-by: Dave Airlie <airlied@redhat.com>
Also moved everything in a struct and then return the struct from
the helper function, so it is clear in the caller what part of the
pipeline gets modified.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
We don't need the pipeline state struct anymore.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This gives about 2% performance improvement on dota2 for me.
This is mostly a mechanical copy and replacement, but at bind time
we still do:
1) Some stuff that is only based on num_samples changes.
2) Some command buffer state setting.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
We need to enable the pos float location 2 mode anytime we have
persample not just when forced by the frag shader.
This fixes:
dEQP-VK.pipeline.multisample.min_sample_shading*
Fixes: 58c97a079 (radv: enable location at sample when persample is forced.)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is ported from radeonsi and fixes:
dEQP-VK.pipeline.multisample_shader_builtin.sample_mask.bit_*
v2: don't call this path for radeonsi, it does it in the epilog.
use the radeonsi code path.
v3: handle NULL pCreateInfo->pMultisampleState properly (Samuel)
v3.1: set ps_iter_samples default to 1 (Bas)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fixes: bdcbe7c76 (radv: add sample mask input support)
Signed-off-by: Dave Airlie <airlied@redhat.com>
Some of the hw resolve passes need the SPI color format setup
correctly.
This fixes lots of 16-bit and 32-bit format tests in
dEQP-VK.renderpass.suballocation.multisample*
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
Signed-off-by: Dave Airlie <airlied@redhat.com>
Passes
dEQP-VK.api.smoke.*
dEQP-VK.wsi.android.*
with android-cts-7.1_r12 .
Unlike the initial anv implementation this does
use syncobjs instead of waiting on the CPU.
This is missing meson build coverage for now.
One possible todo is that linux 4.15 now has a
syscall that allows us to export an amdgpu fence to
a sync_file, which allows us not to force all
fences and semaphores to use syncobjs. However,
I had trouble with my kernel crashing regularly
with NULL pointers, and I'm not sure how beneficial
it is in the first place given that intel uses
syncobjs for all fences if available.
Reviewed-by: Dave Airlie <airlied@redhat.com>
This is not hooked up to any messages yet, but useful for e.g.
renderdoc if you add some messages during development.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
If the number of instances hasn't changed and we've already
emitted it, don't emit it again.
If the vertex shader is the same and the first_instance, vertex_offset
haven't changed don't emit them again.
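The idea can be sketched as a small cache of the last emitted values. The
struct and function names below are hypothetical, not the ones used in the
driver; each function returns whether the caller would still need to emit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cache of the last values written to the command stream. */
struct draw_emit_cache {
    bool        instances_valid;
    uint32_t    num_instances;
    bool        vs_args_valid;
    const void *vs;             /* shader the arguments were emitted for */
    uint32_t    first_instance;
    int32_t     vertex_offset;
};

static bool need_emit_num_instances(struct draw_emit_cache *c, uint32_t n)
{
    if (c->instances_valid && c->num_instances == n)
        return false;
    c->instances_valid = true;
    c->num_instances = n;
    return true;
}

static bool need_emit_vs_args(struct draw_emit_cache *c, const void *vs,
                              uint32_t first_instance, int32_t vertex_offset)
{
    if (c->vs_args_valid && c->vs == vs &&
        c->first_instance == first_instance &&
        c->vertex_offset == vertex_offset)
        return false;
    c->vs_args_valid = true;
    c->vs = vs;
    c->first_instance = first_instance;
    c->vertex_offset = vertex_offset;
    return true;
}
```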
This increases the fps in GL_vs_VK -t 1 -m -api vk from around 40
to around 60 here, it may not impact anything else.
Dieter also reported smoketest going from 1060->1200 fps.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Tested with a modified deferred demo and no regressions in a 1.0.2
mustpass run.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The EXT values are really large, e.g.
VK_DYNAMIC_STATE_DISCARD_RECTANGLE_EXT = 1000099000, so 1 << value
is not going to fit into a 32-bit mask.
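One way around this is to translate each enum value to a small bit index
first. The sketch below uses local constants instead of the Vulkan headers;
the constant and function names are hypothetical, and the count of 9 core
values is an assumption.

```c
#include <stdint.h>

#define DYN_STATE_CORE_COUNT        9u          /* assumed: core values 0..8 */
#define DYN_STATE_DISCARD_RECTANGLE 1000099000u /* value quoted above */

/* Returns a single-bit mask for a dynamic state, or 0 if it is unknown. */
static uint32_t dynamic_state_mask(uint32_t state)
{
    if (state < DYN_STATE_CORE_COUNT)
        return 1u << state;
    if (state == DYN_STATE_DISCARD_RECTANGLE)
        return 1u << DYN_STATE_CORE_COUNT; /* first bit after the core ones */
    return 0;
}
```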
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Overall it does not really help or hurt. The deferred demo gets 1%
improvement and some games a 3% decrease, so I don't think this
should be enabled by default.
But with the code upstream it is easier to experiment with it.
v2: Remove initializing the registers from si_emit_config.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Apps can use this for render feedback loops, where things are
defined if they render each pixel only once. However, DCC fails
here, as the level of coherence is a block not a pixel, so disable it.
This is also going to help implementing other stuff.
Even if we optimize this later to only happen if there actually is
a loop (if possible at all ...), then the machinery is still useful
to exclude images accessible by the SDMA queue when that is implemented.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
For fast clear eliminate and decompressions, we always use the most compressed
format.
For clears, the code already creates a renderpass on demand with the exact same
layout as specified.
Otherwise we start distinguishing between GENERAL and TRANSFER_DST_OPTIMAL.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
We do an in place copy where we read compressed and write decompressed.
By doing this in sizes that cover entire DCC blocks and waiting for all
reads in the block before starting to write we avoid corruption.
In the end we clear the DCC metadata to 0xffffffff.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
This fixes some of the broken:
dEQP-VK.synchronization.op.multi_queue.*64x64x8* tests.
Fixes: e38685cc62 'Revert "radv: disable support for VEGA for now."'
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes the layout issue for the blit path as well.
This fixes:
dEQP-VK.api.copy_and_blit.core.blit_image.all_formats.depth_stencil.d32_sfloat_s8_uint_d32_sfloat_s8_uint*
v2: use compatible render passes.
v2.1: use enum
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2 17.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
If we are doing a general->general transfer with HIZ enabled,
we want to hit the tile surface disable bits in radv_emit_fb_ds_state,
however we never get the current layout to know we are in general
and meta hardcoded the transfer layout which is always tile enabled.
This fixes:
dEQP-VK.api.copy_and_blit.core.image_to_image.all_formats.depth_stencil.d32_sfloat_s8_uint_d32_sfloat_s8_uint.optimal_general
dEQP-VK.api.copy_and_blit.core.image_to_image.all_formats.depth_stencil.d32_sfloat_s8_uint_d32_sfloat_s8_uint.general_general
v2: refactor some shared helpers for blit patches
v3: we only need multiple render passes as they should be compatible.
v3.1: use enum (Bas)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2 17.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds support for a 3D image reading path to the blit 2d paths,
like I did for the clear paths.
Fixes: e38685cc62 'Revert "radv: disable support for VEGA for now."'
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Alex Smith <asmith@feralinteractive.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
On GFX9 we must access 3D textures with 3D samplers AFAICS.
This fixes:
dEQP-VK.api.image_clearing.core.clear_color_image.3d.single_layer
on GFX9 for me.
v1.1: fix tex->sampler_dim to dim
v2: send layer in from outside
v3: don't regress on pre-gfx9
Fixes: e38685cc62 'Revert "radv: disable support for VEGA for now."'
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Alex Smith <asmith@feralinteractive.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
They are dummy objects but the spec requires layout to not be
NULL, this just makes sure we are creating valid pipeline layout
objects. This will allow us to remove some useless checks.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
dota2 binds a ton of index buffers but the type is always 16-bit.
Note that we have to invalidate the type when switching from
indexed draws to normal draws.
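A minimal sketch of that caching, including the invalidation on non-indexed
draws. The names and the sentinel value are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define INDEX_TYPE_INVALID (-1) /* hypothetical "nothing emitted" marker */

struct index_state {
    int32_t last_index_type;
};

/* Returns true if the index type has to be emitted for this draw. */
static bool need_emit_index_type(struct index_state *s, int32_t type)
{
    if (s->last_index_type == type)
        return false;
    s->last_index_type = type;
    return true;
}

/* Called for non-indexed draws so the next indexed draw re-emits. */
static void invalidate_index_type(struct index_state *s)
{
    s->last_index_type = INDEX_TYPE_INVALID;
}
```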
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
RADV_CMD_BUFFER_STATUS_INVALID is not used for now, but I think
it makes sense to declare it. Could be used later with better
command buffer error handling.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Allows apps to determine the LLVM version so that they can decide
whether or not to enable workarounds for LLVM issues.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: "17.2 17.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Just check if image has scanout flag set
v2 (Jason Ekstrand):
- Rebase
- Also drop the now unused radv_mem_flag_bits enum
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
This allows updating them with only one memcpy().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
I don't think we will need a 64-bit unsigned integer for the
dirty flags in the future, and there are still 20 bits left.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
VkDeviceQueueCreateInfo::queueFamilyIndex is an unsigned 32-bit
integer.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Just after the vertex shader.
This seems to give a minor boost for, at least, Serious Sam
Fusion 2017 and Dawn of War 3. I don't see any real impacts
with The Talos Principle.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This moves some calculations of register values into the pipeline
construction, it saves looking at outinfo in the cmd buffer emit.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
There's no point recalculating these the whole time on descriptor
emission, just store them at pipeline creation.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Instead of storing all the pointers and zeroing them all out,
just store a valid bitmask in the state. This also moves
the CmdBindPipeline path down the cpu usage path for the
multithreading demo as it no longer has to traverse MAX_SETS
to find the active descriptor sets.
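The bitmask approach could look roughly like this. MAX_SETS is given an
assumed value of 32 here, and the struct and helper names are hypothetical;
the point is that only set bits are visited instead of every slot.

```c
#include <stdint.h>

#define MAX_SETS 32 /* assumed value for this sketch */

struct descriptor_state {
    const void *sets[MAX_SETS];
    uint32_t    valid;          /* bit i set means sets[i] is bound */
};

static void bind_set(struct descriptor_state *s, unsigned idx, const void *set)
{
    s->sets[idx] = set;
    s->valid |= 1u << idx;
}

/* Index of the lowest set bit; mask must be non-zero. */
static unsigned lowest_bit_index(uint32_t mask)
{
    unsigned i = 0;
    while (!(mask & 1u)) {
        mask >>= 1;
        i++;
    }
    return i;
}

/* Writes the indices of bound sets to out and returns how many there are. */
static unsigned active_sets(const struct descriptor_state *s, unsigned *out)
{
    unsigned n = 0;
    uint32_t mask = s->valid;
    while (mask) {
        out[n++] = lowest_bit_index(mask);
        mask &= mask - 1; /* clear the lowest set bit */
    }
    return n;
}
```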
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This isn't required to be cleared, since buffers are only linked
by vertex elements, so if elements are clear then no buffers
should be referenced.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just removes a hole in the cmd_state and packs some bools
together.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The vram_list linked list resulted in lots of pointer chasing.
Replacing this with an array instead improves descriptor set
allocation CPU usage by 3x at least (when also considering the free),
because it had to iterate through 300-400 sets on average.
Not a huge improvement as the pre-improvement CPU usage was only
about 2.3% in the busiest thread.
Reviewed-by: Dave Airlie <airlied@redhat.com>
It confuses CTS. This pregenerates the heap info into the
physical device, so we can use it for translating contiguous
indices into our "standard" ones.
This also makes the WSI a bit smarter in case the first preferred
heap does not exist.
Reviewed-by: Dave Airlie <airlied@redhat.com>
CC: <mesa-stable@lists.freedesktop.org>
This should make sure we don't treat exported buffers as local
bos.
Fixes: a639d40f13 (radv: add support for local bos. (v3))
Tested-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This allows an app to query shader statistics and get a disassembly of
a shader. RenderDoc git has support for it, so this allows you to view
shader disassembly from a capture.
When this extension is enabled on a device (or when tracing), we now
disable pipeline caching, since we don't get the shader debug info when
we retrieve cached shaders.
v2: Improvements to resource usage reporting
v3: Disassembly string must be null terminated (string_buffer's length
does not include the terminator)
v4: Fixed LDS reporting. (Bas)
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Only needed when the CS path is used.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
radeonsi only emits these when dfsm is enabled, so for now
just hinge them on a flag we never set.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This allows us to pass extra parameters to the memory allocation
operation that are not defined in the vulkan spec. This is useful for
internal usage.
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>