Commit Graph

484 Commits

Author SHA1 Message Date
Bas Nieuwenhuizen 3c2e8267d0 radv: Add support for driconf.
This includes 0 options.

The cache parsing is located at a position where we can easily add
config filtering by VkApplicationInfo.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-23 23:49:39 +00:00
Bas Nieuwenhuizen 8d2654a419 radv: Support VK_EXT_inline_uniform_block.
Basically just reserve the memory in the descriptor sets.

On the shader side we construct a buffer descriptor, since
AFAIU VGPR indexing on 32-bit pointers in LLVM is still broken.

This fully supports update after bind and variable descriptor set
sizes. However, the limits are somewhat arbitrary and are mostly
about finding a reasonable division of a 2 GiB max memory size over
the set.

v2: - rebased on top of master (Samuel)
    - remove the loading resources rework (Samuel)
    - only load UBO descriptors if it's a pointer (Samuel)
    - use LLVMBuildPtrToInt to avoid IR failures (Samuel)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (v2)
2019-04-19 09:21:47 +02:00
Bas Nieuwenhuizen 5f5ac19f13 radv: Implement VK_EXT_pipeline_creation_feedback.
Does what it says on the tin.

The per stage time is only an approximation due to linking and
the Vega merged stages.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-03-20 21:19:46 +00:00
Samuel Pitoiset a66b186beb radv: use typed buffer loads for vertex input fetches
This drastically reduces the number of SGPRs because the driver
now uses descriptors per vertex binding, instead of per vertex
attribute format.

29077 shaders in 15096 tests
Totals:
SGPRS: 1354285 -> 1282109 (-5.33 %)
VGPRS: 909896 -> 908800 (-0.12 %)
Spilled SGPRs: 24840 -> 24811 (-0.12 %)
Code Size: 49221144 -> 48986628 (-0.48 %) bytes
Max Waves: 243930 -> 244229 (0.12 %)

Totals from affected shaders:
SGPRS: 390648 -> 318472 (-18.48 %)
VGPRS: 288432 -> 287336 (-0.38 %)
Spilled SGPRs: 94 -> 65 (-30.85 %)
Code Size: 11548412 -> 11313896 (-2.03 %) bytes
Max Waves: 86460 -> 86759 (0.35 %)

This gives a really tiny boost.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-03-13 13:31:11 +01:00
Samuel Pitoiset 0b9a06a1a0 radv: store more vertex attribute infos as pipeline keys
They are required for using typed buffer loads.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-03-13 13:31:08 +01:00
Bas Nieuwenhuizen 7631feaa00 radv: Sync ETC2 whitelisted devices.
Fixes: 4bb6c49375 "radv: Allow ETC2 on RAVEN and VEGA10 instead of all GFX9."
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-02-20 02:55:41 +01:00
Samuel Pitoiset 210aec3612 radv: store vertex attribute formats as pipeline keys
The formats will be used for reducing the number of loaded channels.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-14 09:10:09 +01:00
Samuel Pitoiset 1b8983c25b radv: fix using LOAD_CONTEXT_REG with old GFX ME firmwares on GFX8
This fixes a critical issue.

Cc: <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109575
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-12 17:39:30 +01:00
Samuel Pitoiset 5806d99984 radv: gather more info about push constants
This is needed in order to inline some push constants when possible.
This also adds a new helper for initializing the pass.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-12 17:25:34 +01:00
Samuel Pitoiset 6430616e77 radv: track if subpasses have color attachments
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-04 13:19:14 +01:00
Samuel Pitoiset 5699ac0078 radv: determine the last subpass id for every attachments
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-04 13:17:59 +01:00
Samuel Pitoiset a20c2e38d8 radv: store the list of attachments for every subpass
This reworks how the depth stencil attachment is used for
simplicity. This also introduces radv_render_pass_compile()
helper that will be used for further optimizations.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-04 13:17:54 +01:00
Samuel Pitoiset a7c7d811f1 radv: move subpass image transitions to radv_cmd_buffer_begin_subpass()
Instead of doing them in radv_cmd_buffer_set_subpass().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-04 13:17:52 +01:00
Samuel Pitoiset 545552c9b9 radv: remove unused radv_render_pass_attachment::view_mask
Trivial.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-04 13:17:42 +01:00
Timothy Arceri 9b9ccee4d6 radv: take LDS into account for compute shader occupancy stats
Ported from d205faeb6c.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-01 22:25:30 +11:00
Samuel Pitoiset 5f0b17d581 radv: compute the GFX9 fence VA at allocation time
Instead of doing every time we emit cache flushes.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-01-23 11:31:12 +01:00
Samuel Pitoiset bd098884f1 radv: remove old_fence parameter from si_cs_emit_write_event_eop()
This parameter is actually useless as the immediate value
can always be zero.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-01-23 11:31:07 +01:00
Rhys Perry e4c6423c5e radv: avoid context rolls when binding graphics pipelines
It's common in some applications to bind a new graphics pipeline without
ending up changing any context registers.

This has a pipline have two command buffers: one for setting context
registers and one for everything else. The context register command buffer
is only emitted if it differs from the previous pipeline's.

v2: ensure late scissor emission is done when radv_emit_rbplus_state() is
    called
v2: make use of cmd_buffer->state.workaround_scissor_bug
v3: rename "workaround_scissor_bug" to
    "context_roll_without_scissor_emitted"

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-01-21 14:37:53 +00:00
Rhys Perry 5564a797f2 radv: add missed situations for scissor bug workaround
v2: rename "workaround_scissor_bug" to
    "context_roll_without_scissor_emitted"

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-01-21 14:37:53 +00:00
Samuel Pitoiset d58b11e709 radv: get rid of bunch of KHR suffixes
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-01-09 12:26:48 +01:00
Bas Nieuwenhuizen 656c1c488c radv: Remove device path.
unused and gcc complains about strncpy. (from what I can see because
strncpy does not leave a 0 byte on truncate. That said we don't use
it so this does not fix a real bug).

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-01-07 23:15:14 +01:00
Samuel Pitoiset 6b976024a8 radv: add support for FMASK expand
Original patch by Dave Airlie.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-12-20 18:01:17 +01:00
Samuel Pitoiset fa16da53d8 radv: initialize FMASK for images in fully expanded mode
The value depends on the number of samples.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-12-20 18:01:15 +01:00
Samuel Pitoiset 3a5adc2879 radv: add a predicate for reflecting DCC decompression state
It's somehow similar to the FCE predicate.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-12-13 09:21:10 +01:00
Samuel Pitoiset 3fbdcd942f amd: remove support for LLVM 6.0
User are encouraged to switch to LLVM 7.0 released in September 2018.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-12-06 14:02:56 +01:00
Samuel Pitoiset 824cfc1ee5 radv: rework the TC-compat HTILE hardware bug with COND_EXEC
After investigating on this, it appears that COND_WRITE doesn't
work correctly in some situations. I don't know exactly why does
it fail to update DB_Z_INFO.ZRANGE_PRECISION, but as AMDVLK
also uses COND_EXEC I think there is a reason.

Now the driver stores a new metadata value in order to reflect
the last fast depth clear state. If a TC-compat HTILE is fast cleared
with 0.0f, we have to update ZRANGE_PRECISION to 0 in order to
work around that hardware bug.

This fixes rendering issues with The Forest and DXVK and doesn't
seem to introduce any regressions.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108914
Fixes: 68dead112e ("radv: update the ZRANGE_PRECISION value for the TC-compat bug")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-12-05 09:26:31 +01:00
Samuel Pitoiset 724107553c radv: implement fast HTILE clears for depth or stencil only on GFX9
This allows to fast clear the depth part (or the stencil part)
of a depth+stencil surface when HTILE is enabled. I didn't test
on GFX8, so it's disabled currently.

This gives a very nice boost, for example when clearing the depth
aspect of a 4096x4096 D32_SFLOAT_S8_UINT image (18x faster).

BEFORE: 235 us
AFTER: 13 us

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-19 16:32:18 +01:00
Samuel Pitoiset 483a28bfd4 radv: tidy up radv_set_dcc_need_cmask_elim_pred()
This is just a small cleanup.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-19 14:05:33 +01:00
Samuel Pitoiset c571ca7a08 radv: replace si_emit_wait_fence() with radv_cp_wait_mem()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-11-05 09:48:50 +01:00
Samuel Pitoiset b1b2dd06a7 radv: add missing TFB queries support to CmdCopyQueryPoolsResults()
Cc: 18.3 <mesa-stable@lists.freedesktop.org>
Fixes: b4eb029062 ("radv: implement VK_EXT_transform_feedback")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-11-05 09:48:43 +01:00
Samuel Pitoiset b4eb029062 radv: implement VK_EXT_transform_feedback
This implementation should work and potential bugs can be
fixed during the release candidates window anyway.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-10-29 17:10:58 +01:00
Samuel Pitoiset f4fa8de794 radv: gather stream output info
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-10-29 17:09:08 +01:00
Samuel Pitoiset 79bbdf8e45 radv: implement image to image operations for R32G32B32
This should address the remaining failures in Batman Arkhman City.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107765
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-10-26 10:50:08 +02:00
Samuel Pitoiset 593996bc02 radv: implement buffer to image operations for R32G32B32
This should fix rendering issues with Batman Arkham City.
We will probably need to implement itob and itoi at some
point, but currently nothing hits these paths.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107765
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-10-16 09:22:38 +02:00
Bas Nieuwenhuizen 6ed0fd24d4 radv: Implement VK_EXT_pci_bus_info.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-10-15 12:27:49 +02:00
Samuel Pitoiset 229803b66a radv: implement clear operations for R32G32B32
This fixes crashes for some CTS:
dEQP-VK.api.copy_and_blit.core.blit_image.all_formats.color.*.linear_*_*
dEQP-VK.api.copy_and_blit.core.blit_image.all_formats.color.*.*_linear_*

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108113
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-10-11 14:49:16 +02:00
Samuel Pitoiset 621e70dd40 radv: adjust the CmdUpdateBuffer threshold for optimal performance
According to my benchmark results, it appears that we should
reduce the threshold to 1024.

BEFORE:
1 KB: 68.656000 ms
2 KB: 118.368000 ms

AFTER:
1 KB: 31.760000 ms
2 KB: 29.840000 ms

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-09-28 09:08:44 +02:00
Samuel Pitoiset 3871dd7a92 radv: allow to force anisotropy via RADV_TEX_ANISO
Ported from RadeonSI.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-09-18 13:27:58 +02:00
Samuel Pitoiset c79aad30ae radv: emit the initial config only once in the preambles
It shouldn't be needed to emit the initial graphics or compute
state when beginning a new command buffer. Emitting them in
the preamble should be enough and this will reduce IB sizes.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-09-14 10:59:52 +02:00
Bas Nieuwenhuizen fbcd167314 radv: Add on-demand compilation of built-in shaders.
In environments where we cannot cache, e.g. Android (no homedir),
ChromeOS (readonly rootfs) or sandboxes (cannot open cache), the
startup cost of creating a device in radv is rather high, due
to compiling all possible built-in pipelines up front. This meant
depending on the CPU a 1-4 sec cost of creating a Device.

For CTS this cost is unacceptable, and likely for starting random
apps too.

So if there is no cache, with this patch radv will compile shaders
on demand. Once there is a cache from the first run, even if
incomplete, the driver knows that it can likely write the cache
and precompiles everything.

Note that I did not switch the buffer and itob/btoi compute pipelines
to on-demand, since you cannot really do anything in Vulkan without
them and there are only a few.

This reduces the CTS runtime for the no caches scenario on my
threadripper from 32 minutes to 8 minutes.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-08-14 10:26:24 +02:00
Bas Nieuwenhuizen 806a792b43 radv: Make fs key exemplars ordered to be a reverse fs_key lookup.
While at it, share the exemplars and account for a non-occurring
fs key.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-08-14 10:26:06 +02:00
Samuel Pitoiset 0a8127bbfb radv: make use of radv_subpass_barrier() when resolving subpasses
The goal is to use radv_barrier()/radv_subpass_barrier() as
much as possible for further optimizations.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-07-20 10:17:11 +02:00
Samuel Pitoiset e45ba51ea4 radv: add support for VK_EXT_conditional_rendering
Inherited commands buffers are not supported.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-07-18 13:44:09 +02:00
Samuel Pitoiset 946cf3f39f radv: add support for non-inverted conditional rendering
By default, our internal rendering commands are discarded
only if the predicate is non-zero (ie. DRAW_VISIBLE). But
VK_EXT_conditional_rendering also allows to discard commands
when the predicate is zero, which means we have to use a
different flag.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-07-18 13:44:06 +02:00
Samuel Pitoiset 5b32926f7e radv: remove unnecessary verification code around ring_offsets_idx
I don't want to waste CPU cycles for nothing.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-07-12 11:08:42 +02:00
Samuel Pitoiset 1f616a840e radv: emit a dummy ZPASS_DONE to prevent GPU hangs on GFX9
A ZPASS_DONE or PIXEL_STAT_DUMP_EVENT (of the DB occlusion
counters) must immediately precede every timestamp event to
prevent a GPU hang on GFX9.

Cc: 18.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-07-12 10:22:36 +02:00
Samuel Pitoiset fe28978f2a radv: introduce radv_subpass_attachment data structure
Needed for VK_KHR_create_renderpass2.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-07-12 10:20:06 +02:00
Samuel Pitoiset 4a67ce886a radv: make sure to wait for CP DMA when needed
This might fix some synchronization issues. I don't know if
that will affect performance but it's required for correctness.

CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-07-11 12:11:56 +02:00
Samuel Pitoiset f2a310849e radv: only flush CB meta in pipeline image barriers when needed
If the given image doesn't enable CMASK, FMASK or DCC that's
useless to flush CB metadata.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-07-05 17:20:16 +02:00
Dave Airlie 7398913a62 ac/radv: move llvm compiler info to struct and init in one place
This ports radv to the shared code, however due to a bug in LLVM
version prior to 7, radv cannot add target info at this stage,
as it would leak one for every shader compile, however I'd prefer
to keep this llvm damage in the shared code, since it isn't the
driver at fault here. We just add a flag to denote if the driver
can support leaking the target info or not, and the common code
does the right thing depending on the llvm version.

 Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-07-04 10:29:16 +10:00
Dave Airlie e1387eaf12 radv: create/destroy passmgr at the higher level.
This is prep work for moving this to a per-thread struct

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-07-04 05:31:05 +10:00
Samuel Pitoiset 7a57c82767 radv: use separate bind points for the dynamic buffers
The Vulkan spec says:

   "pipelineBindPoint is a VkPipelineBindPoint indicating whether
    the descriptors will be used by graphics pipelines or compute
    pipelines. There is a separate set of bind points for each of
    graphics and compute, so binding one does not disturb the other."

CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-27 09:48:31 +02:00
Samuel Pitoiset 9c09e7d66e radv: remove unused 'predicated' parameter from some functions
It's always false.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-06-27 09:48:15 +02:00
Samuel Pitoiset fa42fa1a60 radv: emit PIPELINESTAT_{START,STOP} events for pipeline stats queries
Ported from RadeonSI.
This appears to fix some random fails with:
dEQP-VK.query_pool.statistics_query.*

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-26 18:23:16 +02:00
Keith Packard 1df586be12 radv: add VK_EXT_display_control to radv driver [v5]
This extension provides fences and frame count information to direct
display contexts. It uses new kernel ioctls to provide 64-bits of
vblank sequence and nanosecond resolution.

v2:
	Rework fence integration into the driver so that waiting for
	any of a mixture of fence types (wsi, driver or syncobjs)
	causes the driver to poll, while a list of just syncobjs or
	just driver fences will block. When we get syncobjs for wsi
	fences, we'll adapt to use them.

v3:	Adopt Jason Ekstrand's coding conventions

	Declare variables at first use, eliminate extra whitespace between
	types and names. Wrap lines to 80 columns.

	Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>

v4:	Adapt to WSI fence API change. It now returns VkResult and
	no longer has an option for relative timeouts.

v5:	wsi_register_display_event and wsi_register_device_event now
	use the default allocator when NULL is provided, so remove the
	computation of 'alloc' here.

Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-23 07:59:00 -07:00
Samuel Pitoiset 65b3fed037 radv: always initialize the clear depth/stencil values to 0
Similar to the clear color values.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-20 13:21:42 +02:00
Samuel Pitoiset 204cf5714a radv: always initialize the clear color values to 0
Having random data in there is probably not the best.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-20 13:21:42 +02:00
Samuel Pitoiset 20170865db radv: don't store the number of samples as log2
Needed for the following patch.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-20 13:21:42 +02:00
Keith Packard 451b58a51e radv: Add KHR_display extension to radv [v5]
This adds support for the KHR_display extension to the radv Vulkan
driver. The driver now attempts to open the master DRM node when the
KHR_display extension is requested so that the common winsys code can
perform the necessary operations.

v2:
	* Simplify addition of VK_USE_PLATFORM_DISPLAY_KHR to
          vulkan_wsi_args

	Suggested-by: Eric Engestrom <eric.engestrom@imgtec.com>

v3:
	Adapt to new wsi_device_init API (added display_fd)

v4:
	Adopt Jason Ekstrand's coding conventions

	Declare variables at first use, eliminate extra whitespace
	between types and names. Wrap lines to 80 columns.

	Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>

v5:
	Add vkCreateDisplayModeKHR. This doesn't actually create
	new modes, it only looks to see if the requested parameters
	matches an existing mode and returns that.

    	Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>

Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-19 14:17:46 -07:00
Keith Packard da997ebec9 vulkan: Add KHR_display extension using DRM [v10]
This adds support for the KHR_display extension support to the vulkan
WSI layer. Driver support will be added separately.

v2:
	* fix double ;; in wsi_common_display.c

	* Move mode list from wsi_display to wsi_display_connector

	* Fix scope for wsi_display_mode andwsi_display_connector
          allocs

	* Switch all allocations to vk_zalloc instead of vk_alloc.

	* Fix DRM failure in
          wsi_display_get_physical_device_display_properties

	  When DRM fails, or when we don't have a master fd
	  (presumably due to application errors), just return 0
	  properties from this function, which is at least a valid
	  response.

	* Use vk_outarray for all property queries

	  This is a bit less error-prone than open-coding the same
	  stuff.

	* Remove VK_COMPOSITE_ALPHA_INHERIT_BIT_KHR from surface caps

	  Until we have multi-plane support, we shouldn't pretend to
	  have any multi-plane semantics, even if undefined.

	Suggested-by: Jason Ekstrand <jason@jlekstrand.net>

	* Simplify addition of VK_USE_PLATFORM_DISPLAY_KHR to
          vulkan_wsi_args

	Suggested-by: Eric Engestrom <eric.engestrom@imgtec.com>

v3:
	Add separate 'display_fd' and 'render_fd' arguments to
	wsi_device_init API. This allows drivers to use different FDs
	for the different aspects of the device.

	Use largest mode as display size when no preferred mode.

	If the display doesn't provide a preferred mode, we'll assume
	that the largest supported mode is the "physical size" of the
	device and report that.

v4:
	Make wsi_image_state enumeration values uppercase.
	Follow more common mesa conventions.

	Remove 'render_fd' from wsi_device_init API.  The
	wsi_common_display code doesn't use this fd at all, so stop
	passing it in. This avoids any potential confusion over which
	fd to use when creating display-relative object handles.

	Remove call to wsi_create_prime_image which would never have
	been reached as the necessary condition (use_prime_blit) is
	never set.

	whitespace cleanups in wsi_common_display.c

	Suggested-by: Jason Ekstrand <jason@jlekstrand.net>

	Add depth/bpp info to available surface formats.  Instead of
	hard-coding depth 24 bpp 32 in the drmModeAddFB call, use the
	requested format to find suitable values.

	Destroy kernel buffers and FBs when swapchain is destroyed. We
	were leaking both of these kernel objects across swapchain
	destruction.

	Note that wsi_display_wait_for_event waits for anything to
	happen.  wsi_display_wait_for_event is simply a yield so that
	the caller can then check to see if the desired state change
	has occurred.

	Record swapchain failures in chain for later return. If some
	asynchronous swapchain activity fails, we need to tell the
	application eventually. Record the failure in the swapchain
	and report it at the next acquire_next_image or queue_present
	call.

	Fix error returns from wsi_display_setup_connector.  If a
	malloc failed, then the result should be
	VK_ERROR_OUT_OF_HOST_MEMORY. Otherwise, the associated ioctl
	failed and we're either VT switched away, or our lease has
	been revoked, in which case we should return
	VK_ERROR_OUT_OF_DATE_KHR.

	Make sure both sides of if/else brace use matches

	Note that we assume drmModeSetCrtc is synchronous. Add a
	comment explaining why we can idle any previous displayed
	image as soon as the mode set returns.

	Note that EACCES from drmModePageFlip means VT inactive.  When
	vt switched away drmModePageFlip returns EACCES. Poll once a
	second waiting until we get some other return value back.

	Clean up after alloc failure in
	wsi_display_surface_create_swapchain. Destroy any created
	images, free the swapchain.

	Remove physical_device from wsi_display_init_wsi. We never
	need this value, so remove it from the API and from the
	internal wsi_display structure.

	Use drmModeAddFB2 in wsi_display_image_init.  This takes a drm
	format instead of depth/bpp, which provides more control over
	the format of the data.

v5:
	Set the 'currentStackIndex' member of the
	VkDisplayPlanePropertiesKHR record to zero, instead of
	indexing across all displays. This value is the stack depth of
	the plane within an individual display, and as the current
	code supports only a single plane per display, should be set
	to zero for all elements

	Discovered-by: David Mao <David.Mao@amd.com>

v6:
	Remove 'platform_display' bits from the build and use the
	existing 'platform_drm' instead.

v7:
	Ensure VK_ICD_WSI_PLATFORM_MAX is large enough by
	setting to VK_ICD_WSI_PLATFORM_DISPLAY + 1

v8:
	Simplify wsi_device_init failure from wsi_display_init_wsi
	by using the same pattern as the other wsi layers.

    Adopt Jason Ekstrand's white space and variable declaration
	suggestions. Declare variables at first use, eliminate extra
	whitespace between types and names, add list iterator helpers,
	switch to lower-case list_ macros.

    Respond to Jason's April 8 review:

	* Create a function to convert relative to absolute timeouts
          to catch overflow issues in one place

	* use VK_NULL_HANDLE to clear prop->currentDisplay

	* Get rid of available_present_modes array.

	* return OUT_OF_DATE_KHR when display_queue_next called after
	  display has been released.

	* Make errors from mode setting fatal in display_queue_next

	* Remove duplicate pthread_mutex_init call

	* Add wsi_init_pthread_cond_monotonic helper function to
	  isolate pthread error handling from wsi_display_init_wsi

	Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>

v9:
	Fix vscan handling by using MAX2(vscan, 1) everywhere. Vscan
	can be zero anywhere, which is treated the same as 1.

	Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>

v10:
	Respond to Vulkan CTS failures.

	1. Initialize planeReorderPossible in display_properties code

	2. Only report connected displays in
	   get_display_plane_supported_displays

	3. Return VK_ERROR_OUT_OF_HOST_MEMORY when pthread cond
	   initialization fails.

	Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>

	4. Add vkCreateDisplayModeKHR. This doesn't actually create
	   new modes, it only looks to see if the requested parameters
	   matches an existing mode and returns that.

	Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Keith Packard <keithp@keithp.com>
2018-06-19 14:17:46 -07:00
Marek Olšák 6703fec58c amd,radeonsi: rename radeon_winsys_cs -> radeon_cmdbuf
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-19 13:08:50 -04:00
Samuel Pitoiset fa8bc821a8 radv: clean up radv_{set,load}_depth_clear_regs() helpers
And replace _regs by _metadata because it makes more sense.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-15 15:54:04 +02:00
Samuel Pitoiset be794fa26b radv: clean up radv_{set,load}_color_clear_regs() helpers
And replace _regs by _metadata because it makes more sense.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-15 15:53:58 +02:00
Dave Airlie 600d34c822 radv: remove multisample bit from shader key.
This wasn't being used anywhere inside the shader from what I can see.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-15 09:33:20 +10:00
Alex Smith 7ca0167ae9 radv: Consolidate GFX9 merged shader lookup logic
This was being handled in a few different places, consolidate it into a
single radv_get_shader() function.

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: "18.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-01 08:53:31 +01:00
Bas Nieuwenhuizen b9fb2c266a radv: Add startup debug option.
This adds a RADV_DEBUG=startup option to dump more info about
instance creation and device enumeration.

A common question end users have is why the direver is not loading
for them, and this has two common reasons:
1) They did not install the driver.
2) AMDGPU is not used for the card in the kernel.

This adds some info messages so we can easily get a some useful
output from end users.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-31 11:51:23 +02:00
Bas Nieuwenhuizen 38933c1151 radv: Add option to print errors even in optimized builds.
Errors are not that common of a case so we can eat a slight perf
hit in having to call a function and do a runtime check.

In turn this makes debugging random errors happening for end users
easier, because they don't have to have a debug build on hand.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-31 11:51:23 +02:00
Bas Nieuwenhuizen 729f7373de radv: Make the sem_info allocate/free functions static.
They are only used in 1 file.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-31 11:51:23 +02:00
Samuel Pitoiset 21baf33a94 radv: allow radv_emit_shader_pointer_head() to emit more pointers
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-29 10:07:16 +02:00
Samuel Pitoiset 288fe7ec71 radv: split radv_emit_shader_pointer()
This will allow to emit consecutive shader pointers for
reducing the number of emitted SET_SH_REG packets, which
is recommended.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-29 10:07:13 +02:00
Samuel Pitoiset 36a4d6d081 radv: add support for 32-bit pointers in user data SGPRs
We still use 64-bit GPU pointers for all ring buffers because
llvm.amdgcn.implicit.buffer.ptr doesn't seem to support 32-bit
GPU pointers for now. This can be improved later anyways.

Vega10:
Totals from affected shaders:
SGPRS: 1008722 -> 1026710 (1.78 %)
VGPRS: 706580 -> 707136 (0.08 %)
Spilled SGPRs: 22555 -> 22209 (-1.53 %)
Spilled VGPRs: 75 -> 75 (0.00 %)
Code Size: 34819208 -> 35202140 (1.10 %) bytes
Max Waves: 175423 -> 175086 (-0.19 %)

Polaris10:
Totals from affected shaders:
SGPRS: 1029849 -> 1036517 (0.65 %)
VGPRS: 709984 -> 708872 (-0.16 %)
Spilled SGPRs: 22672 -> 22309 (-1.60 %)
Spilled VGPRs: 82 -> 66 (-19.51 %)
Scratch size: 76 -> 60 (-21.05 %) dwords per thread
Code Size: 34915336 -> 35309752 (1.13 %) bytes
Max Waves: 151221 -> 151677 (0.30 %)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-22 15:53:22 +02:00
Samuel Pitoiset fcba3934fc radv: add radv_emit_shader_pointer() helper
For future work (support for 32-bit GPU pointers).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-17 21:28:59 +02:00
Samuel Pitoiset 1e86eaf7d8 radv: remove radv_device::llvm_supports_spill
It's always true.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-17 13:48:21 +02:00
Bas Nieuwenhuizen 3d4d388e39 radv: Fix up 2_10_10_10 alpha sign.
Pre-Vega HW always interprets the alpha for this format as unsigned,
so we have to implement a fixup to do the sign correctly for signed
formats.

v2: Improve indexing mess.

CC: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106480
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-14 18:58:20 +02:00
Timothy Arceri ce188813bf radv: add initial support for VK_PIPELINE_CREATE_DISABLE_OPTIMIZATION_BIT
When VK_PIPELINE_CREATE_DISABLE_OPTIMIZATION_BIT is set we skip NIR
linking optimisations and only run over the NIR optimisation loop
once similar to the GLSLOptimizeConservatively constant used by
some GL drivers.

We need to run over the opts at least once to avoid errors in LLVM
(e.g. dead vars it can't handle) and also to reduce the time spent
compiling the IR in LLVM.

With this change the Blacksmith Unity demos compilation times
go from 329760 ms -> 299881 ms when using Wine and DXVK.

V2: add bit to radv_pipeline_key

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106246
2018-05-13 09:58:33 +10:00
Matthew Nicholls 97d57ef917 radv: fix multisample image copies
Previously before fb077b0728, the LOD parameter was being used in place of the
sample index, which would only copy the first sample to all samples in the
destination image. After that multisample image copies wouldn't copy anything
from my observations.

This fixes some copy_and_blit CTS tests.

v3.1: - set lod to 0 for nir_txf_ms (Samuel)
v2: - use GLSL_SAMPLER_DIM_MS instead of 2D (Samuel)
    - updated commit description (Samuel)

Fix this properly by copying each sample in a separate radv_CmdDraw and using a
pipeline with the correct rasterizationSamples for the destination image.

Cc: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-02 19:32:00 +02:00
Samuel Pitoiset 5c1233ed62 radv: use a global BO list only for VK_EXT_descriptor_indexing
Maintaining two different paths is annoying but this gets
rid of the performance regression introduced by the global
BO list.

We might find a better solution in the future, but for now
just keeps two paths.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-20 16:18:18 +02:00
Samuel Pitoiset 7bd5367546 Revert "radv: Don't store buffer references in the descriptor set."
In order to reduce a performance regression introduced by
4b13fe55a4 ("radv: Keep a global BO list for VkMemory."),
we are going to maintain two different paths.

One when VK_EXT_descriptor_indexing is enabled by the
application because we need to have a global BO list, and
one (the old one) when it's not enabled.

With Talos on Polaris, the global BO list reduces performance
by 10% which is too much for me.

This reverts commit ab6cadd3ec.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-20 16:18:13 +02:00
Samuel Pitoiset 2f63b3dd09 radv: enable DCC for MSAA 2x textures on VI under an option
This can be enabled with RADV_PERFTEST=dccmsaa.

DCC for MSAA textures is actually not as easy to implement. It
looks like there is some corner cases. I will improve support
incrementally.

Vega support, as well as Polaris improvements, will be added later.

No CTS changes on Polaris using RADV_DEBUG=zerovram and
RADV_PERFTEST=dccmsaa.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-19 09:10:55 +02:00
Bas Nieuwenhuizen ab6cadd3ec radv: Don't store buffer references in the descriptor set.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-04-18 22:56:54 +02:00
Bas Nieuwenhuizen 4b13fe55a4 radv: Keep a global BO list for VkMemory.
With update after bind we can't attach bo's to the command buffer
from the descriptor set anymore, so we have to have a global BO
list.

I am somewhat surprised this works really well even though we have
implicit synchronization in the WSI based on the bo list associations
and with the new behavior every command buffer is associated with
every swapchain image. But I could not find slowdowns in games because
of it.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-04-18 22:56:54 +02:00
Samuel Pitoiset fde7b90ecf radv: make radv_initialise_cmask() static
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Niuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-16 14:20:51 +02:00
Marek Olšák 43d66c8c2d mesa: include mtypes.h less
- remove mtypes.h from most header files
- add main/menums.h for often used definitions
- remove main/core.h

v2: fix radv build

Reviewed-by: Brian Paul <brianp@vmware.com>
2018-04-12 19:31:30 -04:00
Bas Nieuwenhuizen 6ff98dbf7c radv: Implement VK_EXT_vertex_attribute_divisor.
Pretty straight forward, just pass the divisors through the shader
key and then do a LLVM divide.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-04-12 22:57:23 +02:00
Bas Nieuwenhuizen ed94638156 radv: Enable RB+ where possible.
According to Marek, not enabling it on Stoney has a significant
negative performance impact. (And I guess this might impact
performance on Raven as well)

The register settings are pretty much copied from radeonsi. I did
not put this in the pipeline as that would make the pipeline more
dependent on the format which mean we would have to have more
pipelines for the meta shaders.

v2: Don't clear RB+ regs if not enabled as the CLEAR_STATE packet
    does already.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-04-11 01:19:10 +02:00
Samuel Pitoiset b0f8ad189c radv: add radv_image_is_tc_compat_htile() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-09 11:21:26 +02:00
Samuel Pitoiset ed41e776d0 radv: clean up radv_vi_dcc_enabled()
And rename to radv_dcc_enabled() to be consistent.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-09 11:21:14 +02:00
Samuel Pitoiset e213f19907 radv: clean up radv_htile_enabled()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-09 11:21:12 +02:00
Samuel Pitoiset 0fc9113ac5 radv: add radv_image_has_{cmask,fmask,dcc,htile}() helpers
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-09 11:21:10 +02:00
Samuel Pitoiset 7fe586f6fb radv: only enable PERFECT_ZPASS_COUNTS for precision occlusion queries
This unnecessary when the precision bit flag is not set, and this
might hurt performance. The Vulkan explains that not setting
VK_QUERY_CONTROL_PRECISE_BIT might be more efficient on some
implementations.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-06 09:07:34 +02:00
Samuel Pitoiset a8a696a38f radv: use a mask for VBOs and shaders prefetching
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-04-05 10:03:42 +02:00
Samuel Pitoiset 922cd38172 radv: implement out-of-order rasterization when it's safe on VI+
Disabled by default for now, it can be enabled with
RADV_PERFTEST=outoforder.

No CTS regressions on Polaris, and all Vulkan games I tested
look good as well.

Expect small performance improvements for applications where
out-of-order rasterization can be enabled by the driver.

Loosely based on RadeonSI.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-04 13:32:00 +02:00
Samuel Pitoiset 2a329f4ada radv: set SAMPLE_RATE to the number of samples of the current fb
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-03-30 17:32:15 +02:00
Samuel Pitoiset 52fba3f45d radv: remove unused radv_pipeline::needs_data_cache variable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-03-22 14:30:37 +01:00
Samuel Pitoiset d07edf5fdf radv: add dump_shader to the NIR compiler options
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-03-15 17:20:00 +01:00
Samuel Pitoiset 38f34117dd radv: fix vkGetDeviceQueue2() when create flags don't match
This fixes CTS:
dEQP-VK.api.device_init.create_device_queue2_unmatched_flags

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@gmail.com>
2018-03-14 09:53:42 +01:00
Samuel Pitoiset fbe694562b ac/nir: move ac_nir_compiler_options and friends to radv folder
Also replace ac_ by radv_.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-03-13 16:54:23 +01:00
Samuel Pitoiset 237229430f ac: move ac_shader_info to radv folder
This is RADV specific code.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-03-13 16:54:21 +01:00
Samuel Pitoiset 2cfba40eea ac/nir: move ac_shader_variant_info and friends to radv folder
Also replace ac_ by radv_.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-03-13 16:54:16 +01:00
Samuel Pitoiset b2653007b9 ac/nir: move all RADV related code to radv_nir_to_llvm.c
Now the "ac/nir" prefix will really be the shared code between
RadeonSI and RADV, that might avoid confusions in the future.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-03-13 14:05:06 +01:00
Bas Nieuwenhuizen 997306c031 radv: Increase the number of dynamic uniform buffers.
The vulkan API is not ideal as it does not allow us have a
shared limit.

Feral needs 15+6 for one of their games, and I'm not a fan
of overcommitting the limits, so increase the number of
dynamic uniform buffers to 16.

CC: <mesa-stable@lists.freedesktop.org>
CC: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-03-12 09:46:22 +01:00
Samuel Pitoiset c27f5419f6 radv: only emit cache flushes when the pool size is large enough
This is an optimization which reduces the number of flushes for
small pool buffers.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-03-01 09:53:40 +01:00
Samuel Pitoiset 2fe07933bd radv: keep track of the query pool size
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-03-01 09:53:39 +01:00
Samuel Pitoiset c956d0f406 radv: make sure to emit cache flushes before starting a query
If the query pool has been previously resetted using the compute
shader path.

Fixes: a41e2e9cf5 ("radv: allow to use a compute shader for resetting the query pool")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105292
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-03-01 09:14:49 +01:00
Dave Airlie 6bafd4f4dd radv: remove device pointer from buffer.
This is never used.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-02-28 09:03:26 +10:00
Dave Airlie 1fc19a0f27 radv: merge tess rings into a single bo
Inspired by a passing commit to radeonsi.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-02-27 00:54:59 +00:00
Bas Nieuwenhuizen e72ad05c1d radv: Return NULL for entrypoints when not supported.
This implements strict checking for the entrypoint ProcAddr
functions.

 - InstanceProcAddr with instance = NULL, only returns the 3 allowed
   entrypoints.
 - DeviceProcAddr does not return any instance entrypoints.
 - InstanceProcAddr does not return non-supported or disabled
   instance entrypoints.
 - DeviceProcAddr does not return non-supported or disabled device
   entrypoints.
 - InstanceProcAddr still returns non-supported device entrypoints.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-02-23 00:39:02 +01:00
Bas Nieuwenhuizen 076f7cfc6b radv: Track enabled extensions.
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-02-23 00:39:02 +01:00
Bas Nieuwenhuizen 4db78f3a6b radv: Put supported extensions in a struct.
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-02-23 00:39:02 +01:00
Fredrik Höglund 5a38d8f103 radv: implement VK_EXT_external_memory_host
Ported from the radeonsi GL_AMD_pinned_memory implementation.

Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-08 00:46:07 +01:00
Samuel Pitoiset 4922e7f25c radv: use separate bindings for graphics and compute descriptors
The Vulkan spec says:

   "pipelineBindPoint is a VkPipelineBindPoint indicating whether
    the descriptors will be used by graphics pipelines or compute
    pipelines. There is a separate set of bind points for each of
    graphics and compute, so binding one does not disturb the other."

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104732
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-01 09:37:09 +01:00
Samuel Pitoiset cf224014dd radv: store the bind point when creating descriptors with templates
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-01 09:37:07 +01:00
Matthew Nicholls ef272b161e radv: remove predication on cache flushes
This can lead to a situation where cache flushes could get conditionally
disabled while still clearing the flush_bits, and thus flushes due to
application pipeline barriers may never get executed.

Fixes: a6c2001ace (radv: add support for cmd predication.)
Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-01-31 13:37:18 +10:00
Bas Nieuwenhuizen 882eff4d20 radv: Merge raster state with PM4 generation.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-30 22:02:05 +01:00
Bas Nieuwenhuizen 69364f1c34 radv: Move gs state out of pipeline.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-30 22:02:01 +01:00
Bas Nieuwenhuizen e4e060d135 radv: Split out cliprect rule generation.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-30 22:01:56 +01:00
Bas Nieuwenhuizen acbaef3005 radv: Merge VGT_GS_MODE computation with PM4 generation.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-30 22:01:52 +01:00
Bas Nieuwenhuizen 9062b1c241 radv: Move tessellation state out of pipeline.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-30 22:01:38 +01:00
Bas Nieuwenhuizen 4aa1cb4e90 radv: Move blend state out of pipeline.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-30 22:01:34 +01:00
Bas Nieuwenhuizen 0f72f0eacb radv: Split out generating VGT_SHADER_STAGES_EN.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-30 22:01:30 +01:00
Bas Nieuwenhuizen 694c34314b radv: Split out the ia_multi_vgt_param precomputation.
Also moved everything in a struct and then return the struct from
the helper function, so it is clear in the caller what part of the
pipeline gets modified.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-30 22:01:26 +01:00
Bas Nieuwenhuizen 0bea0851aa radv: Split out db_shader_control computation.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-30 22:01:18 +01:00
Bas Nieuwenhuizen 5dce47ae6d radv: Compute shader_z_format when emitting it.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-30 22:01:13 +01:00
Bas Nieuwenhuizen df2e7ab0db radv: Merge depth stencil state with PM4 generation.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-30 22:01:06 +01:00
Bas Nieuwenhuizen d5a0af84ec radv: Merge ps_input_cntl computation with PM4 generation.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-30 22:01:01 +01:00
Bas Nieuwenhuizen e2bf18030d radv: Merge vtx_reuse_depth computation with PM4 generation.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-30 22:00:55 +01:00
Bas Nieuwenhuizen c80747b32c radv: Merge vs state computation with PM4 generation.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-30 22:00:50 +01:00
Bas Nieuwenhuizen c4191cf944 radv: Merge binning state generation with pm4 emission.
We don't need the pipeline state struct anymore.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-30 22:00:45 +01:00
Bas Nieuwenhuizen 6f1a3f081e radv: Constify some pipeline helpers.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-30 22:00:40 +01:00
Bas Nieuwenhuizen beeab44190 radv: Record a PM4 sequence for graphics pipeline switches.
This gives about 2% performance improvement on dota2 for me.

This is mostly a mechanical copy and replacement, but at bind time
we still do:

1) Some stuff that is only based on num_samples changes.
2) Some command buffer state setting.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-30 22:00:22 +01:00
Bas Nieuwenhuizen 7c366bc152 radv: Determine unneeded dynamic states.
Which avoids setting or emitting them.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-30 22:00:17 +01:00
Dave Airlie 298554541d radv: move spi_baryc_cntl to pipeline
We need to enable the pos float location 2 mode anytime we have
persample not just when forced by the frag shader.

This fixes:
dEQP-VK.pipeline.multisample.min_sample_shading*

Fixes: 58c97a079 (radv: enable location at sample when persample is forced.)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-01-25 06:47:28 +10:00
Dave Airlie 766589d89a radv: fix sample_mask_in loading. (v3.1)
This is ported from radeonsi and fixes:
dEQP-VK.pipeline.multisample_shader_builtin.sample_mask.bit_*

v2: don't call this path for radeonsi, it does it in the epilog.
use the radeonsi code path.
v3: handle NULL pCreateInfo->pMultisampleState properly (Samuel)
v3.1: set ps_iter_samples default to 1 (Bas)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fixes: bdcbe7c76 (radv: add sample mask input support)
Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-01-24 14:25:11 +10:00
Dave Airlie 316d762186 radv: add fs_key meta format support to resolve passes.
Some of the hw resolve passes need the SPI color format setup
correctly.

This fixes lots of 16-bit and 32-bit format tests in
dEQP-VK.renderpass.suballocation.multisample*

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-01-24 08:50:51 +10:00
Bas Nieuwenhuizen b1444c9ccb radv: Implement VK_ANDROID_native_buffer.
Passes
  dEQP-VK.api.smoke.*
  dEQP-VK.wsi.android.*

with android-cts-7.1_r12 .

Unlike the initial anv implementation this does
use syncobjs instead of waiting on the CPU.

This is missing meson build coverage for now.

One possible todo is that linux 4.15 now has a
sycall that allows us to export amdgpu fence to
a sync_file, which allows us not to force all
fences and semaphores to use syncobjs. However,
I had trouble with my kernel crashing regularly
with NULL pointers, and I'm not sure how beneficial
it is in the first place given that intel uses
syncobjs for all fences if available.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-01-19 01:43:55 +01:00
Bas Nieuwenhuizen a3e241ed07 radv: Add create image flag to not use DCC/CMASK.
If we import an image, we might not have space in the
buffer for CMASK, even though it is compatible.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-01-19 01:43:55 +01:00
Bas Nieuwenhuizen 0b8991c0b6 radv: Implement VK_EXT_debug_report.
This is not hooked up to any messages yet, but useful for e.g.
renderdoc if you add some messages during development.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-17 11:29:04 +01:00
Dave Airlie ad11fc3571 radv: don't emit unneeded vertex state.
If the number of instances hasn't changed and we've already
emitted it, don't emit it again.

If the vertex shader is the same and the first_instance, vertex_offset
haven't changed don't emit them again.

This increases the fps in GL_vs_VK -t 1 -m -api vk from around 40
to around 60 here, it may not impact anything else.

Dieter also reported smoketest going from 1060->1200 fps.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-01-12 00:43:07 +00:00
Bas Nieuwenhuizen 5db0bf9994 radv: Implement VK_EXT_discard_rectangles.
Tested with a modified deferred demo and no regressions in a 1.0.2
mustpass run.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-01-10 13:26:22 +01:00
Bas Nieuwenhuizen 11b9cdd2d7 radv: Add mapping between dynamic state mask and external enum.
The EXT values are really large, e.g.
VK_DYNAMIC_STATE_DISCARD_RECTANGLE_EXT = 1000099000, so 1 << value
is not going to fit into a 32-bit mask.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-01-10 13:24:31 +01:00
Samuel Pitoiset b09b3f8834 radv: add has_scissor_bug for Vega10 and Raven
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-01-08 21:24:56 +01:00
Samuel Pitoiset a3c2a86757 radv: make shader BOs read-only for the GPU
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-01-08 21:24:51 +01:00
Samuel Pitoiset 87efa71001 radv: remove unused radv_color_buffer_info::cb_clear_valueX
Found by inspection.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-01-05 17:26:51 +01:00
Bas Nieuwenhuizen 6a36bfc64d radv: Implement binning on GFX9.
Overall it does not really help or hurt. The deferred demo gets 1%
improvement and some games a 3% decrease, so I don't think this
should be enabled by default.

But with the code upstream it is easier to experiment with it.

v2: Remove initializing the registers from si_emit_config.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-12-31 15:07:07 +01:00
Bas Nieuwenhuizen 44fcf58744 radv: Disable DCC for GENERAL layout and compute transfer dest.
Apps can use this for render feedback loops, where things are
defined if they render each pixel only once. However, DCC fails
here, as the level of coherence is a block not a pixel, so disable it.

This is also going to help implementing other stuff.

Even if we optimize this later to only happen if there actually is
a loop (if possible at all ...), then the machinery is still useful
to exclude images accessible by the SDMA queue when that is implemented.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-12-29 12:21:53 +01:00
Bas Nieuwenhuizen 1cfab28e6e radv: Make color meta operations layout aware.
For fast clear eliminate and decompressions, we always use the most compressed
format.

For clears, the code already creates a renderpass on demand with the exact same
layout as specified.

Otherwise we start distinguishing between GENERAL and TRANSFER_DST_OPTIMAL.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-12-29 12:21:44 +01:00
Bas Nieuwenhuizen 3e2a6191c9 radv: Add compute DCC decompress.
We do an in place copy where we read compressed and write decompressed.
By doing this in sizes that cover entire DCC blocks and waiting for all
reads in the block before starting to write we avoid corruption.

In the end we clear the DCC metadata to 0xffffffff.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-12-29 12:21:40 +01:00
Bas Nieuwenhuizen e5feeec140 radv: Add GFX DCC decompress.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-12-29 12:21:31 +01:00
Dave Airlie 420627e6e7 radv/gfx9: fix buffer to image for 3d images on compute queues
This fixes some of the broken:
dEQP-VK.synchronization.op.multi_queue.*64x64x8* tests.

Fixes: e38685cc62 'Revert "radv: disable support for VEGA for now."'
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-12-29 09:37:09 +10:00
Dave Airlie 09612a62e1 radv/gfx9: fix 3d image clears on compute queues
This fixes some of the broken:
dEQP-VK.synchronization.op.multi_queue.*64x64x8* tests.

Fixes: e38685cc62 'Revert "radv: disable support for VEGA for now."'
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-12-29 09:37:05 +10:00