Do it when we bind shaders.
The advantages are:
- no need to memset the fields when any shader variant state is changed
(e.g. culling on/off)
- no need to recompute the fields every time that happens
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12656>
We would like draw-only display lists to have immutable draw info and
this is the only GL non-draw state in pipe_draw_info (not counting
view_mask).
It also allows removing some code from draw_vbo for tessellation.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12351>
When this flag is set, u_threaded_context will try not to map it directly
for better buffer placement. It's set by drivers when visible VRAM is too
small.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12257>
The GDS ordered append variant is unstable due to kernel and firmware bugs.
The unordered GDS variant isn't faster than the memory-based variant.
Only the memory-based variant is kept.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11510>
Determine if a given shader write the same constant value to its output
if a specific input texture is replaced by constant load.
It's done by checking if the store_output intrinsics only depends on
constant and a texture. If it's true, the given texture is replaced by
a constant load in cloned shader and this clone is optimized.
Then the output is checked (= is it constant or not).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10979>
This allows to implement custom draw_vbo code-path without
touching si_draw_vbo.
As an example, skipped all draw calls with an odd new_draws
could be done like this:
void mywrapper(...) {
if (new_draws % 2)
return;
return sctx->real_draw_vbo(...);
}
if (some_condition_is_met)
si_install_draw_wrapper(sctx, mywrapper);
Instead of having to add the "if ()" condition inside si_draw_vbo.
Note that a single wrapper may be installed so care must be taken
to not override an existing wrapper.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10979>
tc already calculates all the rebinding that needs to be done on a given
context, so (some of) this info can be passed on to drivers to enable
optimizations
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11245>
the only case in which this is nonzero is if a multidraw gets split by the frontend,
i.e., mesa core, and in all other cases it can be ignored. the value can also be ignored
for all indirect draws, though it seems many (most?) gallium drivers are not aware of this
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10166>
this moves index_bias into the multidraw struct, enabling draws where the value
changes to be merged; the draw_info struct member is renamed and moved to the end
of the struct for tc use
u_vbuf still has some checks to split draws if index_bias changes, maybe
this can be removed at some point?
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10166>
Just pass down the modifier list to vl_video_buffer_create_as_resource,
filtering out DCC modifiers because we don't support these for now.
Signed-off-by: Simon Ser <contact@emersion.fr>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10237>
Only internal compute shaders use DCC stores, so the TODOs are not
critical yet.
Fixes: 1d64a1045e - radeonsi: enable dcc image stores on gfx10+
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10261>
MSAA 4x and 8x should only clear the first 2 samples because other samples
are uncompressed. The compute shader only clears that subset of DCC.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10003>
The retile map is removed and replaced by direct DCC address computations
in the retile shader using the new function ac_nir_dcc_addr_from_coord.
The RADV code is disabled.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10003>
This adds a clear_buffer compute shader that does read-modify-write to
update a subset of bits in HTILE.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10003>
Set the final value in si_texture_create_object, so that other places
don't have to derive it redundantly.
The only thing to remember is that HTILE stencil can be enabled when
stencil is not present, and it can be disabled when stencil is present
due to various workarounds.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10003>
Wrap the si_cp_wait_mem call to emit RGP_SQTT_MARKER_IDENTIFIER_BARRIER_START and
RGP_SQTT_MARKER_IDENTIFIER_BARRIER_END events.
Only for gfx9+ for now.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10105>
Clearing 8 RTs with both DCC and CMASK caused 16 synchronized clears where
we also did 16 times WAIT_REG_MEM for CB flushes that were 15 times
useless.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9795>
to allow packing the block size in 1 user SGPR with 10 bits per component,
so that block sizes such as 512x1x1 fit in there.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9795>
DCC/CMASK/HTILE clears will not set this. We could do a better job
at not setting this in other cases too
Image copies also don't set this.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9795>
When we encounter a situation when we need to swizzle, which the CB can't
resolve in one pass, swap the channel order on the next clear, so that we
don't have to swizzle.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9615>
It ignored non-harvested chips with a non-power-of-two memory bus.
Fixes: abed921ce7 - amd: add support for Navy Flounder
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9568>
The in-memory shader cache can get significantly
huge in some rare cases.
Limit its size to 64MB on 32 bits, and 1GB else.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9578>
This allows skipping mutex locking. Don't take the aux context into account.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9356>
With these changes the shader code is visible in RGP.
Vk pipeline feature is emulated using si_update_shaders: when shaders are
updated we compute a sha1 of their code and use it as a pipeline hash.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9277>
This was implemented in 1d3bffaf9c,
but missing the WRITE_COMPRESS_ENABLE bit, then disabled by
4dc6ed2a59040f04648eadbffeb1522587d00f3.
This commits reimplements it to:
- avoid disabling dcc when uploading FP16 textures
(see si_use_compute_copy_for_float_formats)
- being able to use compute to upload textures in more cases, rather
than using the blit path
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8958>
The new enum has 3 values:
- SI_CP_DMA_CLEAR_METHOD: equivalent to force_cp_dma = true
- SI_COMPUTE_CLEAR_METHOD: to force the clear to use compute
- SI_AUTO_SELECT_CLEAR_METHOD: equivalent to force_cp_dma = false
No functional change yet, but this will be used later.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8958>
Decreasing the time spent in radeon_cs_memory_below_limit is the motivation.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8794>
Using event identifiers allows to add a bit more context to the RGP trace.
Without this all draw calls are identified as vkCmdDraw.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8746>
This radically simplifies the code to decrease CPU overhead in si_draw_vbo.
The generic CP DMA copy function is too complicated.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8548>
This is a great candidate for a template. There are a lot of conditions
that are already templated in si_draw_vbo.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8548>
Enable vrs2x2 coarse shading if flat shading as per
idea and guidance given by Marek.
is_flat_shading variable in struct si_shader_info is set
based on the data from gather_intrinsic_info() function
and struct si_state_rasterizer. If is_flat_shading_variable
is set, then in function si_emit_db_render_state() vrs2x2
shading is enabled in hardware.
v2: Fix review comments from Pierre-Eric. Code optimizations.
v3: Fix indentation style issue.
v4: Fix review comments from Marek. Fixed logical issue pointed
by Marek where info->is_flat_shading variable can be corrupted
and other code cleanup.
v5: Make the code compact as suggested by Pierre-Eric.
v6: Fix new review comments from Marek.
v7: use info->uses_interp_color variable fix from Marek.
v8: Fix coding style comment from Marek.
v9: Add uses_fbfetch_output check as suggested by Marek.
Signed-off-by: Yogesh Mohan Marimuthu <yogesh.mohanmarimuthu@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8161>
The problem was that the shader constants were based on the framebuffer
sample count and ignored the multisample enable state and the line/polygon
smoothing state, which uses MSAA rasterization that only sets SampleMaskIn
to get the coverage for alpha-blended smoothing (the PS epilog computes
the alpha channel from SampleMaskIn and blending generates the AA results).
- This is a complete rework that adds a new state for NGG cull constants.
- It fixes the same thing for the prim discard compute shader.
- It documents how VS_STATE.SMALL_PRIM_PRECISION is encoded.
It fixes blue corruption in Unigine Heaven with MSAA and Medium details
or better.
Fixes: 7648060dc0 - radeonsi: enable NGG culling by default on gfx10.3 dGPUs
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8022>
There are many issues with SDMA across many generations of hardware.
A recent example is that gfx10.3 suffers from random GPU hangs if
userspace uses SDMA.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7908>
It's straightforward except that the amdgpu winsys had to be cleaned up
to allow this.
radeon_cmdbuf is inlined and optionally the winsys can save the pointer
to it. radeon_cmdbuf::priv points to the winsys cs structure.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7907>
invalidate_draw_sh_constants should invalidate only SGPRs.
invalidate_draw_constants invalidates SGPRs and NUM_INSTANCES.
u_blitter called invalidate_draw_sh_constants, which previously
invalidated NUM_INSTANCES as well. This commit fixes that.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7721>
This aligns the offsets to match the memory layout of the query buffer
defined by gfx10_sh_query_buffer_mem and calls si_launch_grid_internal
to flush caches and wait for completion of shaders prior to retrieving
results.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7181>
Instead of:
if (VS) {
VS;
}
if (TCS) {
TCS;
}
Do this if the number of threads is the same in VS and TCS:
exec = enabled_threads;
VS;
TCS;
Skipping declare_vb_descriptor_input_sgprs is needed to match the VS return
values.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7623>
Flushes non-explicit shared textures that need retiling on
* glFlush
* glSync
* glSignalSemaphoreEXT
* DRI fences.
* The first time we create a non-explicit handle for it.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6176>
This improves performance for uber shaders.
It must be enabled using the new driconf option.
The driver compiles the specialized shaders in another thread without stalls,
same as all other optimizations.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7057>
Add a vertex count threshold into si_shader_selector to simplify
the draw_vbo code.
The new option is supposed to be used in 00-mesa-defaults.conf and should be
tweaked for best performance unlike the AMD_DEBUG experimental options.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6948>
So far, the callback to create a resource from a memory object had code
for importing textures only. Modified it to allow importing buffers too.
Fixes the following piglit tests:
- ext_external_objects/vk-buf-exchange
- ext_external_objects/vk-pix-buf-update-errors
- ext_external_objects/vk-vert-buf-update-errors
- ext_external_objects/vk-vert-buf-reuse
v2: Used si_alloc_buffer_struct instead of CALLOC
v3: Fixed indentation issue, removed free in case of unsuccessful
allocation, joined two if conditions together
Signed-off-by: Eleni Maria Stea <estea@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6364>
tess_rings must be encrypted when used in a secure job so this commit
introduces a tess_rings_tmz resource.
The cs_preamble_state doesn't contain the tess_rings address anymore since
it can change. The tess_rings related registers go in a separate preamble.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6049>
This commit makes TMZ always allowed instead of being either off or forced-on
with AMD_DEBUG=tmz.
With this change:
- secure job can be used as soon as the application made a tmz allocation. Driver
internal allocations are not enough to enable secure jobs (if tmz is supported
and enabled by the kernel)
- AMD_DEBUG=tmz forces all scanout/depth/stencil buffers to be allocated as TMZ.
This is useful to test app thats don't explicitely support protected content.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6049>
Tag allocations as driver internal.
Some of these allocations will need to be doubled to handle TMZ (one secure bo,
one normal bo) but these allocations shouldn't switch the winsys in "the app
is using TMZ".
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6049>
The retile maps are a software mechanism and hence very suceptible
to change. As such I'd like to avoid making it part of the cross
driver ABI.
Ideally we'd just use the cached tile info + a shader to avoid these
buffers altogether.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6783>
This reverts commit 430d384c31.
BIT_PAGE can't be set for GTT and we don't know if a buffer has been
evicted to GTT.
Fixes: 430d384c31
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6722>
so that they can be different depending on the GPU (for 16-bit support)
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6284>