When the current bound shaders don't use any bindless textures
or images, it's useless to decompress the resident resources.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Similar to the existing decompression code path except that it
loops over the list of resident textures/images.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Analogous to bound textures/images. We should also update the
resident descriptors and disable COMPRESSION_EN for avoiding
useless DCC fetches, but I postpone this optimization for a
separate series.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
When a stencil buffer is part of the framebuffer state, it is
decompressed but because it's bindless, all draw calls set
stencil_dirty_level_mask to 1.
v2: Marek - set the flags outside the loop
- also clear and set framebuffer.do_update_surf_dirtiness there
- do it in the DB->CB copy path too
v3: Marek - save and restore the do_update_surf_dirtiness flag
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
All tests pass on Fiji now. This prevents DCC disablement due to
incompatible DCC formats due to the fallback.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
This fixes a crash in Deus Ex: Mankind Divided. Release builds were
unaffected, so it's not too serious.
Cc: 11.2 12.0 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
so that decompress blits aren't needed and depth texturing needs less
memory bandwidth.
Z16 and Z24 are promoted to Z32_FLOAT by the driver, because TC-compatible
HTILE only supports Z32_FLOAT. This doubles memory footprint for Z16.
The format promotion is not visible to state trackers.
This is part of TC-compatible renderbuffer compression, which has 3 parts:
DCC, HTILE, FMASK. Only TC-compatible FMASK compression is missing now.
I don't see a measurable increase in performance though.
(I tested Talos Principle and DiRT: Showdown, the latter is improved by
0.5%, which is almost noise, and it originally used layered Z16,
so at least we know that Z16 promoted to Z32F isn't slower now)
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
DCC is limited in how texture formats can be reinterpreted using texture
views. If we get a view format that is incompatible with the initial
texture format with respect to DCC, disable DCC.
There is a new piglit which tests all format combinations.
What works and what doesn't was deduced by looking at the piglit failures.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
SDMA is much faster for tiled->linear blits from VRAM to GTT.
I have Bonaire in my second PCIe slot.
$ glxinfo | grep OpenGL.renderer
OpenGL renderer string: Gallium 0.4 on AMD TONGA ...
$ DRI_PRIME=1 glxinfo | grep OpenGL.renderer
OpenGL renderer string: Gallium 0.4 on AMD BONAIRE ...
Without SDMA:
$ DRI_PRIME=1 glxgears
8796 frames in 5.0 seconds = 1759.074 FPS
8899 frames in 5.0 seconds = 1779.672 FPS
With SDMA:
$ DRI_PRIME=1 glxgears
12765 frames in 5.0 seconds = 2552.788 FPS
12888 frames in 5.0 seconds = 2577.495 FPS
The 1st GPU is irrelevant. The improvement should be much lower at 60 fps,
but definitely measurable.
SI will get this once we add SDMA blit support for it.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This is just a workaround. The problem is described in the code.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96541
v2: say that it's only between the current context and aux_context
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
Coverity's analysis is too weak to understand that
r600_init_flushed_depth(_, _, NULL) only returns true when
flushed_depth_texture was assigned a non-NULL value.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is a left-over of when I considered generalizing the separate stencil
support. I do prefer the new name since it emphasizes what flushing vs.
non-flushing means from a functional point-of-view, namely special handling
of the texture format.
v2: adjust r600_init_color_surface as well
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
DCC for displayable surfaces is allocated in a separate buffer and is
enabled or disabled based on PS invocations from 2 frames ago (to let
queries go idle) and the number of slow clears from the current frame.
At least an equivalent of 5 fullscreen draws or slow clears must be done
to enable DCC. (PS invocations / (width * height) + num_slow_clears >= 5)
Pipeline statistic queries are always active if a color buffer that can
have separate DCC is bound, even if separate DCC is disabled. That means
the window color buffer is always monitored and DCC is enabled only when
the situation is right.
The tracking of per-texture queries in r600_common_context is quite ugly,
but I don't see a better way.
The first fast clear always enables DCC. DCC decompression can disable it.
A later fast clear can enable it again. Enable/disable typically happens
only once per frame.
The impact is expected to be negligible because games usually don't have
a high level of overdraw. DCC usually activates when too much blending
is happening (smoke rendering) or when testing glClear performance and
CMASK isn't supported (Stoney).
v2: rename stuff, add assertions
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This reduces time spend in glGenerateMipmap by a half.
v2: don't decompress the levels to be overwritten
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
We could also do MSAA resolve in a compute shader like Vulkan and remove
these workarounds.
v2: comment the magic numbers
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Since DCC is enabled almost everywhere now, it's important not to disable
this fast path.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
No idea why this was disabled, but it works fine.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Allocating it has no effect, but it adds overhead (useless DCC clear).
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>