Commit Graph

177 Commits

Author SHA1 Message Date
Samuel Pitoiset 878bd981bf radeonsi: isolate real framebuffer changes from the decompression passes (v3)
When a stencil buffer is part of the framebuffer state, it is
decompressed but because it's bindless, all draw calls set
stencil_dirty_level_mask to 1.

v2: Marek - set the flags outside the loop
          - also clear and set framebuffer.do_update_surf_dirtiness there
          - do it in the DB->CB copy path too
v3: Marek - save and restore the do_update_surf_dirtiness flag

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 20:17:14 +02:00
Marek Olšák 7d67cbefe0 radeonsi: clean up decompress blend state names
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 19:38:45 +02:00
Marek Olšák d2ee423b69 radeonsi: enable TC-compatible stencil compression on VI
Most things are in place. Ideally we won't see decompress blits for stencil
anymore.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 19:38:39 +02:00
Marek Olšák a893c91697 gallium/u_blitter: use 2D_ARRAY for cubemap blits if possible
so that we can use TXF.

The cubemap blit pixel shader code size: 148 -> 92 bytes

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 18:10:50 +02:00
Samuel Pitoiset def02007cd radeonsi: add new si_check_render_feedback_texture() helper
For bindless.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-10 23:05:41 +02:00
Samuel Pitoiset fbcc8664fd radeonsi: add new si_decompress_color_texture() helper
For bindless.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-10 23:05:38 +02:00
Samuel Pitoiset 9cc91ba6d5 radeonsi: add a 'break' in si_check_render_feedback_*()
No need to check all color buffers.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-10 23:05:29 +02:00
Marek Olšák 3b1934d9b6 gallium/radeon: s/dcc_disable/disable_dcc/
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-03-30 16:09:39 +02:00
Marek Olšák 45a71d5de5 radeonsi: handle incompatible DCC formats in resource_copy_region
Required because of later commits.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
2017-03-30 16:09:39 +02:00
Marek Olšák b05b8587ae radeonsi: remove a workaround for inexact *8_SNORM blits
All tests pass on Fiji now. This prevents DCC disablement due to
incompatible DCC formats due to the fallback.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
2017-03-30 16:09:39 +02:00
Marek Olšák a955ee788f gallium/radeon: add and use a new helper vi_dcc_enabled
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-03-30 16:09:37 +02:00
Marek Olšák 405bacd820 radeonsi/gfx9: fix MIP0_WIDTH & MIP0_HEIGHT for compressed texture blits
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák 272b50a6f4 radeonsi/gfx9: do DCC clears on non-mipmapped textures only
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák 861d7af1cb radeonsi: use a bitmask-based loop in si_decompress_textures
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-18 19:51:31 +01:00
Marek Olšák a131dacb14 radeonsi: add CP DMA flags for greater control over synchronization
for L2 prefetch

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-06 21:05:48 +01:00
Grazvydas Ignotas c81a89f662 radeonsi: fix release build unused variable warnings
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-12-10 21:19:59 +01:00
Marek Olšák f2b0c66c3c radeonsi: properly declare context sampler states
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-07 18:46:54 +01:00
Marek Olšák 72d1669ed2 radeonsi: check for !is_linear in do_hardware_msaa_resolve
We don't want opt4Space here.

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák 00baaa4752 radeonsi: fix an assertion failure in si_decompress_sampler_color_textures
This fixes a crash in Deus Ex: Mankind Divided. Release builds were
unaffected, so it's not too serious.

Cc: 11.2 12.0 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-04 11:30:47 +01:00
Marek Olšák 7786f8c635 gallium/radeon: add enum radeon_micro_mode
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-01 22:33:13 +01:00
Marek Olšák bf4d102ea3 gallium/radeon: add radeon_surf::is_linear
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-01 22:33:13 +01:00
Marek Olšák c66a550385 gallium/radeon: don't call u_format helpers if we have that info already
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-01 22:33:13 +01:00
Marek Olšák 692f2640ab gallium/radeon: replace radeon_surf_info::dcc_enabled with num_dcc_levels
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-01 22:33:13 +01:00
Marek Olšák d4d9ec55c5 radeonsi: implement TC-compatible HTILE
so that decompress blits aren't needed and depth texturing needs less
memory bandwidth.

Z16 and Z24 are promoted to Z32_FLOAT by the driver, because TC-compatible
HTILE only supports Z32_FLOAT. This doubles memory footprint for Z16.
The format promotion is not visible to state trackers.

This is part of TC-compatible renderbuffer compression, which has 3 parts:
DCC, HTILE, FMASK. Only TC-compatible FMASK compression is missing now.

I don't see a measurable increase in performance though.

(I tested Talos Principle and DiRT: Showdown, the latter is improved by
 0.5%, which is almost noise, and it originally used layered Z16,
 so at least we know that Z16 promoted to Z32F isn't slower now)

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-13 19:00:51 +02:00
Marek Olšák 21de3be8e6 radeonsi: fix texture format reinterpretation with DCC
DCC is limited in how texture formats can be reinterpreted using texture
views. If we get a view format that is incompatible with the initial
texture format with respect to DCC, disable DCC.

There is a new piglit which tests all format combinations.
What works and what doesn't was deduced by looking at the piglit failures.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-05 18:01:15 +02:00
Marek Olšák 5ee3cac138 radeonsi: increase performance for DRI PRIME offloading if 2nd GPU is CIK or VI
SDMA is much faster for tiled->linear blits from VRAM to GTT.
I have Bonaire in my second PCIe slot.

$ glxinfo | grep OpenGL.renderer
OpenGL renderer string: Gallium 0.4 on AMD TONGA ...

$ DRI_PRIME=1 glxinfo | grep OpenGL.renderer
OpenGL renderer string: Gallium 0.4 on AMD BONAIRE ...

Without SDMA:
$ DRI_PRIME=1 glxgears
8796 frames in 5.0 seconds = 1759.074 FPS
8899 frames in 5.0 seconds = 1779.672 FPS

With SDMA:
$ DRI_PRIME=1 glxgears
12765 frames in 5.0 seconds = 2552.788 FPS
12888 frames in 5.0 seconds = 2577.495 FPS

The 1st GPU is irrelevant. The improvement should be much lower at 60 fps,
but definitely measurable.

SI will get this once we add SDMA blit support for it.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-08-26 15:50:10 +02:00
Marek Olšák a6b5845a0d radeonsi: use current context for DCC feedback-loop decompress, fixes Elemental
This is just a workaround. The problem is described in the code.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96541

v2: say that it's only between the current context and aux_context

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
2016-08-17 12:24:35 +02:00
Marek Olšák 7df15389af gallium/radeon: handle render_condition_enable for clear_rt/ds
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-10 01:10:21 +02:00
Marek Olšák a909210131 gallium: add render_condition_enable param to clear_render_target/depth_stencil
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-10 01:10:21 +02:00
Nicolai Hähnle 65d48fcf8c radeonsi: silence Coverity warning
Coverity's analysis is too weak to understand that
r600_init_flushed_depth(_, _, NULL) only returns true when
flushed_depth_texture was assigned a non-NULL value.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-13 09:52:39 +02:00
Nicolai Hähnle 1a0a8efcce radeonsi: decompress to flushed depth texture when required
v2: s/dirty_level_mask/stencil_dirty_level_mask/ in stencil case

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-06 10:43:51 +02:00
Nicolai Hähnle 4b7961da77 radeonsi: extract DB->CB copy logic into its own function
Also clean up some of the looping.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-06 10:43:51 +02:00
Nicolai Hähnle f2eb34f82f gallium/radeon: replace is_flushing_texture with db_compatible
This is a left-over of when I considered generalizing the separate stencil
support. I do prefer the new name since it emphasizes what flushing vs.
non-flushing means from a functional point-of-view, namely special handling
of the texture format.

v2: adjust r600_init_color_surface as well

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-06 10:43:48 +02:00
Nicolai Hähnle 065eeb79f7 radeonsi: correctly mark levels of 3D textures as fully decompressed
Account for the fact that max_layer is minified for higher levels.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-06 10:42:49 +02:00
Marek Olšák 49e3c74cdd gallium/radeon: add a heuristic enabling DCC for scanout surfaces (v2)
DCC for displayable surfaces is allocated in a separate buffer and is
enabled or disabled based on PS invocations from 2 frames ago (to let
queries go idle) and the number of slow clears from the current frame.

At least an equivalent of 5 fullscreen draws or slow clears must be done
to enable DCC. (PS invocations / (width * height) + num_slow_clears >= 5)

Pipeline statistic queries are always active if a color buffer that can
have separate DCC is bound, even if separate DCC is disabled. That means
the window color buffer is always monitored and DCC is enabled only when
the situation is right.

The tracking of per-texture queries in r600_common_context is quite ugly,
but I don't see a better way.

The first fast clear always enables DCC. DCC decompression can disable it.
A later fast clear can enable it again. Enable/disable typically happens
only once per frame.

The impact is expected to be negligible because games usually don't have
a high level of overdraw. DCC usually activates when too much blending
is happening (smoke rendering) or when testing glClear performance and
CMASK isn't supported (Stoney).

v2: rename stuff, add assertions

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-29 20:12:00 +02:00
Marek Olšák 3eacbc52d5 radeonsi: boolean -> bool, TRUE -> true, FALSE -> false
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Vedran Miletić <vedran@miletic.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-25 23:13:42 +02:00
Marek Olšák 70a25478fe radeonsi: use u_blitter for mipmap generation
This reduces time spend in glGenerateMipmap by a half.

v2: don't decompress the levels to be overwritten

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-21 13:52:05 +02:00
Marek Olšák 4eea710b0d radeonsi: try to hit direct hw MSAA resolve by changing micro mode in clear
We could also do MSAA resolve in a compute shader like Vulkan and remove
these workarounds.

v2: comment the magic numbers

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-14 20:22:16 +02:00
Marek Olšák 373060652c radeonsi: clarify the MSAA resolve limitation with scanout
this is the correct hw requirement

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-14 20:22:16 +02:00
Marek Olšák 95288277d5 Revert "radeonsi: allow direct hw MSAA resolve for scanout surfaces"
This reverts commit ffd54d1936.

No, it doesn't work. The test case is "glxgears -samples 2".
2016-06-08 19:21:55 +02:00
Marek Olšák 7c6e88b643 radeonsi: allow MSAA resolving into a texture that has DCC enabled
Since DCC is enabled almost everywhere now, it's important not to disable
this fast path.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-08 00:22:45 +02:00
Marek Olšák ffd54d1936 radeonsi: allow direct hw MSAA resolve for scanout surfaces
No idea why this was disabled, but it works fine.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-08 00:22:45 +02:00
Marek Olšák 4be46c7d9d radeonsi: don't allocate DCC for the temporary MSAA resolve surface
Allocating it has no effect, but it adds overhead (useless DCC clear).

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-08 00:22:45 +02:00
Marek Olšák aa7fe70443 radeonsi: add per-level dcc_enabled flags
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-08 00:22:45 +02:00
Nicolai Hähnle 031b57bc2f radeonsi: move enabled_mask out of si_descriptors
This mask is irrelevant for the generic descriptor set handling, and having it
outside simplifies subsequent changes slightly.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-07 15:17:23 +02:00
Marek Olšák 6b449783f6 radeonsi: use hw MSAA resolve for non-trivial resolves
This improves MSAA resolve performance.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-06 22:50:55 +02:00
Bas Nieuwenhuizen 35818129a6 radeonsi: Decompress DCC textures in a render feedback loop.
By using a counter to quickly reject textures that are not
bound to a framebuffer, the performance impact when binding
sampler_views/images is not too large.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-31 21:43:04 +02:00
Marek Olšák d5882bb0df radeonsi: do GL-compliant integer resolves
The GL spec has been clarified and the new rule says we should just
copy 1 sample. u_blitter does the right thing.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-31 16:48:54 +02:00
Marek Olšák 9983efca76 radeonsi: use the hw MSAA resolving if formats are compatible
This allows resolving RGBA into RGBX.
This should improve HL2 Lost Coast performance.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-02 22:49:25 +02:00
Marek Olšák f564b61d33 radeonsi: rework clear_buffer flags
Changes:
- don't flush DB for fast color clears
- don't flush any caches for initial clears
- remove the flag from si_copy_buffer, always assume shader coherency

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-28 20:16:56 +02:00
Nicolai Hähnle 7a215a3e27 radeonsi: expclear must be disabled on first Z/S clear
The documentation and the HW team say so.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-27 11:16:40 -05:00
Nicolai Hähnle 01a3bb5d8b radeonsi: move blend choice out of loop in si_blit_decompress_color
It does not depend on the level or layer.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-27 11:16:40 -05:00
Nicolai Hähnle 450ff0f0d5 radeonsi: use level mask for early out in si_blit_decompress_color
Mostly for consistency with the other decompress functions, but note that
in the non-DCC decompress case, the function can now early-out in slightly
more (albeit probably rare) cases.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-27 11:16:40 -05:00
Nicolai Hähnle 0ff05b55c6 radeonsi: si_blit_decompress_depth is only used for staging
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-27 11:16:40 -05:00
Nicolai Hähnle 0b70fc2db4 radeonsi: only decompress the required ZS planes from si_blit
This happens to "fix" a rendering bug in KotOR2, because it avoids a still
not quite understood bug with MSAA fast stencil clear decompress. For the
stencil clear bug, I have sent a piglit test (arb_texture_multisample-stencil-clear).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93767
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-27 11:16:39 -05:00
Nicolai Hähnle def53a0b3d radeonsi: decompress Z & S planes in one pass
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-27 11:16:39 -05:00
Nicolai Hähnle dc6fc2f390 radeonsi: early out of si_blit_decompress_depth_in_place based on dirty mask
Avoid dirtying the db_render_state atom when possible.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-27 11:16:39 -05:00
Nicolai Hähnle d14d6c3f58 radeonsi: use MIN2 instead of expanded ?: operator
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-27 11:16:39 -05:00
Nicolai Hähnle 159f182a57 radeonsi: fix brace style
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-27 11:16:39 -05:00
Marek Olšák 3acaefb1bb radeonsi: shorten slot masks to 32 bits
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-22 01:14:14 +02:00
Bas Nieuwenhuizen 061ce9399a radeonsi: split texture decompression for compute shaders
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19 18:10:31 +02:00
Marek Olšák 2ca5566ed7 radeonsi: move scissor and viewport states into gallium/radeon
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Grigori Goronzy <greg@chown.ath.cx>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-12 17:13:24 +02:00
Marek Olšák f3eebb84eb radeonsi: implement and rely on set_active_query_state
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-12 14:29:46 +02:00
Nicolai Hähnle 9d2693f58a radeonsi: expand the compressed color and depth texture masks to 64 bits
This is in preparation of raising the number of exposed sampler views to 32
bits, which will raise the total number of sampler views to 33 for the
polygon stipple texture. That texture should never be compressed (and it's
certainly not a depth texture), but this approach seems cleaner to me than
special-casing the last slot in all affected code paths.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-07 13:15:06 -05:00
Nicolai Hähnle 515fb2c09c radeonsi: decompress shader images
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-21 15:34:23 -05:00
Nicolai Hähnle 59c5508b9a radeonsi: update compressed_colortex_masks when a cmask is created or disabled
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-10 18:22:52 -05:00
Nicolai Hähnle da68a9b215 radeonsi: move si_decompress_textures to si_blit.c
Since it is all about calling into blitter functions, it makes more
sense here. This change also reduces the size of the interfaces between
.c files.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-10 18:22:49 -05:00
Marek Olšák f18fc70d6f radeonsi: disable DCC on handle export if expecting write access
This should be okay except that sampler views and images are not re-set.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-09 15:02:27 +01:00
Bas Nieuwenhuizen 1e48ec7571 radeonsi: add DCC decompression (v2)
This is currently not needed but will be necessary when we have
features that do not work with DCC enabled, such as image stores
and sharing non-scanout surfaces.

v2: Marek: rebase, remove decompression from si_flush_resource (not needed)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-09 15:02:27 +01:00
Marek Olšák b744ac9f44 radeonsi: allocate DCC in the same backing buffer as the texture
To allow sharing textures with DCC enabled.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-09 15:02:27 +01:00
Marek Olšák 970b979da1 gallium/radeon: eliminate fast color clear before sharing
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-09 15:02:27 +01:00
Marek Olšák 7aedbbacae radeonsi: put image, fmask, and sampler descriptors into one array
The texture slot is expanded to 16 dwords containing 2 descriptors.
Those can be:
- Image and fmask, or
- Image and sampler state

By carefully choosing the locations, we can put all three into one slot,
with the fmask and sampler state being mutually exclusive.

This improves shaders in 2 ways:
- 2 user SGPRs are unused, shaders can use them as temporary registers now
- each pair of descriptors is always on the same cache line

v2: cosmetic changes: add back v8i32, don't load a sampler state & fmask
    at the same time

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-10 19:41:49 +01:00
Marek Olšák f6360de8c0 radeonsi: use all SPI color formats
because not using SPI_SHADER_32_ABGR doubles fill rate.

We should also get optimal performance if alpha isn't needed or blending
isn't enabled.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-22 15:02:40 +01:00
Marek Olšák 1a24f443b4 radeonsi: implement fast stencil clear
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-12-11 15:25:12 +01:00
Marek Olšák 6eff5415e4 gallium/radeon: simplify disabling render condition for u_blitter
just disable it by not setting the predication bit

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák f7757100f2 radeonsi: add glClearBufferSubData acceleration
8-bit and 16-bit clears which are not aligned to dwords are done in software.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák 19773f9805 radeonsi: add SI_SAVE_FRAGMENT_STATE blitter flag
Buffer clears via transform feedback won't set this.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák e82c527f1f radeonsi: allow copying between compatible compressed and uncompressed formats
which is where a block in src maps to a pixel in dst and vice versa.
e.g. DXT1 <-> R32G32_UINT
     DXT5 <-> R32G32B32A32_UINT

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-28 11:52:17 +01:00
Marek Olšák 235d38584c radeonsi: properly check if DCC is enabled and allocated
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-10-27 10:49:24 +01:00
Bas Nieuwenhuizen 6529daca39 radeonsi: Implement DCC fast clear.
Uses the DCC buffer instead of the CMASK buffer. The ELIMINATE_FAST_CLEAR
still works. Furthermore, with DCC compression we can directly clear
to a limited set of colors such that we do not need a postprocessing step.

v2 Marek: check dcc_buffer && dirty_level_mask in set_sampler_view

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-10-24 17:46:08 +02:00
Bas Nieuwenhuizen bb77467df9 radeonsi: Disable operations that do not work with DCC.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-10-24 00:42:24 +02:00
Marek Olšák edf6a4537c radeonsi: only apply the SNORM blit workaround to *8_SNORM
Like the comment says. This fixes DCC, which doesn't like blitting RG16
as RGBA8.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák 9b54ce3362 radeonsi: support thread-safe shaders shared by multiple contexts
The "current" shader pointer is moved from the CSO to the context, so that
the CSO is mostly immutable.

The only drawback is that the "current" pointer isn't saved when unbinding
a shader and it must be looked up when the shader is bound again.

This is also a prerequisite for multithreaded shader compilation.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-20 12:51:51 +02:00
Marek Olšák c23c92c965 radeonsi: only do depth-only or stencil-only in-place decompression
instead of always doing both.
Usually, only depth is needed, so stencil decompression is useless.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:08 +02:00
Marek Olšák e21418f221 radeonsi: convert stencil ref state into an atom
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák 74aa64876b radeonsi: convert sample mask state into an atom
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák 0c2eed0ede radeonsi: avoid redundant CB and DB register updates
The main idea is to avoid setting CB_COLORi_INFO = 0 for i>0 repeatedly
when those colorbuffers aren't used. This is mainly for glamor.

Same for DB. Z_INFO and STENCIL_INFO need to be cleared only once.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák 8a97528b3a radeonsi: optimize viewport states
same as scissors

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:13 +02:00
Marek Olšák f6a10f60b7 radeonsi: optimize scissor states
- convert 16 states to 1 atom
- only emit 1 scissor if VIEWPORT_INDEX isn't written
- use only one packet when emitting consecutive scissors

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:13 +02:00
Marek Olšák ab0643225e util/u_blitter: implement alpha blending for pipe->blit 2015-08-21 22:21:45 +02:00
Grazvydas Ignotas 3206d4ed44 gallium/radeon: use helper functions to mark atoms dirty
This is analogous to r300_mark_atom_dirty() used by r300, and will
be used by later patches. For common radeon code, appropriate helper
is called through a function pointer.

No functional changes.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-08-11 14:46:53 +02:00
Marek Olšák 3050978864 radeonsi: copy *8_SNORM bits exactly in resource_copy_region
Disabling the FP16 mode didn't help.

If needed, we can use this trick for blits too, but not for scaled blits.

+ 4 piglits

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-07-31 16:49:17 +02:00
Marek Olšák f9c4953f99 radeonsi: early exit in si_clear if there's nothing to do
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-07-31 16:49:17 +02:00
Marek Olšák d1f43a7e5b radeonsi: add code for creating, binding and destroying tessellation shaders
This doesn't do anything yet.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-07-23 00:59:31 +02:00
Marek Olšák 46b2b3bda8 radeonsi: don't change pipe_resource in resource_copy_region
Copied from r600g. pipe_resource can be shared by multiple threads, so we
shouldn't change it.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-07-23 00:59:24 +02:00
Dave Airlie 7e5064360c radeonsi: add support for viewport array (v3)
This isn't pretty and I'd suggest it the pm4 interface builder
could be tweaked to do this more efficently, but I'd need
guidance on how that would look.

This seems to pass the few piglit tests I threw at it.

v2: handle passing layer/viewport index to fragment shader.
fix crash in blit changes,
add support to io_get_unique_index for layer/viewport index
update docs.
v3: avoid looking up viewport index and layer in es (Marek).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-06-27 00:24:07 +01:00
Marek Olšák edf18da85d radeonsi: only flush the right set of caches for CP DMA operations
That's either framebuffer caches or caches for shader resources.
The motivation is that framebuffer caches need to be flushed very rarely
here.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 12:06:43 +01:00
Marek Olšák 884f1654e2 radeonsi: move DB registers from draw_vbo into new db_render_state
It's called db_misc_state in r600g.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-09-24 14:48:02 +02:00
Marek Olšák d13d2fd161 r600g,radeonsi: add debug option which forces DMA for copy_region and blit 2014-09-12 22:51:28 +02:00
Marek Olšák a10c8db715 radeonsi: implement EXPCLEAR optimization for depth
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-09-01 21:18:52 +02:00