Commit Graph

594 Commits

Author SHA1 Message Date
Samuel Pitoiset 021feb1bf6 ac: add rbplus_allowed to ac_gpu_info
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-27 08:04:41 +02:00
Samuel Pitoiset b55919cf2a ac: add has_gfx9_scissor_bug to ac_gpu_info
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-27 08:04:32 +02:00
Samuel Pitoiset 2b9c371575 ac: add cpdma_prefetch_writes_memory to ac_gpu_info
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-27 08:04:29 +02:00
Samuel Pitoiset 63c0b89b8f ac: add has_rbplus to ac_gpu_info
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-27 08:04:19 +02:00
Samuel Pitoiset 44a46c09de ac: add has_dcc_constant_encode to ac_gpu_info
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-27 08:04:16 +02:00
Samuel Pitoiset c08401f035 ac: add has_distributed_tess to ac_gpu_info
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-27 08:04:11 +02:00
Samuel Pitoiset d62d2840c4 ac: add has_clear_state to ac_gpu_info
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-27 08:04:05 +02:00
Marek Olšák 5d37194d43 radeonsi: remove the unsafemath debug option
unlikely to be used in the future

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-08-19 17:23:38 -04:00
Marek Olšák 91227a1e17 radeonsi/gfx10: add global use_ngg and use_ngg_streamout flags
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-08-06 17:09:02 -04:00
Marek Olšák 8d90157d49 radeonsi: make sure that rasterizer state != NULL and remove all NULL checking
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-08-06 17:08:39 -04:00
Marek Olšák 8b8819e88a radeonsi: make sure that DSA state != NULL and remove all NULL checking
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-08-06 17:08:39 -04:00
Marek Olšák b758eed9c3 radeonsi: make sure that blend state != NULL and remove all NULL checking
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-08-06 17:08:39 -04:00
Marek Olšák 417ab8ef6b radeonsi: add AMD_DEBUG=nogfx for testing
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-29 17:52:53 -04:00
Marek Olšák 47f41af06c radeonsi: return success from vi_dcc_clear_level to simplify callers
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:54 -04:00
Marek Olšák 1d82240f55 radeonsi/gfx10: add debug options to enable/disable Wave32
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:19 -04:00
Marek Olšák 8f72f137ad radeonsi/gfx10: add as_ngg variant for TES as ES to select Wave32/64
Legacy GS has to use Wave64, so TES before GS has to use Wave64 too.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:19 -04:00
Marek Olšák 88efb63caf radeonsi/gfx10: implement Wave32
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:19 -04:00
Marek Olšák 7f0ada3f3e radeonsi/gfx10: set GE_CTNL.PACKET_TO_ONE_PA for NGG
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:19 -04:00
Samuel Pitoiset e510c5ee3b ac: import ac_get_compute_resource_limits() from RadeonSI
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 17:47:11 +02:00
Marek Olšák d7e80ba1e7 radeonsi: set FLUSH_ON_BINNING_TRANSITION when needed
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák 9dbe63ceea radeonsi/gfx10: use the new scan converter when binning is disabled
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák 5b50fb9b7f radeonsi/gfx10: no need to invalidate L2 for framebuffer -> texture coherency
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák f66ee5af2f radeonsi: determine the rasterization primitive type accurately (v2)
v2: reworked version to fix bugs and make it more efficient

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák b680f723f8 radeonsi/gfx10: export correct PrimitiveID from NGG vertex shaders
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák 07aacdbfd5 radeonsi/gfx10: add a workaround for stencil HTILE with mipmapping
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák 6eb219e963 radeonsi/gfx10: fix intensity formats
move the ALPHA_IS_ON_MSB fixup into vi_alpha_is_on_msb

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák 6944f99176 radeonsi/gfx10: allocate GDS BOs for streamout
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle 792a638b03 radeonsi/gfx10: implement streamout-related queries
The NGG hardware pipeline doesn't track these statistics automatically,
and in fact *cannot* track them automatically when API geometry shaders
are involved, so we accumulate statistics in the shader using atomic
adds.

This implementation accumulates statistics via the memory system and
the RW buffer descriptor setup. We could use GDS, but since these
atomics aren't latency-sensitive, that basically just trades off
L2$ bandwidth vs. export bus bandwidth. One single memory transaction
per shader workgroup doesn't seem too bad. The result ring buffer in
memory is needed either way to avoid pipeline stalls.

The shader code contains the atomic unconditionally, though the
GFX10_GS_QUERY_BUF is a null buffer when no queries are active. The
atomic is simply discarded by the shader hardware in that case.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle 5726ec0d24 radeonsi/gfx10: implement si_build_vgt_shader_config
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle b45c3debe8 radeonsi/gfx10: keep track of whether NGG is used
We always use NGG by default, except when tessellation is enabled with
extreme geometry shader amplification.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle 7bb9bb0540 radeonsi/gfx10: implement gfx10_emit_cache_flush
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle 0c6c6810bd radeonsi/gfx10: add si_context::emit_cache_flush
The introduction of GCR_CNTL makes cache flush handling on gfx10
sufficiently different that it makes sense to just use a separate
function.

Since emit_cache_flush is called quite early during context init,
we initialize the pointer explicitly in si_create_context.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle 08e2a62b07 radeonsi/gfx10: implement DB registers
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle 372652bccc radeonsi/gfx10: set CB registers
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle 595a7f7c47 radeonsi/gfx10: add pipe_screen::make_texture_descriptor
Texture descriptors in gfx10 are very different.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Marek Olšák c53e6ea05d radeonsi: use a fragment shader blit instead of DB->CB copy for ZS CPU mappings
This mainly removes and simplifies code that is no longer needed.

There were some issues with the DB->CB stencil copy on gfx10, so let's
just use a fragment shader blit for all ZS mappings. It's more reliable.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2019-07-03 15:51:12 -04:00
Marek Olšák 1d6e358c36 radeonsi: rename and re-document cache flush flags
SMEM and VMEM caches are L0 on gfx10.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-24 21:04:10 -04:00
Nicolai Hähnle 610e1a81f7 radeonsi: refactor si_update_vgt_shader_config
We'll have to extend this at some point, and using a bitfield union in
this way makes it easier to get the right index without excessive
branching.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-24 21:04:10 -04:00
Nicolai Hähnle bf8a1ca902 radeonsi: use the new run-time linker for shaders
v2:
- fix a memory leak

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Marek Olšák b5697c311b Change a few frequented uses of DEBUG to !NDEBUG
debugoptimized builds don't define NDEBUG, but they also don't define
DEBUG. We want to enable cheap debug code for these builds.
I only chose those occurences that I care about.

Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2019-05-29 21:13:35 -04:00
Marek Olšák 894e017c9c r600+radeonsi: use ctx_query_reset_status on radeon
This allows a nice cleanup, because the winsys always handles it.
2019-05-16 13:15:36 -04:00
Marek Olšák 78e35df52a radeonsi: update buffer descriptors in all contexts after buffer invalidation
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108824

Cc: 19.1 <mesa-stable@lists.freedesktop.org>
2019-05-16 13:15:36 -04:00
Marek Olšák 9f505ce21d radeonsi: disable primitive restart for triangles for DiRT Rally
It may decrease performance and it prevents compute-based primitive culling.

Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:13:36 -04:00
Marek Olšák 0252fb92b8 radeonsi: add primitive culling stats to the HUD
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:13:36 -04:00
Marek Olšák c9b7a37b8f radeonsi: cull primitives with async compute for large draw calls
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:13:34 -04:00
Marek Olšák 07c83d25fd radeonsi: add a cs parameter into si_cp_copy_data
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:06:57 -04:00
Marek Olšák ce264d19a0 radeonsi: add a cs parameter into si_cp_release_mem
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:06:56 -04:00
Marek Olšák 9624855f13 radeonsi: add threadgroups_per_cu param into si_get_compute_resource_limits
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:06:54 -04:00
Marek Olšák 49a016ec5d radeonsi: make si_initialize_compute reusable
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:06:51 -04:00
Marek Olšák c44c6951d4 radeonsi: extract COMPUTE_RESOURCE_LIMITS code into a helper
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:06:49 -04:00
Marek Olšák ccfcb9d818 ac: rename SI-CIK-VI to GFX6-GFX7-GFX8
Acked-by: Dave Airlie <airlied@redhat.com>

We already use GFX9 and I don't want us to have confusing naming
in the driver. GFXn naming is better from the driver perspective,
because it's the real version of the gfx portion of the hw. Also,
CIK means Bonaire-Kaveri-Kabini, it doesn't mean CI.

It shouldn't confuse our SDMA, UVD, VCE etc. code much. Those have
nothing to do with GFXn and they have their own version numbers.
2019-05-15 20:54:10 -04:00
Nicolai Hähnle d814c21b1b radeonsi: overhaul the vertex fetch fixup mechanism
The overall goal is to support unaligned loads from vertex buffers
natively on SI.

In the unaligned case, we fall back to the general case implementation in
ac_build_opencoded_load_format. Since this function is fully general,
we will also use it going forward for cases requiring fully manual format
conversions of dwords anyway.

This requires a different encoding of the fix_fetch array, which will now
contain the entire format information if a fixup is required.

Having to check the alignment of vertex buffers is awkward. To keep the
impact on the fast path minimal, the si_context will keep track of which
vertex buffers are (not) at least dword-aligned, while the
si_vertex_elements will note which vertex buffers have some (at most dword)
alignment requirement. Vertex buffers should be dword-aligned most of the
time, which allows a fast early-out in almost all cases.

Add the radeonsi_vs_fetch_always_opencode configuration variable for
testing purposes. Note that it can only be used reliably on LLVM >= 9,
because support for byte and short load is required.

v2:
- add a missing check to si_bind_vertex_elements

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-05-13 17:07:23 +02:00
Marek Olšák 383f406591 radeonsi: remove dirty slot masks from scissor and viewport states
All registers in the array need to be updated if any of them is changed.

Only apps writing gl_ViewportIndex were affected by this bug.
2019-04-25 11:49:38 -04:00
Marek Olšák 440135e5a0 radeonsi/gfx9: rework the gfx9 scissor bug workaround (v2)
Needed to track context rolls caused by streamout and ACQUIRE_MEM.
ACQUIRE_MEM can occur outside of draw calls.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110355

v2: squashed patches and done more rework

Cc: 19.0 <mesa-stable@lists.freedesktop.org>
2019-04-25 11:49:38 -04:00
Nicolai Hähnle 8bef4df196 radeonsi: add si_debug_options for convenient adding/removing of options
Move the definition of radeonsi_clear_db_cache_before_clear there,
as well as radeonsi_enable_nir.

This removes the AMD_DEBUG=nir option.

We currently still have two places for options: the driconf machinery
and AMD_DEBUG/R600_DEBUG. If we are to have a single place for options,
then the driconf machinery should be preferred since it's more flexible.

The only downside of the driconf machinery was that adding new options
was quite inconvenient. With this change, a simple boolean option can
be added with a single line of code, same as for AMD_DEBUG.

One technical limitation of this particular implementation is that while
almost all driconf features are available, the translation machinery doesn't
pick up the description strings for options added in si_debvug_options. In
practice, translations haven't been provided anyway, and this is intended
for developer options, so I'm not too worried. It could always be added
later if anybody really cares.

v2:
- use bool instead of uint8_t for options
- si_debug_options.inc -> si_debug_options.h

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-04-25 12:31:02 +02:00
Marek Olšák 951d60f8cd radeonsi: delay adding BOs at the beginning of IBs until the first draw
so that bound compute shader resources won't be added when they are not
needed and same for graphics.

Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-04-23 11:36:36 -04:00
Marek Olšák 09bb8c8557 radeonsi: add helper si_get_minimum_num_gfx_cs_dwords
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-04-23 11:36:34 -04:00
Marek Olšák c59d238bb0 radeonsi: add si_cp_copy_data
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-04-23 11:36:33 -04:00
Marek Olšák b58e5fb6f3 radeonsi: use CP DMA for the null const buffer clear on CIK
This is a workaround for a thread deadlock that I have no idea
why it occurs.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108879
Fixes: 9b331e462e

Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-22 16:05:52 -04:00
Marek Olšák 1f21396431 radeonsi: add support for displayable DCC for multi-RB chips
A compute shader is used to reorder DCC data from aligned to unaligned.
2019-04-04 09:53:24 -04:00
Marek Olšák 029bfa3d25 radeonsi: add ability to bind images as image buffers
so that we can bind DCC (texture) as an image buffer.
2019-04-04 09:53:24 -04:00
Marek Olšák fe3bfd7971 radeonsi/gfx9: add support for PIPE_ALIGNED=0
Needed by displayable DCC.

We need to flush L2 after rendering if PIPE_ALIGNED=0 and DCC is enabled.
2019-04-04 09:53:24 -04:00
Marek Olšák b9e02fe138 gallium: add pipe_grid_info::last_block
The OpenMAX state tracker will use this.

RadeonSI is adapted to use pipe_grid_info::last_block instead of its
internal state.

Acked-by: Leo Liu <leo.liu@amd.com>
2019-03-15 11:53:08 -04:00
Marek Olšák a1378639ab radeonsi: always use compute rings for clover on CI and newer (v2)
initialize all non-compute context functions to NULL.

v2: fix SI
2019-02-26 14:58:55 -05:00
Marek Olšák edbd2c1ff5 radeonsi: use SDMA for uploading data through const_uploader
v2: use tc.stream_uploader in si buffer_transfer_map if not called from
    the driver thread

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2019-02-20 21:04:29 -05:00
Marek Olšák 5068dec5de radeonsi: clear allocator_zeroed_memory with SDMA
so that it can be used in parallel IBs.

This also removes the SO_FILLED_SIZE hack.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-02-06 11:17:21 -05:00
Marek Olšák 7d4c935654 radeonsi: initialize textures using DCC to black when possible
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-02-06 11:17:21 -05:00
Marek Olšák a03ecbaeec radeonsi: handle render_condition_enable in si_compute_clear_render_target 2019-02-04 18:46:25 -05:00
Sonny Jiang 984fd73515 radeonsi: use compute for clear_render_target when possible
Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-02-04 18:46:25 -05:00
Marek Olšák 260ff57647 radeonsi: rename rbo, rbuffer to buf or buffer
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-01-22 13:34:01 -05:00
Marek Olšák 501ff90a95 radeonsi: rename r600_resource -> si_resource
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-01-22 13:32:18 -05:00
Marek Olšák 1cfbed7587 radeonsi: remove r600 from comments
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-01-22 12:26:45 -05:00
Sonny Jiang 1b25d340b7 radeonsi: use compute for resource_copy_region when possible
v2: marek: fix snorm8 blits

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-01-22 12:24:35 -05:00
Jiang, Sonny 8daf5bb209 radeonsi: add compute_last_block to configure the partial block fields 2019-01-22 12:22:46 -05:00
Marek Olšák 4d5f8f39f3 radeonsi: move PKT3_WRITE_DATA generation into a helper function
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-01-22 12:14:26 -05:00
Marek Olšák 54bc87469a radeonsi: make si_cp_wait_mem more configurable
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2019-01-02 15:01:54 -05:00
Marek Olšák d28e208213 radeonsi: don't emit redundant PKT3_NUM_INSTANCES packets
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2019-01-02 15:01:50 -05:00
Nicolai Hähnle e2b9329f17 radeonsi: move remaining perfcounter code into si_perfcounter.c
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-12-19 12:01:57 +01:00
Nicolai Hähnle 5c841a1b1e radeonsi: rename SI_RESOURCE_FLAG_FORCE_TILING to clarify its purpose
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-12-19 12:01:39 +01:00
Marek Olšák 075fd5d8f2 radeonsi: add memory management stress tests for GDS
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-28 20:20:27 -05:00
Marek Olšák d7a4fa91f0 radeonsi: allow si_cp_dma_clear_buffer to clear GDS from any IB
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-28 20:20:27 -05:00
Marek Olšák 9dc776f3f2 radeonsi: don't set the CB clear color registers for 0/1 clear colors on Raven2
and add has_dcc_constant_encode.
2018-11-09 14:55:04 -05:00
Marek Olšák 99835fff08 radeonsi/gfx9: set optimal OVERWRITE_COMBINER_WATERMARK 2018-10-30 16:03:02 -04:00
Marek Olšák 77bcbe712e radeonsi: clamp point size to the limit
This fixes dEQP-GLES2.functional.rasterization.limits.points.
Broken by: ea039f789d

Tested-by: Jakob Bornecrantz <jakob@collabora.com>
2018-10-18 16:08:56 -04:00
Marek Olšák fcc70e4855 radeonsi: track context rolls better for the Vega scissor bug workaround
We should get fewer context rolls with the SET_CONTEXT_REG optimization,
but it would have been for nothing if the scissor state rolled the context
anyway. Don't emit the scissor state if there is no context roll.
2018-10-16 17:23:25 -04:00
Marek Olšák 9b331e462e radeonsi: use compute shaders for clear_buffer & copy_buffer
Fast color clears should be much faster. Also, fast color clears on
evicted buffers should be 200x faster on GFX8 and older.
2018-10-16 17:23:25 -04:00
Marek Olšák ea039f789d radeonsi: use higher subpixel precision (QUANT_MODE) for smaller viewports 2018-10-16 15:28:22 -04:00
Marek Olšák 41a6c3de1f radeonsi: don't re-upload the sample position constant buffer repeatedly 2018-10-16 15:28:22 -04:00
Marek Olšák fedc1fda30 radeonsi: save raster config in screen, add se_tile_repeat 2018-10-16 15:28:22 -04:00
Marek Olšák 67f02cf810 radeonsi: add GDS support to CP DMA 2018-10-16 15:28:22 -04:00
Marek Olšák 0d05581578 radeonsi: rename si_gfx_* functions to si_cp_*
and write_event_eop -> release_mem
2018-10-16 15:28:22 -04:00
Marek Olšák 6e1cf6532d radeonsi: make si_gfx_write_event_eop more configurable 2018-10-16 15:28:22 -04:00
Marek Olšák 203ef19f48 radeonsi: split si_copy_buffer
compute and SDMA will be added into it.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-09-10 15:19:56 -04:00
Marek Olšák 1119fe5c25 radeonsi: merge SI and CI dma_clear_buffer and remove the callback
also use assertions for the requirements that offset and size are a multiple
of 4.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-09-10 15:19:56 -04:00
Marek Olšák 93b8b987d0 radeonsi: add a thorough clear/copy_buffer benchmark 2018-08-29 15:31:42 -04:00
Marek Olšák 5914f5bd4a radeonsi: let internal compute dispatches tune WAVES_PER_SH 2018-08-29 15:31:42 -04:00
Marek Olšák c5442c1165 radeonsi: add TGSI_SEMANTIC_CS_USER_DATA for reading up to 4 SGPRs with TGSI 2018-08-29 15:31:42 -04:00
Marek Olšák c359880d8b radeonsi: add SI_QUERY_TIME_ELAPSED_SDMA for measuring SDMA performance 2018-08-29 15:31:42 -04:00
Marek Olšák 0c5429cc73 radeonsi: add flag L2_STREAM for minimal cache usage 2018-08-29 15:31:41 -04:00
Marek Olšák df50099834 radeonsi: use radeon_info::name
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-08-14 21:20:31 -04:00
Marek Olšák de8d5edbc4 radeonsi: split si_clear_buffer to remove enum si_method
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-08-14 21:21:12 -04:00
Marek Olšák 4de92f2abb radeonsi: replace CP_DMA_USE_L2 with enum si_cache_policy
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-08-14 21:21:10 -04:00
Marek Olšák ac72a6bd0b radeonsi: move internal TGSI shaders into si_shaderlib_tgsi.c
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-08-14 21:20:31 -04:00
Marek Olšák 0ca8294ece radeonsi: implement EXT_window_rectangles
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-08-14 21:19:02 -04:00
Marek Olšák 4bad50ded9 radeonsi: cosmetic changes 2018-08-04 03:10:30 -04:00
Darren Powell 726a48c94f radeonsi: add new R600_DEBUG test "testclearbufperf"
Signed-off-by: Darren Powell <darren.powell@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2018-08-02 16:09:22 -04:00
Marek Olšák 20dd75a926 radeonsi: use storage_samples instead of color_samples in most places
and use pipe_resource::nr_storage_samples instead of
r600_texture::num_color_samples.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-07-31 18:28:41 -04:00
Tom Stellard 0866edede0 radeonsi: Add debug option to enable LLVM GlobalISel (v2)
R600_DEBUG=gisel will tell LLVM to use GlobalISel rather than
SelectionDAG for instruction selection.

v2: mareko: move the helper to src/amd/common

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tom Stellard <tstellar@redhat.com>
2018-07-23 20:23:48 -04:00
Dave Airlie 0eb65b4944 radeonsi: rename si_compiler -> ac_llvm_compiler
As precursor to moving init to common code, just rename the struct
and move it.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-07-04 05:31:32 +10:00
Marek Olšák bd963f8430 radeonsi: rename r600_transfer -> si_transfer
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-06-25 18:33:58 -04:00
Marek Olšák d4755ef389 radeonsi: remove redundant si_texture::cmask_size
cmask_buffer and surface.cmask_size can replace its role.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-06-25 18:33:58 -04:00
Marek Olšák 2a8d1039b6 radeonsi: inline struct r600_cmask_info
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-06-25 18:33:58 -04:00
Marek Olšák 166250f4e5 radeonsi: move CMASK size computation into ac_surface
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-06-25 18:33:58 -04:00
Marek Olšák 2d64a68c6f radeonsi: rename r600_surface -> si_surface
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-06-25 18:33:58 -04:00
Marek Olšák 218e133695 radeonsi: rename r600_memory_object -> si_memory_object
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-06-25 18:33:58 -04:00
Marek Olšák e5df04f13d radeonsi: remove unused r600_memory_object::offset
The real offset is passed through resource_from_memobj.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-06-25 18:33:58 -04:00
Marek Olšák 7bd40dc2f2 radeonsi: clean up some #includes
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-06-25 18:33:58 -04:00
Grazvydas Ignotas f966929805 radeonsi: add a debug flag to zero vram allocations
This allows to avoid having to see garbage in Dying Light loading screen
at least, which probably expects Windows/NV behavior of all allocations
being zeroed by default.

Analogous to radv flag with the same name.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-06-21 12:18:50 +03:00
Marek Olšák 1ba87f4438 radeonsi: rename r600_texture -> si_texture, rxxx -> xxx or sxxx
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-19 13:08:50 -04:00
Marek Olšák 6703fec58c amd,radeonsi: rename radeon_winsys_cs -> radeon_cmdbuf
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-19 13:08:50 -04:00
Marek Olšák dfeb61c5cf radeonsi: ignore PIPE_RESOURCE_FLAG_MAP_COHERENT
We treat coherent and non-coherent buffers the same.

And move external_usage for better packing.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-06-19 12:52:28 -04:00
Marek Olšák f3b3ee6974 radeonsi: micro-optimize prim checking and fix guardband with lines+adjacency
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-06-13 22:00:34 -04:00
Marek Olšák 73b0d10152 radeonsi: don't set VGT_LS_HS_CONFIG if it doesn't change
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-06-13 22:00:25 -04:00
Marek Olšák 28ee825e19 radeonsi: move VGT_GS_OUT_PRIM_TYPE into si_shader_gs
same as amdvlk.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-06-13 22:00:23 -04:00
Sonny Jiang 43b0269ce3 radeonsi: emit_db_render_state packets optimization
Remembering latest states of registers to eliminate redunant SET_CONTEXT_REG packets

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2018-06-07 23:26:25 -04:00
Timothy Arceri 03c370d2f1 radeonsi: fix possible truncation on renderer string
Fixes truncation warning in gcc 8.1

Fixes: 8539c9bf31 ("gallium/radeon: add the kernel version into the renderer string")
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2018-06-08 10:07:55 +10:00
Marek Olšák b936f9aa32 radeonsi: disable primitive binning for all blitter ops
same as amdvlk.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-24 13:41:56 -04:00
Marek Olšák a969f184cf radeonsi: add an environment variable that forces EQAA for MSAA allocations
This is for testing and experiments.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 18:34:37 -04:00
Marek Olšák 7ac4ef097d radeonsi: add EQAA SC,DB,CB register programming
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 18:34:34 -04:00
Marek Olšák 9d00580e75 radeonsi: support creating EQAA color textures
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 18:34:32 -04:00
Marek Olšák 835095973d radeonsi: remove r600_fmask_info
radeon_surf contains almost everything.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 18:26:33 -04:00
Marek Olšák 8b7358fe43 radeonsi: increase the number of compiler threads depending on the CPU
The compiler queue was limited to 3 threads, so shader-db running
on a 16-thread CPU would have a bottleneck on the 3-thread queue.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Benedikt Schemmer <ben at besd.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák 797d673c9a radeonsi: move passmgr into si_compiler
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Benedikt Schemmer <ben at besd.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák 87eb597758 radeonsi: add struct si_compiler containing LLVMTargetMachineRef
It will contain more variables.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Benedikt Schemmer <ben at besd.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák 788d66553a radeonsi: rename r600_texture::resource to buffer
r600_resource could be renamed to si_buffer.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák 6fadfc01c6 radeonsi: use r600_resource() typecast helper
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák de344209ad radeonsi: inline 2 trivial state structures
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák ccebcba893 radeonsi: remove si_atom::id
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák 639b673fc3 radeonsi: don't use an indirect table for state atoms
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák 9054799b39 radeonsi: rename r600_atom -> si_atom
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák a8abbbb172 radeonsi: remove r600_pipe_common.h
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák c732d069b3 radeonsi: implement DCC fast clear swizzle constraints more accurately
Reduce swizzle constraints to the ALPHA_IS_ON_MSB constraint and the clear
value of 1.

This significantly changes the DCC fast clear code, and fixes fast clear
for RGB formats without alpha.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák 1cc2e0cc6b radeonsi: fully enable 2x DCC MSAA for array and non-array textures
The clear code is exactly the same as for 1 sample buffers -
just clear the whole thing.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák 60299e9abe radeonsi: don't emit partial flushes for internal CS flushes only
Tested-by: Benedikt Schemmer <ben@besd.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-16 16:58:10 -04:00
Marek Olšák 1b3199d14d radeonsi: implement mechanism for IBs without partial flushes at the end (v6)
(This patch doesn't enable the behavior. It will be enabled in a later
commit.)

Draw calls from multiple IBs can be executed in parallel.

v2: do emit partial flushes on SI
v3: invalidate all shader caches at the beginning of IBs
v4: don't call si_emit_cache_flush in si_flush_gfx_cs if not needed,
    only do this for flushes invoked internally
v5: empty IBs should wait for idle if the flush requires it
v6: split the commit

If we artificially limit the number of draw calls per IB to 5, we'll get
a lot more IBs, leading to a lot more partial flushes. Let's see how
the removal of partial flushes changes GPU utilization in that scenario:

With partial flushes (time busy):
    CP: 99%
    SPI: 86%
    CB: 73:

Without partial flushes (time busy):
    CP: 99%
    SPI: 93%
    CB: 81%

Tested-by: Benedikt Schemmer <ben@besd.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-16 16:58:10 -04:00
Marek Olšák 918b798668 radeonsi: make sure CP DMA is idle at the end of IBs 2018-04-13 14:07:20 -04:00
Marek Olšák 9a1363427e radeonsi: always prefetch later shaders after the draw packet
so that the draw is started as soon as possible.

v2: only prefetch the API VS and VBO descriptors

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-04-13 12:31:04 -04:00
Bas Vermeulen be628e4749 radeonsi: correct si_vgt_param_key on big endian machines
Using mesa OpenCL failed on a big endian PowerPC machine because
si_vgt_param_key is using bitfields and a 32 bit int for an
index into an array.

Fix si_vgt_param_key to work correctly on both little endian
and big endian machines.

Signed-off-by: Bas Vermeulen <bas@daedalean.ai>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2018-04-09 13:42:30 -04:00
Marek Olšák c7dd59b06d radeonsi: fix a crash if ps_shader.cso is NULL in si_get_total_colormask 2018-04-05 15:53:52 -04:00
Marek Olšák 6a93441295 radeonsi: remove r600_common_context
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-04-05 15:34:58 -04:00