Commit Graph

457 Commits

Author SHA1 Message Date
Nicolai Hähnle 610e1a81f7 radeonsi: refactor si_update_vgt_shader_config
We'll have to extend this at some point, and using a bitfield union in
this way makes it easier to get the right index without excessive
branching.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-24 21:04:10 -04:00
Nicolai Hähnle bf8a1ca902 radeonsi: use the new run-time linker for shaders
v2:
- fix a memory leak

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Marek Olšák b5697c311b Change a few frequented uses of DEBUG to !NDEBUG
debugoptimized builds don't define NDEBUG, but they also don't define
DEBUG. We want to enable cheap debug code for these builds.
I only chose those occurences that I care about.

Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2019-05-29 21:13:35 -04:00
Marek Olšák 894e017c9c r600+radeonsi: use ctx_query_reset_status on radeon
This allows a nice cleanup, because the winsys always handles it.
2019-05-16 13:15:36 -04:00
Marek Olšák 78e35df52a radeonsi: update buffer descriptors in all contexts after buffer invalidation
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108824

Cc: 19.1 <mesa-stable@lists.freedesktop.org>
2019-05-16 13:15:36 -04:00
Marek Olšák 9f505ce21d radeonsi: disable primitive restart for triangles for DiRT Rally
It may decrease performance and it prevents compute-based primitive culling.

Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:13:36 -04:00
Marek Olšák 0252fb92b8 radeonsi: add primitive culling stats to the HUD
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:13:36 -04:00
Marek Olšák c9b7a37b8f radeonsi: cull primitives with async compute for large draw calls
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:13:34 -04:00
Marek Olšák 07c83d25fd radeonsi: add a cs parameter into si_cp_copy_data
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:06:57 -04:00
Marek Olšák ce264d19a0 radeonsi: add a cs parameter into si_cp_release_mem
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:06:56 -04:00
Marek Olšák 9624855f13 radeonsi: add threadgroups_per_cu param into si_get_compute_resource_limits
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:06:54 -04:00
Marek Olšák 49a016ec5d radeonsi: make si_initialize_compute reusable
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:06:51 -04:00
Marek Olšák c44c6951d4 radeonsi: extract COMPUTE_RESOURCE_LIMITS code into a helper
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:06:49 -04:00
Marek Olšák ccfcb9d818 ac: rename SI-CIK-VI to GFX6-GFX7-GFX8
Acked-by: Dave Airlie <airlied@redhat.com>

We already use GFX9 and I don't want us to have confusing naming
in the driver. GFXn naming is better from the driver perspective,
because it's the real version of the gfx portion of the hw. Also,
CIK means Bonaire-Kaveri-Kabini, it doesn't mean CI.

It shouldn't confuse our SDMA, UVD, VCE etc. code much. Those have
nothing to do with GFXn and they have their own version numbers.
2019-05-15 20:54:10 -04:00
Nicolai Hähnle d814c21b1b radeonsi: overhaul the vertex fetch fixup mechanism
The overall goal is to support unaligned loads from vertex buffers
natively on SI.

In the unaligned case, we fall back to the general case implementation in
ac_build_opencoded_load_format. Since this function is fully general,
we will also use it going forward for cases requiring fully manual format
conversions of dwords anyway.

This requires a different encoding of the fix_fetch array, which will now
contain the entire format information if a fixup is required.

Having to check the alignment of vertex buffers is awkward. To keep the
impact on the fast path minimal, the si_context will keep track of which
vertex buffers are (not) at least dword-aligned, while the
si_vertex_elements will note which vertex buffers have some (at most dword)
alignment requirement. Vertex buffers should be dword-aligned most of the
time, which allows a fast early-out in almost all cases.

Add the radeonsi_vs_fetch_always_opencode configuration variable for
testing purposes. Note that it can only be used reliably on LLVM >= 9,
because support for byte and short load is required.

v2:
- add a missing check to si_bind_vertex_elements

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-05-13 17:07:23 +02:00
Marek Olšák 383f406591 radeonsi: remove dirty slot masks from scissor and viewport states
All registers in the array need to be updated if any of them is changed.

Only apps writing gl_ViewportIndex were affected by this bug.
2019-04-25 11:49:38 -04:00
Marek Olšák 440135e5a0 radeonsi/gfx9: rework the gfx9 scissor bug workaround (v2)
Needed to track context rolls caused by streamout and ACQUIRE_MEM.
ACQUIRE_MEM can occur outside of draw calls.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110355

v2: squashed patches and done more rework

Cc: 19.0 <mesa-stable@lists.freedesktop.org>
2019-04-25 11:49:38 -04:00
Nicolai Hähnle 8bef4df196 radeonsi: add si_debug_options for convenient adding/removing of options
Move the definition of radeonsi_clear_db_cache_before_clear there,
as well as radeonsi_enable_nir.

This removes the AMD_DEBUG=nir option.

We currently still have two places for options: the driconf machinery
and AMD_DEBUG/R600_DEBUG. If we are to have a single place for options,
then the driconf machinery should be preferred since it's more flexible.

The only downside of the driconf machinery was that adding new options
was quite inconvenient. With this change, a simple boolean option can
be added with a single line of code, same as for AMD_DEBUG.

One technical limitation of this particular implementation is that while
almost all driconf features are available, the translation machinery doesn't
pick up the description strings for options added in si_debvug_options. In
practice, translations haven't been provided anyway, and this is intended
for developer options, so I'm not too worried. It could always be added
later if anybody really cares.

v2:
- use bool instead of uint8_t for options
- si_debug_options.inc -> si_debug_options.h

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-04-25 12:31:02 +02:00
Marek Olšák 951d60f8cd radeonsi: delay adding BOs at the beginning of IBs until the first draw
so that bound compute shader resources won't be added when they are not
needed and same for graphics.

Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-04-23 11:36:36 -04:00
Marek Olšák 09bb8c8557 radeonsi: add helper si_get_minimum_num_gfx_cs_dwords
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-04-23 11:36:34 -04:00
Marek Olšák c59d238bb0 radeonsi: add si_cp_copy_data
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-04-23 11:36:33 -04:00
Marek Olšák b58e5fb6f3 radeonsi: use CP DMA for the null const buffer clear on CIK
This is a workaround for a thread deadlock that I have no idea
why it occurs.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108879
Fixes: 9b331e462e

Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-22 16:05:52 -04:00
Marek Olšák 1f21396431 radeonsi: add support for displayable DCC for multi-RB chips
A compute shader is used to reorder DCC data from aligned to unaligned.
2019-04-04 09:53:24 -04:00
Marek Olšák 029bfa3d25 radeonsi: add ability to bind images as image buffers
so that we can bind DCC (texture) as an image buffer.
2019-04-04 09:53:24 -04:00
Marek Olšák fe3bfd7971 radeonsi/gfx9: add support for PIPE_ALIGNED=0
Needed by displayable DCC.

We need to flush L2 after rendering if PIPE_ALIGNED=0 and DCC is enabled.
2019-04-04 09:53:24 -04:00
Marek Olšák b9e02fe138 gallium: add pipe_grid_info::last_block
The OpenMAX state tracker will use this.

RadeonSI is adapted to use pipe_grid_info::last_block instead of its
internal state.

Acked-by: Leo Liu <leo.liu@amd.com>
2019-03-15 11:53:08 -04:00
Marek Olšák a1378639ab radeonsi: always use compute rings for clover on CI and newer (v2)
initialize all non-compute context functions to NULL.

v2: fix SI
2019-02-26 14:58:55 -05:00
Marek Olšák edbd2c1ff5 radeonsi: use SDMA for uploading data through const_uploader
v2: use tc.stream_uploader in si buffer_transfer_map if not called from
    the driver thread

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2019-02-20 21:04:29 -05:00
Marek Olšák 5068dec5de radeonsi: clear allocator_zeroed_memory with SDMA
so that it can be used in parallel IBs.

This also removes the SO_FILLED_SIZE hack.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-02-06 11:17:21 -05:00
Marek Olšák 7d4c935654 radeonsi: initialize textures using DCC to black when possible
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-02-06 11:17:21 -05:00
Marek Olšák a03ecbaeec radeonsi: handle render_condition_enable in si_compute_clear_render_target 2019-02-04 18:46:25 -05:00
Sonny Jiang 984fd73515 radeonsi: use compute for clear_render_target when possible
Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-02-04 18:46:25 -05:00
Marek Olšák 260ff57647 radeonsi: rename rbo, rbuffer to buf or buffer
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-01-22 13:34:01 -05:00
Marek Olšák 501ff90a95 radeonsi: rename r600_resource -> si_resource
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-01-22 13:32:18 -05:00
Marek Olšák 1cfbed7587 radeonsi: remove r600 from comments
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-01-22 12:26:45 -05:00
Sonny Jiang 1b25d340b7 radeonsi: use compute for resource_copy_region when possible
v2: marek: fix snorm8 blits

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-01-22 12:24:35 -05:00
Jiang, Sonny 8daf5bb209 radeonsi: add compute_last_block to configure the partial block fields 2019-01-22 12:22:46 -05:00
Marek Olšák 4d5f8f39f3 radeonsi: move PKT3_WRITE_DATA generation into a helper function
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-01-22 12:14:26 -05:00
Marek Olšák 54bc87469a radeonsi: make si_cp_wait_mem more configurable
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2019-01-02 15:01:54 -05:00
Marek Olšák d28e208213 radeonsi: don't emit redundant PKT3_NUM_INSTANCES packets
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2019-01-02 15:01:50 -05:00
Nicolai Hähnle e2b9329f17 radeonsi: move remaining perfcounter code into si_perfcounter.c
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-12-19 12:01:57 +01:00
Nicolai Hähnle 5c841a1b1e radeonsi: rename SI_RESOURCE_FLAG_FORCE_TILING to clarify its purpose
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-12-19 12:01:39 +01:00
Marek Olšák 075fd5d8f2 radeonsi: add memory management stress tests for GDS
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-28 20:20:27 -05:00
Marek Olšák d7a4fa91f0 radeonsi: allow si_cp_dma_clear_buffer to clear GDS from any IB
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-28 20:20:27 -05:00
Marek Olšák 9dc776f3f2 radeonsi: don't set the CB clear color registers for 0/1 clear colors on Raven2
and add has_dcc_constant_encode.
2018-11-09 14:55:04 -05:00
Marek Olšák 99835fff08 radeonsi/gfx9: set optimal OVERWRITE_COMBINER_WATERMARK 2018-10-30 16:03:02 -04:00
Marek Olšák 77bcbe712e radeonsi: clamp point size to the limit
This fixes dEQP-GLES2.functional.rasterization.limits.points.
Broken by: ea039f789d

Tested-by: Jakob Bornecrantz <jakob@collabora.com>
2018-10-18 16:08:56 -04:00
Marek Olšák fcc70e4855 radeonsi: track context rolls better for the Vega scissor bug workaround
We should get fewer context rolls with the SET_CONTEXT_REG optimization,
but it would have been for nothing if the scissor state rolled the context
anyway. Don't emit the scissor state if there is no context roll.
2018-10-16 17:23:25 -04:00
Marek Olšák 9b331e462e radeonsi: use compute shaders for clear_buffer & copy_buffer
Fast color clears should be much faster. Also, fast color clears on
evicted buffers should be 200x faster on GFX8 and older.
2018-10-16 17:23:25 -04:00
Marek Olšák ea039f789d radeonsi: use higher subpixel precision (QUANT_MODE) for smaller viewports 2018-10-16 15:28:22 -04:00