Commit Graph

263 Commits

Author SHA1 Message Date
Marek Olšák 58d062b87d radeonsi: de-atomize L2 prefetch
I'd like to be able to move the prefetch call site around.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-07 21:12:24 +02:00
Marek Olšák 1aeafb59e6 radeonsi: print CE IBs into ddebug reports
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-01 17:06:38 +02:00
Marek Olšák f4d095cc65 radeonsi: update dirty_level_mask only when flushing or unbinding framebuffer
This fixes corruption with bindless textures in Dawn Of War 3.

The do_update_surf_dirtiness mechanism was complicated and dirty_level_mask
was only updated after the first draw call. The problem is bindless textures
are checked for decompression every draw call and we would only decompress
after the first draw call. The solution is to set dirtiness after the last
draw call to the framebuffer, so the (unconditional) decompression of
bindless textures happens at the right time.

Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-07-28 16:34:24 +02:00
Marek Olšák 28c7fbbe0f radeonsi: rely on CLEAR_STATE for clearing UCP and blend color registers
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-28 08:03:24 +02:00
Marek Olšák ed2b3f5c81 radeonsi: decrease the number of compiler threads
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-26 19:53:26 +02:00
Marek Olšák facfab28fe radeonsi/gfx9: add workarounds to avoid VGPR indexing completely
For inputs and outputs, indirect indexing is lowered by the GLSL compiler.
For temporaries, use alloca and disable the "promote-alloca" pass.

In the future, we could switch all codepaths to alloca permanently and
just rely on the "promote-alloca" pass.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 10:50:39 -04:00
Marek Olšák 79bd1d4f8b radeonsi/gfx9: keep reusing the same buffer/address for the gfx9 flush fence
instead of using a monotonic suballocator

v2: initialize the memory at context creation

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák 2263610827 radeonsi: flush DB caches only when transitioning from DB to texturing
Use the mechanism of si_decompress_textures, but instead of doing
the actual decompression, just flag the DB cache flush there.

This removes a lot of unnecessary DB cache flushes.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Samuel Pitoiset f00e80e3f7 radeonsi: keep track of the sampler state for texture handles
Needed for updating all resident texture descriptors when
dirty_tex_counter changes.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-20 10:14:52 +02:00
Samuel Pitoiset 6ff6863c32 radeonsi: reduce overhead for resident textures which need color decompression
This is done by introducing a separate list.

si_decompress_textures() is now 5x faster.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-18 14:10:38 +02:00
Samuel Pitoiset 06ed251c32 radeonsi: reduce overhead for resident textures which need depth decompression
This is done by introducing a separate list.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-18 14:10:36 +02:00
Samuel Pitoiset 811756dfd0 radeonsi: upload new descriptors when resident buffers are invalidated
When texture buffers are invalidated the addr in the resident
descriptor has to be updated but we can't create a new descriptor
because the resident handle has to be the same.

Instead, use the WRITE_DATA packet which allows to update memory
directly but graphics/compute have to be idle in case the GPU is
reading the descriptor.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset 2c3a7d5840 radeonsi: track use of bindless samplers/images from tgsi_shader_info
This adds some new helper functions to know if the current draw
call (or dispatch compute) is using bindless samplers/images,
based on TGSI analysis.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset e1813a8635 radeonsi: decompress resident textures/images before graphics/compute
Similar to the existing decompression code path except that it
loops over the list of resident textures/images.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset 9cc328eef6 radeonsi: implement ARB_bindless_texture
This implements the Gallium interface. Decompression of resident
textures/images will follow in the next patches.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset 77bbdcdfcd radeonsi: add a slab allocator for bindless descriptors
For each texture/image handles, we need to allocate a new
buffer for the bindless descriptor. But when the number of
buffers added to the current CS becomes high, the overhead
in the winsys (and in the kernel) is important.

To reduce this bottleneck, the idea is to suballocate the
bindless descriptors using a slab similar to the one used
in the winsys.

Currently, a buffer can hold 1024 bindless descriptors but
this limit is arbitrary and could be changed in the future
for some reasons. Once a slab is allocated the "base" buffer
is added to a per-context list.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Marek Olšák 4951b0adbd radeonsi: pack si_context better
there isn't much to gain here

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák 6d43d352cc radeonsi: pack si_framebuffer better
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák ca815f1ead radeonsi: pack si_sampler_view better
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák c8b6f42e25 radeonsi: rename si_vertex_element -> si_vertex_elements
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák c503381864 radeonsi: get rid of more compressed_colortex_mask names
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák d8a577d96e radeonsi: rename shader resource decompress masks to their true meaning
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-08 23:29:07 +02:00
Marek Olšák 391673af7a radeonsi: disable the patch ID workaround on SI when the patch ID isn't used (v2)
The workaround causes a massive performance decrease on 1-SE parts.
(Cape Verde, Hainan, Oland)

The performance regression is already part of 17.0 and 17.1.

v2: check tess_uses_prim_id

Cc: 17.0 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-08 23:29:07 +02:00
Marek Olšák 2b7fd9df9a radeonsi: precompute some fields for PA_CL_VS_OUT_CNTL in si_shader_selector
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 20:17:18 +02:00
Marek Olšák 140b3c5019 radeonsi: add a new helper si_get_vs
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 20:17:16 +02:00
Samuel Pitoiset 878bd981bf radeonsi: isolate real framebuffer changes from the decompression passes (v3)
When a stencil buffer is part of the framebuffer state, it is
decompressed but because it's bindless, all draw calls set
stencil_dirty_level_mask to 1.

v2: Marek - set the flags outside the loop
          - also clear and set framebuffer.do_update_surf_dirtiness there
          - do it in the DB->CB copy path too
v3: Marek - save and restore the do_update_surf_dirtiness flag

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 20:17:14 +02:00
Marek Olšák 7d67cbefe0 radeonsi: clean up decompress blend state names
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 19:38:45 +02:00
Marek Olšák 86cc809726 radeonsi: use a compiler queue with a low priority for optimized shaders
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 18:43:42 +02:00
Marek Olšák edb59ef2dc radeonsi: do only 1 big CE dump at end of IBs and one reload in the preamble
A later commit will only upload descriptors used by shaders, so we won't do
full dumps anymore, so the only way to have a complete mirror of CE RAM
in memory is to do a separate dump after the last draw call.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-18 22:15:02 +02:00
Marek Olšák 5df24c3fa6 radeonsi: merge constant and shader buffers descriptor lists into one
Constant buffers: slot[16], .. slot[31] (ascending)
Shader buffers: slot[15], .. slot[0] (descending)

The idea is that if we have 4 constant buffers and 2 shader buffers, we only
have to upload 6 slots. That optimization is left for a later commit.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-18 22:15:02 +02:00
Nicolai Hähnle 4ea67c1751 radeonsi: rename tcs_tes_uses_prim_id for clarity
What we care about is whether PrimID is used while tessellation is
enabled; whether it's used in TCS/TES or further down the pipeline is
irrelevant.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-16 16:11:54 +02:00
Marek Olšák 330d0607ed gallium: remove pipe_index_buffer and set_index_buffer
pipe_draw_info::indexed is replaced with index_size. index_size == 0 means
non-indexed.

Instead of pipe_index_buffer::offset, pipe_draw_info::start is used.
For indexed indirect draws, pipe_draw_info::start is added to the indirect
start. This is the only case when "start" affects indirect draws.

pipe_draw_info::index is a union. Use either index::resource or
index::user depending on the value of pipe_draw_info::has_user_indices.

v2: fixes for nine, svga
2017-05-10 19:00:16 +02:00
Marek Olšák 9fd9a7d0ba radeonsi: remove VS epilog code, compile VS with PrimID export on demand
The use of PrimID in the pixel shader is too rare to deserve such
a sizable support code.

The initial idea of the VS epilog was to move the clipping code there and
remove it based on states, but optimized variants are now used to do that
and are easier to support, so the VS epilog has turned out to be not so
useful.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák 0588146cb0 radeonsi/gfx9: set up shader registers for merged LS-HS
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Nicolai Hähnle ff39f0d59c radeonsi: emit VS_STATE register explicitly from si_draw_vbo
We will merge other derived state information into this register.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-13 17:30:18 +02:00
Constantine Kharlamov fa8bc90990 r600g/radeonsi: use the correct types (taken from pipe_draw_info)
Note: si_shader.h has also "type" variable that should be changed to
"enum pipe_prim_type", however it triggers a bunch of warnings about
unhandled switches, so due not knowing the correct way to handle them, I
decided to leave it as is.

Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-04-04 22:15:47 +02:00
Marek Olšák 829bd77235 radeonsi: adjust checking for SC bug workarounds
no change in behavior, just making sure that no later chips will use
the workarounds

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-31 21:41:56 +02:00
Marek Olšák 7d2fa8dc10 radeonsi: decompress DCC in set_sampler_view instead of create_sampler_view (v2)
v2: don't add a new decompress helper function

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-31 20:57:53 +02:00
Marek Olšák 86f13c7363 radeonsi/gfx9: emit FLUSH_DFSM where required
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák ba2e7c68ce gallium/radeon: move pre-GFX9 radeon_surf.* members to radeon_surf.u.legacy.*
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Timothy Arceri 2efddc63ee gallium/util: replace pipe_mutex with mtx_t
pipe_mutex was made unnecessary with fd33a6bcd7.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-07 08:48:11 +11:00
Timothy Arceri 69a687189e radeon/ac: switch from radeon_shader_binary to ac_shader_binary
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-02-28 13:20:31 +11:00
Nicolai Hähnle 066a117be7 radeonsi: fix UINT/SINT clamping for 10-bit formats on <= CIK
The same PS epilog workaround as for 8-bit integer formats is required,
since the CB doesn't do clamping.

Fixes GL45-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels*.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-02-21 10:45:13 +01:00
Marek Olšák 6b73aafceb radeonsi: use a clever alignment for constant buffer uploads
This results in a very tiny decrease in lgkm wait cycles.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-18 01:22:08 +01:00
Marek Olšák 1a392a4377 radeonsi: move CP_DMA_ALIGNMENT definition
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-10 11:27:50 +01:00
Marek Olšák 4c288c73ea radeonsi: remove SI_CONTEXT_FLUSH_AND_INV_FRAMEBUFFER
not necessary

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-10 11:27:50 +01:00
Marek Olšák 65df38b191 radeonsi: remove separate CB/DB_META flush flags
not used separately

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-10 11:27:50 +01:00
Marek Olšák 408f9a1584 radeonsi: atomize the scratch buffer state
The update frequency is very low.

Difference: Only account for the size when allocating a new one and when
            starting a new IB, and check for NULL. (v3)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 17:29:36 +01:00
Marek Olšák 5f99c49008 radeonsi: precompute IA_MULTI_VGT_PARAM values into a table
The perf difference is very small: 0.99% -> 0.40% for the time spent
in si_get_ia_multi_vgt_param when si_draw_vbo is 20%. Pretty much nothing.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák c78177fc64 radeonsi: move VGT_VERTEX_REUSE_BLOCK_CNTL into shader states for Polaris
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00