Commit Graph

457 Commits

Author SHA1 Message Date
Marek Olšák b191e2d79d radeonsi: move r600_test_dma.c into si_test_dma.c
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-29 18:21:30 +01:00
Marek Olšák 7aa2366b70 radeonsi: move all clear() code into si_clear.c
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-29 18:21:30 +01:00
Nicolai Hähnle 1a6d9e087a radeonsi: record and dump time of flush
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 14:01:04 +01:00
Nicolai Hähnle 609a230375 gallium/u_threaded: implement asynchronous flushes
This requires out-of-band creation of fences, and will be signaled to
the pipe_context::flush implementation by a special TC_FLUSH_ASYNC flag.

v2:
- remove an incorrect assertion
- handle fence_server_sync for unsubmitted fences by
  relying on the improved cs_add_fence_dependency
- only implement asynchronous flushes on amdgpu

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 14:00:42 +01:00
Nicolai Hähnle 78a4750d91 radeonsi: move fence functions to si_fence.c
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 14:00:42 +01:00
Nicolai Hähnle dd7c273e87 radeonsi: move pipe debug callback to si_context
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 11:53:19 +01:00
Marek Olšák 33000e7c43 radeonsi: add si_screen::has_ls_vgpr_init_bug
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-07 17:58:40 +01:00
Marek Olšák 529cdce799 radeonsi: remove 'Authors:' comments
It's inaccurate. Instead, see the copyright and use "git log" and
"git blame" to know the authorship.

Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-02 18:19:03 +01:00
Samuel Pitoiset dd79aa4ad3 radeonsi: update hack for HTILE corruption in ARK: Survival Evolved
It appears that flushing the DB metadata is actually not sufficient
since the driver uses the new VS blit shaders. This looks quite
strange though, but it seems like we need to flush DB for fixing
the corruption.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102955
Fixes: 69ccb9dae7 (radeonsi: use new VS blit shaders (VS inputs in SGPRs)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-10-27 10:47:30 +02:00
Marek Olšák 0ecf9b90ef radeonsi: import cayman_msaa.c from drivers/radeon
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-09 16:27:04 +02:00
Marek Olšák 65f2e33500 radeonsi: import r600_streamout from drivers/radeon
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-09 16:26:55 +02:00
Marek Olšák 13b6c1c031 radeonsi: minor cleanup of si_update_vs_writes_viewport_index
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-07 18:26:35 +02:00
Marek Olšák 69ccb9dae7 radeonsi: use new VS blit shaders (VS inputs in SGPRs)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-07 18:26:35 +02:00
Marek Olšák 6a8401a94e radeonsi: add VS blit shader creation
no users yet

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-07 18:26:35 +02:00
Marek Olšák de810f8b84 radeonsi: remove wrappers si_decompress_xx_textures
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-07 18:26:35 +02:00
Marek Olšák c4d1a199f8 radeonsi: add a drirc workaround for HTILE corruption in ARK: Survival Evolved
v2: use DB_META | PS_PARTIAL_FLUSH

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102955
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (v1)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
2017-10-06 02:56:11 +02:00
Marek Olšák 15d918e46f radeonsi: inline struct si_sampler_views
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-06 02:56:11 +02:00
Marek Olšák 23cdde5138 radeonsi: rename si_textures_info -> si_samplers, si_images_info -> si_images
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-06 02:56:11 +02:00
Nicolai Hähnle 63680471f9 radeonsi: remove si_context::{scissor_enabled,clip_halfz}
They are just copies of the rasterizer state.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-10-02 15:07:45 +02:00
Nicolai Hähnle 12f3155e28 radeonsi: simplify the signature of si_update_vs_writes_viewport_index
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-10-02 15:07:45 +02:00
Nicolai Hähnle 7bbcb6ac6c radeonsi: move current_rast_prim into si_context
v2: rebase fixes

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-10-02 15:07:45 +02:00
Nicolai Hähnle 6b416ec3d6 radeonsi: move and rename scissor and viewport state and functions
v2: change GET_MAX_SCISSOR to SI_MAX_SCISSOR

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-10-02 15:07:45 +02:00
Nicolai Hähnle f86a112b07 radeonsi: move current_rast_prim to r600_common_context
We'll use it in the scissors / clip / guardband state.

v2: avoid a performance regression on r600 when applied to
    (pre-fork) stable branches

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-10-02 15:07:43 +02:00
Nicolai Hähnle 797dd12c7b radeonsi: fix border color translation for integer textures
This fixes the extremely unlikely case that an application uses
0x80000000 or 0x3f800000 as border color for an integer texture and
helps in the also, but perhaps slightly less, unlikely case that 1 is
used as a border color.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-09-29 11:45:08 +02:00
Nicolai Hähnle 4c56e07029 radeonsi: clamp depth comparison value only for fixed point formats
The hardware usually does this automatically. However, we upgrade
depth to Z32_FLOAT to enable TC-compatible HTILE, which means the
hardware no longer clamps the comparison value for us.

The only way to tell in the shader whether a clamp is required
seems to be to communicate an additional bit in the descriptor
table. While VI has some unused bits in the resource descriptor,
those bits have unfortunately all been used in gfx9. So we use
an unused bit in the sampler state instead.

Fixes dEQP-GLES3.functional.texture.shadow.2d.linear.equal_depth_component32f
and many other tests in dEQP-GLES3.functional.texture.shadow.*

Fixes: d4d9ec55c5 ("radeonsi: implement TC-compatible HTILE")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-09-29 11:44:50 +02:00
Nicolai Hähnle 7a62f8621a radeonsi: allow out-of-order rasterization in commutative blending cases
We do not enable this by default for additive blending, since it slightly
breaks OpenGL invariance guarantees due to non-determinism.

Still, there may be some applications can benefit from white-listing
via the radeonsi_commutative_blend_add drirc setting without any real
visible artifacts.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-09-18 11:25:20 +02:00
Nicolai Hähnle 8c56c45cd4 radeonsi: add drirc option "radeonsi_assume_no_z_fights"
This option enables a performance optimization where typical non-blending
draws with depth buffer may be rasterized out-of-order (on VI+, multi-SE
chips).

This optimization can lead to incorrect results when an applications
renders multiple objects with the same Z value at the same pixel, so we
will never enable it by default. But there may be applications that could
benefit from white-listing.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-09-18 11:25:19 +02:00
Nicolai Hähnle aab134cfa5 radeonsi: enable out-of-order rasterization when possible on VI and GFX9 dGPUs
This does not take commutative blending into account yet.

R600_DEBUG=nooutoforder disables it.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-09-18 11:25:19 +02:00
Nicolai Hähnle 6772452e4c amd/common: remove has_ds_bpermute argument from ac_build_ddxy
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-18 11:25:18 +02:00
Nicolai Hähnle 45c5c44451 radeonsi/gfx9: proper workaround for LS/HS VGPR initialization bug
When the HS wave is empty, the hardware writes the LS VGPRs starting at
v0 instead of v2. Workaround by shifting them back into place when
necessary. For simplicity, this is always done in the LS prolog.

According to the hardware team, this will be fixed in future chips,
so take that into account already.

Note that this is not a bug fix, as the bug was already worked
around by commit 166823bfd2 ("radeonsi/gfx9: add a temporary workaround
for a tessellation driver bug"). This change merely replaces the
workaround by one that should be better.

v2: add workaround code to shader only when necessary
v3: clarify the prefer_mono comment

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-06 10:02:49 +02:00
Nicolai Hähnle 34124e412f radeonsi/gfx9: always flush DB metadata on framebuffer changes
This fixes GL45-CTS.shader_image_load_store.basic-glsl-earlyFragTests.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-06 09:57:08 +02:00
Marek Olšák c3ebac6890 radeonsi/gfx9: implement primitive binning
This increases performance, but it was tuned for Raven, not Vega.
We don't know yet how Vega will perform, hopefully not worse.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-05 12:09:02 +02:00
Marek Olšák 0797eea758 radeonsi/gfx9: don't use BREAK_BATCH and FLUSH_DFSM if DFSM is disabled
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-05 12:09:02 +02:00
Marek Olšák 7dec48b81e radeonsi/gfx9: don't flush L2 metadata for DB if not needed
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-22 13:29:47 +02:00
Marek Olšák aa64e24cb1 radeonsi/gfx9: don't flush L2 metadata for CB if not needed
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-22 13:29:47 +02:00
Marek Olšák 5b62eb237c radeonsi/gfx9: don't flush TC L2 between rendering and texturing if not needed
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-22 13:29:47 +02:00
Marek Olšák 113278ee79 radeonsi: remove Constant Engine support
We have come to the conclusion that it doesn't improve performance.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-22 13:29:47 +02:00
Samuel Pitoiset 39a35eb0c1 radeonsi: try to re-use previously deleted bindless descriptor slots
Currently, when the array is full it is resized but it can grow
over and over because we don't try to re-use descriptor slots.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-22 11:34:37 +02:00
Samuel Pitoiset c2dfa9b111 radeonsi: use slot indexes for bindless handles
Using VRAM address as bindless handles is not a good idea because
we have to use LLVMIntToPTr and the LLVM CSE pass can't optimize
because it has no information about the pointer.

Instead, use slots indexes like the existing descriptors. Note
that we use fixed 16-dword slots for both samplers and images.
This doesn't really matter because no real apps use image handles.

This improves performance with DOW3 by +7%.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-22 11:34:29 +02:00
Nicolai Hähnle 420c438589 radeonsi: log draw and compute state into log context
Also add missing trace emits and CS logging for compute launches.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-22 09:53:34 +02:00
Nicolai Hähnle 4c3f36ec6b radeonsi: print saved CS to the log context
Use the auto logger facility, so that CS chunks will be interleaved
with other log info.

v2:
- fix some crashes when not using CE
- fix skipping "previous" chunks of current (unflushed) IB
- fix error handling in si_begin_cs_debug

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-22 09:53:14 +02:00
Marek Olšák 13aa8d3da9 radeonsi: don't use CLEAR_STATE on SI
This fixes random hangs with Unigine Valley.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102201

Fixes: 064550238e ("radeonsi: use CLEAR_STATE to initialize some registers")
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-18 15:59:22 +02:00
Marek Olšák c093821cee radeonsi: rename shader_userdata -> shader_pointers where appropriate
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-07 21:12:24 +02:00
Marek Olšák e887c68bd2 radeonsi: add a separate dirty mask for prefetches
so that we don't rely on si_pm4_state_enabled_and_changed, allowing us
to move prefetches after draw calls.

v2: ckear the dirty mask after unbinding shaders

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> (v1)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
2017-08-07 21:12:24 +02:00
Marek Olšák 58d062b87d radeonsi: de-atomize L2 prefetch
I'd like to be able to move the prefetch call site around.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-07 21:12:24 +02:00
Marek Olšák 1aeafb59e6 radeonsi: print CE IBs into ddebug reports
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-01 17:06:38 +02:00
Marek Olšák f4d095cc65 radeonsi: update dirty_level_mask only when flushing or unbinding framebuffer
This fixes corruption with bindless textures in Dawn Of War 3.

The do_update_surf_dirtiness mechanism was complicated and dirty_level_mask
was only updated after the first draw call. The problem is bindless textures
are checked for decompression every draw call and we would only decompress
after the first draw call. The solution is to set dirtiness after the last
draw call to the framebuffer, so the (unconditional) decompression of
bindless textures happens at the right time.

Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-07-28 16:34:24 +02:00
Marek Olšák 28c7fbbe0f radeonsi: rely on CLEAR_STATE for clearing UCP and blend color registers
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-28 08:03:24 +02:00
Marek Olšák ed2b3f5c81 radeonsi: decrease the number of compiler threads
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-26 19:53:26 +02:00
Marek Olšák facfab28fe radeonsi/gfx9: add workarounds to avoid VGPR indexing completely
For inputs and outputs, indirect indexing is lowered by the GLSL compiler.
For temporaries, use alloca and disable the "promote-alloca" pass.

In the future, we could switch all codepaths to alloca permanently and
just rely on the "promote-alloca" pass.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 10:50:39 -04:00
Marek Olšák 79bd1d4f8b radeonsi/gfx9: keep reusing the same buffer/address for the gfx9 flush fence
instead of using a monotonic suballocator

v2: initialize the memory at context creation

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák 2263610827 radeonsi: flush DB caches only when transitioning from DB to texturing
Use the mechanism of si_decompress_textures, but instead of doing
the actual decompression, just flag the DB cache flush there.

This removes a lot of unnecessary DB cache flushes.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Samuel Pitoiset f00e80e3f7 radeonsi: keep track of the sampler state for texture handles
Needed for updating all resident texture descriptors when
dirty_tex_counter changes.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-20 10:14:52 +02:00
Samuel Pitoiset 6ff6863c32 radeonsi: reduce overhead for resident textures which need color decompression
This is done by introducing a separate list.

si_decompress_textures() is now 5x faster.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-18 14:10:38 +02:00
Samuel Pitoiset 06ed251c32 radeonsi: reduce overhead for resident textures which need depth decompression
This is done by introducing a separate list.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-18 14:10:36 +02:00
Samuel Pitoiset 811756dfd0 radeonsi: upload new descriptors when resident buffers are invalidated
When texture buffers are invalidated the addr in the resident
descriptor has to be updated but we can't create a new descriptor
because the resident handle has to be the same.

Instead, use the WRITE_DATA packet which allows to update memory
directly but graphics/compute have to be idle in case the GPU is
reading the descriptor.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset 2c3a7d5840 radeonsi: track use of bindless samplers/images from tgsi_shader_info
This adds some new helper functions to know if the current draw
call (or dispatch compute) is using bindless samplers/images,
based on TGSI analysis.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset e1813a8635 radeonsi: decompress resident textures/images before graphics/compute
Similar to the existing decompression code path except that it
loops over the list of resident textures/images.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset 9cc328eef6 radeonsi: implement ARB_bindless_texture
This implements the Gallium interface. Decompression of resident
textures/images will follow in the next patches.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset 77bbdcdfcd radeonsi: add a slab allocator for bindless descriptors
For each texture/image handles, we need to allocate a new
buffer for the bindless descriptor. But when the number of
buffers added to the current CS becomes high, the overhead
in the winsys (and in the kernel) is important.

To reduce this bottleneck, the idea is to suballocate the
bindless descriptors using a slab similar to the one used
in the winsys.

Currently, a buffer can hold 1024 bindless descriptors but
this limit is arbitrary and could be changed in the future
for some reasons. Once a slab is allocated the "base" buffer
is added to a per-context list.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Marek Olšák 4951b0adbd radeonsi: pack si_context better
there isn't much to gain here

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák 6d43d352cc radeonsi: pack si_framebuffer better
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák ca815f1ead radeonsi: pack si_sampler_view better
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák c8b6f42e25 radeonsi: rename si_vertex_element -> si_vertex_elements
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák c503381864 radeonsi: get rid of more compressed_colortex_mask names
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák d8a577d96e radeonsi: rename shader resource decompress masks to their true meaning
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-08 23:29:07 +02:00
Marek Olšák 391673af7a radeonsi: disable the patch ID workaround on SI when the patch ID isn't used (v2)
The workaround causes a massive performance decrease on 1-SE parts.
(Cape Verde, Hainan, Oland)

The performance regression is already part of 17.0 and 17.1.

v2: check tess_uses_prim_id

Cc: 17.0 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-08 23:29:07 +02:00
Marek Olšák 2b7fd9df9a radeonsi: precompute some fields for PA_CL_VS_OUT_CNTL in si_shader_selector
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 20:17:18 +02:00
Marek Olšák 140b3c5019 radeonsi: add a new helper si_get_vs
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 20:17:16 +02:00
Samuel Pitoiset 878bd981bf radeonsi: isolate real framebuffer changes from the decompression passes (v3)
When a stencil buffer is part of the framebuffer state, it is
decompressed but because it's bindless, all draw calls set
stencil_dirty_level_mask to 1.

v2: Marek - set the flags outside the loop
          - also clear and set framebuffer.do_update_surf_dirtiness there
          - do it in the DB->CB copy path too
v3: Marek - save and restore the do_update_surf_dirtiness flag

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 20:17:14 +02:00
Marek Olšák 7d67cbefe0 radeonsi: clean up decompress blend state names
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 19:38:45 +02:00
Marek Olšák 86cc809726 radeonsi: use a compiler queue with a low priority for optimized shaders
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 18:43:42 +02:00
Marek Olšák edb59ef2dc radeonsi: do only 1 big CE dump at end of IBs and one reload in the preamble
A later commit will only upload descriptors used by shaders, so we won't do
full dumps anymore, so the only way to have a complete mirror of CE RAM
in memory is to do a separate dump after the last draw call.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-18 22:15:02 +02:00
Marek Olšák 5df24c3fa6 radeonsi: merge constant and shader buffers descriptor lists into one
Constant buffers: slot[16], .. slot[31] (ascending)
Shader buffers: slot[15], .. slot[0] (descending)

The idea is that if we have 4 constant buffers and 2 shader buffers, we only
have to upload 6 slots. That optimization is left for a later commit.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-18 22:15:02 +02:00
Nicolai Hähnle 4ea67c1751 radeonsi: rename tcs_tes_uses_prim_id for clarity
What we care about is whether PrimID is used while tessellation is
enabled; whether it's used in TCS/TES or further down the pipeline is
irrelevant.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-16 16:11:54 +02:00
Marek Olšák 330d0607ed gallium: remove pipe_index_buffer and set_index_buffer
pipe_draw_info::indexed is replaced with index_size. index_size == 0 means
non-indexed.

Instead of pipe_index_buffer::offset, pipe_draw_info::start is used.
For indexed indirect draws, pipe_draw_info::start is added to the indirect
start. This is the only case when "start" affects indirect draws.

pipe_draw_info::index is a union. Use either index::resource or
index::user depending on the value of pipe_draw_info::has_user_indices.

v2: fixes for nine, svga
2017-05-10 19:00:16 +02:00
Marek Olšák 9fd9a7d0ba radeonsi: remove VS epilog code, compile VS with PrimID export on demand
The use of PrimID in the pixel shader is too rare to deserve such
a sizable support code.

The initial idea of the VS epilog was to move the clipping code there and
remove it based on states, but optimized variants are now used to do that
and are easier to support, so the VS epilog has turned out to be not so
useful.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Marek Olšák 0588146cb0 radeonsi/gfx9: set up shader registers for merged LS-HS
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 21:47:35 +02:00
Nicolai Hähnle ff39f0d59c radeonsi: emit VS_STATE register explicitly from si_draw_vbo
We will merge other derived state information into this register.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-13 17:30:18 +02:00
Constantine Kharlamov fa8bc90990 r600g/radeonsi: use the correct types (taken from pipe_draw_info)
Note: si_shader.h has also "type" variable that should be changed to
"enum pipe_prim_type", however it triggers a bunch of warnings about
unhandled switches, so due not knowing the correct way to handle them, I
decided to leave it as is.

Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-04-04 22:15:47 +02:00
Marek Olšák 829bd77235 radeonsi: adjust checking for SC bug workarounds
no change in behavior, just making sure that no later chips will use
the workarounds

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-31 21:41:56 +02:00
Marek Olšák 7d2fa8dc10 radeonsi: decompress DCC in set_sampler_view instead of create_sampler_view (v2)
v2: don't add a new decompress helper function

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-31 20:57:53 +02:00
Marek Olšák 86f13c7363 radeonsi/gfx9: emit FLUSH_DFSM where required
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák ba2e7c68ce gallium/radeon: move pre-GFX9 radeon_surf.* members to radeon_surf.u.legacy.*
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Timothy Arceri 2efddc63ee gallium/util: replace pipe_mutex with mtx_t
pipe_mutex was made unnecessary with fd33a6bcd7.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-07 08:48:11 +11:00
Timothy Arceri 69a687189e radeon/ac: switch from radeon_shader_binary to ac_shader_binary
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-02-28 13:20:31 +11:00
Nicolai Hähnle 066a117be7 radeonsi: fix UINT/SINT clamping for 10-bit formats on <= CIK
The same PS epilog workaround as for 8-bit integer formats is required,
since the CB doesn't do clamping.

Fixes GL45-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels*.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-02-21 10:45:13 +01:00
Marek Olšák 6b73aafceb radeonsi: use a clever alignment for constant buffer uploads
This results in a very tiny decrease in lgkm wait cycles.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-18 01:22:08 +01:00
Marek Olšák 1a392a4377 radeonsi: move CP_DMA_ALIGNMENT definition
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-10 11:27:50 +01:00
Marek Olšák 4c288c73ea radeonsi: remove SI_CONTEXT_FLUSH_AND_INV_FRAMEBUFFER
not necessary

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-10 11:27:50 +01:00
Marek Olšák 65df38b191 radeonsi: remove separate CB/DB_META flush flags
not used separately

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-10 11:27:50 +01:00
Marek Olšák 408f9a1584 radeonsi: atomize the scratch buffer state
The update frequency is very low.

Difference: Only account for the size when allocating a new one and when
            starting a new IB, and check for NULL. (v3)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 17:29:36 +01:00
Marek Olšák 5f99c49008 radeonsi: precompute IA_MULTI_VGT_PARAM values into a table
The perf difference is very small: 0.99% -> 0.40% for the time spent
in si_get_ia_multi_vgt_param when si_draw_vbo is 20%. Pretty much nothing.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák c78177fc64 radeonsi: move VGT_VERTEX_REUSE_BLOCK_CNTL into shader states for Polaris
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák ccecf79c2b radeonsi: state atom IDs don't have to be off by one
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák ac059f1c23 radeonsi: use a bitmask for looping over dirty PM4 states
also move it to draw_vbo, because it should be 0 in most cases

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák 802fcdc0d2 radeonsi: atomize L2 prefetches
to move the big conditional statement out of draw_vbo

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák 879c73fac8 radeonsi: update dirty_level_mask only after the first draw after FB change
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák cf248929bf radeonsi: use a global dirty mask for shader pointers
Only vertex buffers use a separate bool flag.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-18 19:51:31 +01:00
Marek Olšák 861d7af1cb radeonsi: use a bitmask-based loop in si_decompress_textures
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-18 19:51:31 +01:00
Marek Olšák d93b0eacb7 radeonsi: si_cp_dma_prepare is a no-op for L2 prefetches
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-18 19:51:31 +01:00
Marek Olšák 395c49849d radeonsi: add SI_CPDMA_SKIP_BO_LIST_UPDATE
the next commit will use it in a clever way, because the CP DMA prefetch
doesn't need this.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-18 19:51:31 +01:00
Bas Nieuwenhuizen 0ef1b4d5b1 ac/debug: Move IB decode to common code.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-01-09 21:43:59 +01:00
Marek Olšák ece6e1f658 radeonsi: add TC L2 prefetch for shaders and VBO descriptors
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-06 21:05:48 +01:00
Marek Olšák a131dacb14 radeonsi: add CP DMA flags for greater control over synchronization
for L2 prefetch

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-06 21:05:48 +01:00
Nicolai Hähnle 2f2e941e2d radeonsi: use a single descriptor for the GSVS ring
We can hardcode all of the fields for swizzling in the geometry shader.

The advantage is that we use fewer descriptor slots and we no longer have to
update any of the (ring) descriptors when the geometry shader changes.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:05:05 +01:00
Marek Olšák 6caa558ca6 radeonsi: check for sampler state CSO corruption
It really happens.

v2: declare "magic" in debug builds only

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
2016-12-07 19:40:03 +01:00
Marek Olšák bf75ef3f92 radeonsi: remove all varyings for depth-only rendering or rasterization off
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák 6d5c2a8b5c radeonsi: split the shader key into 3 logical parts
key->part.*: prolog and epilog flags only
key->as_{ls,es}: special flags
key->mono.*: flags for monolithic compilation only

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák fa476e0566 radeonsi: fast exit si_emit_derived_tess_state early
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Nicolai Hähnle 908f92ad1f radeonsi: generate GS prolog to (partially) fix triangle strip adjacency rotation
Fixes GL45-CTS.geometry_shader.adjacency.adjacency_indiced_triangle_strip and
others.

This leaves the case of triangle strips with adjacency and primitive restarts
open. It seems that the only thing that cares about that is a piglit test.
Fixing this efficiently would be really involved, and I don't want to use the
hammer of degrading to software handling of indices because there may well
be software that uses this draw mode (without caring about the precise
rotation of triangles).

v2:
- skip the GS prolog entirely if workaround is not needed
- only check for TES (TES is always non-null when tessellation is used)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:11:24 +01:00
Nicolai Hähnle 3b2516721b radeonsi: make the GS copy shader owned by the GS selector
The copy shader only depends on the selector. This change avoids creating
separate code paths for monolithic vs. non-monolithic geometry shaders.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:50 +01:00
Marek Olšák ad45dce4a2 radeonsi: remove si_resource_create_custom
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-26 13:02:58 +02:00
Marek Olšák 29144d0f34 gallium/radeon: stop using PIPE_BIND_CUSTOM
it has no effect whatsoever

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-26 13:02:58 +02:00
Marek Olšák a2ea653a49 radeonsi: remove cb0_is_integer handling
st/mesa does this for us.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-19 19:26:30 +02:00
Marek Olšák 8cdce30cc2 radeonsi: implement TC L2 write-back (flush) without cache invalidation
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-12 18:29:40 +02:00
Marek Olšák 71a5cf6f3b radeonsi: don't declare LDS in PS when ds_bpermute is used
I guess this is not needed because dead code elimination removes
the declaration.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:16 +02:00
Marek Olšák 3388f27d84 radeonsi: clean up lucky #include dependencies
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:06 +02:00
Marek Olšák 7ce19d9014 radeonsi: don't set sampler buffer offsets in create_sampler_view
do it at bind time, so that pipe_sampler_view is immutable with regard to
buffer reallocations and we don't have to remember all existing buffer
views.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:01 +02:00
Marek Olšák 3ee9be42ac radeonsi: move VGT_LS_HS_CONFIG to derived tess_state
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:11:53 +02:00
Marek Olšák a67d81580b radeonsi: remove the cache_flush atom
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-09 22:45:06 +02:00
Marek Olšák fe40a65fb6 radeonsi: skip redundant INDEX_TYPE writes
Ported from Vulkan.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-07 11:13:13 +02:00
Marek Olšák 911202817d radeonsi: don't emit CS_PARTIAL_FLUSH if compute is not used
for less noise in the HUD

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-05 18:01:15 +02:00
Nicolai Hähnle b6c71d37c7 radeonsi: program the DRAWID SGPR
Note that for indirect draws, the new MULTI firmware packets are required.

There's also no need to reset last_{start_instance,sh_base_reg}, since
resetting last_base_vertex is sufficient.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-09 15:56:04 +02:00
Nicolai Hähnle 96bbb620a5 radeonsi: add has_draw_indirect_multi flag
Prefer to use DRAW_(INDEX)_INDIRECT_MULTI when available in the firmware.

Versions for SI and CI already added as provided by the firmware team, but
keep in mind that they won't currently be used since the radeon kernel module
has no interface to query the firmware version.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-08 12:53:06 +02:00
Marek Olšák a6bfafa083 gallium/radeon: move last_gfx_fence from radeonsi to common code
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-03 17:46:46 +02:00
Marek Olšák c15a9dec29 radeonsi: skip unnecessary si_update_shaders calls
Small decrease in draw call overhead.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-03 17:46:46 +02:00
Nicolai Hähnle d938b8c0bf radeonsi: explicitly choose center locations for 1xAA on Polaris
Unlike SC, the small primitive filter does not automatically use center
locations in 1xAA mode, so this is needed to avoid artifacts caused by
the small primitive filter discarding triangles that it shouldn't.

As a side effect of how the effective number of samples is now calculated,
this patch also avoids submitting the sample locations for line/poly smoothing
when they're not really needed.

Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-08 10:52:50 +02:00
Marek Olšák 5c92c21369 radeonsi: do compilation from si_create_shader_selector asynchronously
Main shader parts and geometry shaders are compiled asynchronously
by util_queue. si_create_shader_selector doesn't wait and returns.
si_draw_vbo(si_shader_select) waits for completion.

This has the best effect when shaders are compiled at app-loading time.
It doesn't help much for shaders compiled on demand, even though
VS+PS compilation should take as much as time as the bigger one of the two.

If an app creates more shaders, at most 4 threads will be used to compile
them.

Debug output disables this for shader stats to be printed in the correct
order.

(We could go even further and build variants asynchronously too, then emit
draw calls without waiting and emit incomplete shader states, then force IB
chaining to give the compiler more time, then sync the compilation at the IB
flush and patch the IB with correct shader states. This is great for
compilation before draw calls, but there are some difficulties such as
scratch and tess states requiring the compiler output, and an on-disk shader
cache will likely be a much better and simpler solution.)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:13 +02:00
Marek Olšák 027ad71b57 radeonsi: print LLVM IRs to ddebug logs
Getting LLVM IRs of hanging shaders have never been easier.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:13 +02:00
Marek Olšák 28a03be06b radeonsi: enable string markers and record apitrace call numbers
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:13 +02:00
Marek Olšák eff81cbc81 radeonsi: enable distributed tess on multi-SE parts only
ported from Vulkan

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-29 16:34:22 +02:00
Marek Olšák dd56d04568 radeonsi: set optimal VGT_HS_OFFCHIP_PARAM
ported from Vulkan

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-29 16:34:22 +02:00
Marek Olšák 3eacbc52d5 radeonsi: boolean -> bool, TRUE -> true, FALSE -> false
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Vedran Miletić <vedran@miletic.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-25 23:13:42 +02:00
Marek Olšák 28d0d0c5b4 radeonsi: fix fractional odd tessellation spacing for Polaris
ported from Vulkan (and no source explains why this is needed)

Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-24 17:36:43 +02:00
Marek Olšák 8f3ef4e8b8 radeonsi: optimize rendering to linear color buffers
loosely ported from Vulkan

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-24 16:24:53 +02:00
Nicolai Hähnle d46a9db840 radeon: check VM faults from DMA flush
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-24 12:36:03 +02:00
Nicolai Hähnle ad8438403b radeonsi: extract IB and bo list saving into separate functions
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-24 12:36:02 +02:00
Marek Olšák 4140afd04b gallium/radeon: add driver queries for compute/dma call stats and spills
also print the average count per frame

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-14 20:22:16 +02:00
Nicolai Hähnle 8239da28e8 radeonsi: keep track of dirty descriptor sets
Reduces CPU load for draw calls that change none or few of the descriptors.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-07 15:18:10 +02:00
Nicolai Hähnle d152c73712 radeonsi: move si_descriptors into a per-context array
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-07 15:18:07 +02:00
Nicolai Hähnle 031b57bc2f radeonsi: move enabled_mask out of si_descriptors
This mask is irrelevant for the generic descriptor set handling, and having it
outside simplifies subsequent changes slightly.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-07 15:17:23 +02:00
Marek Olšák 95c5bbae66 radeonsi: set some image descriptor fields at bind time
mainly the fields that can change by reallocating a texture and changing
the tile mode

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-01 17:35:30 +02:00
Bas Nieuwenhuizen 35818129a6 radeonsi: Decompress DCC textures in a render feedback loop.
By using a counter to quickly reject textures that are not
bound to a framebuffer, the performance impact when binding
sampler_views/images is not too large.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-31 21:43:04 +02:00
Bas Nieuwenhuizen d27ff7d683 radeonsi: Add buffer for offchip storage between TCS and TES.
The buffer is quite large, but should only be allocated if the
application uses tessellation. Most non-games don't.

v2: - Use the correct register for SI.
    - Add define for block size.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-26 22:07:04 +02:00
Marek Olšák 498a40cae8 radeonsi: only expose *_init_*dma_functions from (S)DMA files
just normalizing the interfaces

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-10 17:20:09 +02:00
Marek Olšák f564b61d33 radeonsi: rework clear_buffer flags
Changes:
- don't flush DB for fast color clears
- don't flush any caches for initial clears
- remove the flag from si_copy_buffer, always assume shader coherency

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-28 20:16:56 +02:00
Marek Olšák 3acaefb1bb radeonsi: shorten slot masks to 32 bits
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-22 01:14:14 +02:00
Marek Olšák 698821bda3 radeonsi: rework polygon stippling to use constant buffer instead of texture
add it to the RW_BUFFERS descriptor array

now the slot masks don't have to have 64 bits

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-22 01:14:13 +02:00
Marek Olšák 36261c29cd radeonsi: make RW buffer descriptor array global, not per shader stage
v2: also simplify invalidation of RW buffer bindings (squashed)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-22 01:14:13 +02:00
Bas Nieuwenhuizen 41d79bcbfa radeonsi: clean up compute flush
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19 18:10:32 +02:00
Bas Nieuwenhuizen 061ce9399a radeonsi: split texture decompression for compute shaders
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen e56514f631 radeonsi: update predicate condition for compute dispatches
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen 1349dd16ff radeonsi: only emit compute shader state when switching shaders
v2: - Do check if anything changed earlier
    - Use emitted_program instead of emitted_bo to prevent
      shaders with shader->bo = NULL confusing the check
    - Use radeon_set_sh_reg*

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen ba1f66a73d radeonsi: rework compute scratch buffer
Instead of having a scratch buffer per program, have one per
context.

Also removed the per kernel wave count calculations, but
that only helped if the total number of waves in the dispatch
was smaller than sctx->scratch_waves.

v2: Fix style issue.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen 107f4d3538 radeonsi: do per cs setup for compute shaders once per cs
Also removes PKT3_CONTEXT_CONTROL as that is already being done
by si_begin_new_cs, when emitting init_config.

v2: - Use radeon_set_sh_reg_seq.
    - Also set COMPUTE_STATIC_THREAD_MGMT_SE2 / SE3 for CIK+

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen aabc7d61d6 radeonsi: Add CE uploader.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen 86c71ff989 radeonsi: Add CE synchronization.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen 8fee75d606 radeonsi: Create CE IB.
Based on work by Marek Olšák.

v2: Add preamble IB.

Leaves the load packet in the space calculation as the
radeon winsys might not be able to support a premable.

The added space calculation may look expensive, but
is converted to a constant with (at least) -O2 and -O3.

v3: - Fix code style.
    - Remove needed space for vertex buffer descriptors.
    - Fail when the preamble cannot be created.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:30 +02:00
Nicolai Hähnle c495c0ad37 radeonsi: implement set_shader_buffers
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-12 16:30:26 -05:00
Marek Olšák 2ca5566ed7 radeonsi: move scissor and viewport states into gallium/radeon
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Grigori Goronzy <greg@chown.ath.cx>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-12 17:13:24 +02:00
Marek Olšák cb21f8a97c radeonsi: compute scissor from viewport in set_viewport_states
and clamp it right before emitting. This is a prerequisite for computing
the guard band.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Grigori Goronzy <greg@chown.ath.cx>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-12 14:29:49 +02:00
Marek Olšák b82893f93a gallium/radeon: move pipeline stat context flags to common code
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-12 14:29:47 +02:00
Marek Olšák f3eebb84eb radeonsi: implement and rely on set_active_query_state
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-12 14:29:46 +02:00
Nicolai Hähnle 9d2693f58a radeonsi: expand the compressed color and depth texture masks to 64 bits
This is in preparation of raising the number of exposed sampler views to 32
bits, which will raise the total number of sampler views to 33 for the
polygon stipple texture. That texture should never be compressed (and it's
certainly not a depth texture), but this approach seems cleaner to me than
special-casing the last slot in all affected code paths.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-07 13:15:06 -05:00
Nicolai Hähnle e85cf35a65 radeonsi: implement set_shader_images (v2)
Whether DCC is disabled depends on the access flags with which the image
is bound: image_load supports DCC, but store and atomic don't.

v2: remove an unnecessary masking of images->desc.enabled_mask

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-21 15:34:23 -05:00
Nicolai Hähnle da68a9b215 radeonsi: move si_decompress_textures to si_blit.c
Since it is all about calling into blitter functions, it makes more
sense here. This change also reduces the size of the interfaces between
.c files.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-10 18:22:49 -05:00
Nicolai Hähnle 2bf8ee34b8 radeonsi: remove resource field from si_sampler_view
view->resource is redundant with view->base.texture, so get rid of it.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-03-09 15:02:27 +01:00
Bas Nieuwenhuizen 1e48ec7571 radeonsi: add DCC decompression (v2)
This is currently not needed but will be necessary when we have
features that do not work with DCC enabled, such as image stores
and sharing non-scanout surfaces.

v2: Marek: rebase, remove decompression from si_flush_resource (not needed)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-09 15:02:27 +01:00
Marek Olšák b744ac9f44 radeonsi: allocate DCC in the same backing buffer as the texture
To allow sharing textures with DCC enabled.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-09 15:02:27 +01:00
Marek Olšák ff360a52e6 radeonsi: implement binary shaders & shader cache in memory (v2)
v2: handle _mesa_hash_table_insert failure
    other cosmetic changes

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:58 +01:00
Marek Olšák 4636d9be4a radeonsi: add PS prolog
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:58 +01:00
Marek Olšák e79bb746ab radeonsi: add PS epilog
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:57 +01:00
Marek Olšák eb10919b83 radeonsi: add TCS epilog
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:57 +01:00
Marek Olšák e1b21696a3 radeonsi: add VS epilog
It only exports the primitive ID.
Also used by TES when it's compiled as VS.

The VS input location of the primitive ID input is v2.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:57 +01:00
Marek Olšák 70de433dea radeonsi: add VS prolog
This is disabled with use_monolithic_shaders = true.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:57 +01:00
Marek Olšák 19a92886a8 radeonsi: first bits for non-monolithic shaders
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:57 +01:00
Marek Olšák 7aedbbacae radeonsi: put image, fmask, and sampler descriptors into one array
The texture slot is expanded to 16 dwords containing 2 descriptors.
Those can be:
- Image and fmask, or
- Image and sampler state

By carefully choosing the locations, we can put all three into one slot,
with the fmask and sampler state being mutually exclusive.

This improves shaders in 2 ways:
- 2 user SGPRs are unused, shaders can use them as temporary registers now
- each pair of descriptors is always on the same cache line

v2: cosmetic changes: add back v8i32, don't load a sampler state & fmask
    at the same time

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-10 19:41:49 +01:00
Marek Olšák b9126dcda8 radeonsi: implement forcing per-sample_interpolation using the shader key only
It was partly a state and partly emulated by shader code, but since we want
to do this in a fragment shader prolog, we need to put it into the shader
key, which will be used to generate the prolog.

This also removes the spi_ps_input states and moves the registers
to the PS state.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-09 21:19:51 +01:00
Marek Olšák 066d76c2f4 radeonsi: rename cb_target_mask state to cb_render_state
and rename a variable in the function.

SX_PS_DOWNCONVERT will be emitted here.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-02 21:03:19 +01:00
Marek Olšák f6360de8c0 radeonsi: use all SPI color formats
because not using SPI_SHADER_32_ABGR doubles fill rate.

We should also get optimal performance if alpha isn't needed or blending
isn't enabled.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-22 15:02:40 +01:00
Marek Olšák 8667a1aea2 radeonsi: use SPI_SHADER_COL_FORMAT fields instead of export_16bpc
This does change the behavior slightly:
  If a shader writes COLOR[i] and that color buffer isn't bound,
  the shader will export MRT_NULL instead and discard the IR tree that
  calculates the output. The only exception is alpha-to-coverage, which
  requires an alpha export.

v2: - update a comment about 16BPC
    - account for MRTZ when when fixing alpha-test/kill

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-22 15:02:40 +01:00
Nicolai Hähnle 7b8db37abb radeonsi: add RADEON_REPLACE_SHADERS debug option
This option allows replacing a single shader by a pre-compiled ELF object
as generated by LLVM's llc, for example. This can be useful for debugging a
deterministically occuring error in shaders (and has in fact helped find
the causes of https://bugs.freedesktop.org/show_bug.cgi?id=93264).

v2: drop the debug flag, use DEBUG_GET_ONCE_OPTION instead

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-12-29 09:07:04 -05:00
Marek Olšák 1a24f443b4 radeonsi: implement fast stencil clear
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-12-11 15:25:12 +01:00
Nicolai Hähnle ad22006892 radeonsi: implement AMD_performance_monitor for CIK+
Expose most of the performance counter groups that are exposed by Catalyst.
Ideally, the driver will work with GPUPerfStudio at some point, but we are not
quite there yet. In any case, this is the reason for grouping multiple
instances of hardware blocks in the way it is implemented.

The counters can also be shown using the Gallium HUD. If one is interested to
see how work is distributed across multiple shader engines, one can set the
environment variable RADEON_PC_SEPARATE_SE=1 to obtain finer-grained performance
counter groups.

Part of the implementation is in radeon because an implementation for
older hardware would largely follow along the same lines, but exposing
a different set of blocks which are programmed slightly differently.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-25 15:52:09 +01:00
Marek Olšák b1c5f3faa9 radeonsi: calculate optimal GS ring sizes to fix GS hangs on Tonga
I discovered that increasing the ESGS ring size fixes GS hangs on Tonga,
so let's do it properly.

There is now a separate init_config_gs_rings state that is not immutable,
because GS rings are resized when needed.

This also saves some memory. Most apps won't need more than 1MB
per ring per shader engine.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:42 +01:00
Marek Olšák 3d963abc81 radeonsi: prevent recursion in si_context_gfx_flush
The recursion can only occur if you modify need_cs_space to always flush.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák c6012a6650 radeonsi: rename cache flushing flags once more
KCACHE, TC L1 and TC L2 are renamed to:
- SMEM L1
- VMEM L1
- GLOBAL L2

You can easily tell what they are used for now.
Shaders must deal with coherency issues between both L1s manually,
e.g. by setting GLC=1 or by using s_dcache_*.

BOTH_ICACHE_KCACHE was an unused definition.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Bas Nieuwenhuizen 48b5f104ac radeonsi: Enable DCC.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-10-24 00:42:30 +02:00
Marek Olšák 06083046a4 radeonsi: add another requirement for PARTIAL_ES_WAVE
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák 07b3cc6ecf radeonsi: allow unbinding pixel shaders and remove the dummy shader
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák 9b54ce3362 radeonsi: support thread-safe shaders shared by multiple contexts
The "current" shader pointer is moved from the CSO to the context, so that
the CSO is mostly immutable.

The only drawback is that the "current" pointer isn't saved when unbinding
a shader and it must be looked up when the shader is bound again.

This is also a prerequisite for multithreaded shader compilation.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-20 12:51:51 +02:00
Marek Olšák 13e69805ea radeonsi: fix a GS hang on VI
Broken by one of the cleanups: 0d46c3bc9d
Not applicable to stable.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-10-07 19:18:50 +02:00
Marek Olšák 9652bfcf2d radeonsi: implement the simple case of force_persample_interp
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:09 +02:00
Marek Olšák 214de2d815 radeonsi: move SPI_PS_INPUT_ENA/ADDR registers to a separate state
This will be a derived state used for changing center->sample and
centroid->sample at runtime.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:09 +02:00
Marek Olšák c23c92c965 radeonsi: only do depth-only or stencil-only in-place decompression
instead of always doing both.
Usually, only depth is needed, so stencil decompression is useless.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:08 +02:00
Marek Olšák cc92b90375 radeonsi: dump buffer lists while debugging
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:08 +02:00
Marek Olšák 9bd7928a35 radeonsi: add an option for debugging VM faults
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:07 +02:00
Marek Olšák a9971e85d9 radeonsi: rework uploading border colors
The border colors are uploaded only once when the state is created.

This brings truly immutable sampler descriptors, because they don't have
to be updated every time a sampler state is re-bound.

It also moves the TA_BC_BASE_ADDR registers to init_config, removing one
more state. The catch is there is now a limit: only 4096 border colors can
be used by one context. I don't think that will be a problem.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:15 +02:00
Marek Olšák 228e80123a radeonsi: reorder si_context variables
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:15 +02:00