Commit Graph

215 Commits

Author SHA1 Message Date
Marek Olšák 5f99c49008 radeonsi: precompute IA_MULTI_VGT_PARAM values into a table
The perf difference is very small: 0.99% -> 0.40% for the time spent
in si_get_ia_multi_vgt_param when si_draw_vbo is 20%. Pretty much nothing.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák c78177fc64 radeonsi: move VGT_VERTEX_REUSE_BLOCK_CNTL into shader states for Polaris
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák ccecf79c2b radeonsi: state atom IDs don't have to be off by one
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák ac059f1c23 radeonsi: use a bitmask for looping over dirty PM4 states
also move it to draw_vbo, because it should be 0 in most cases

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák 802fcdc0d2 radeonsi: atomize L2 prefetches
to move the big conditional statement out of draw_vbo

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák 879c73fac8 radeonsi: update dirty_level_mask only after the first draw after FB change
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák cf248929bf radeonsi: use a global dirty mask for shader pointers
Only vertex buffers use a separate bool flag.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-18 19:51:31 +01:00
Marek Olšák 861d7af1cb radeonsi: use a bitmask-based loop in si_decompress_textures
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-18 19:51:31 +01:00
Marek Olšák d93b0eacb7 radeonsi: si_cp_dma_prepare is a no-op for L2 prefetches
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-18 19:51:31 +01:00
Marek Olšák 395c49849d radeonsi: add SI_CPDMA_SKIP_BO_LIST_UPDATE
the next commit will use it in a clever way, because the CP DMA prefetch
doesn't need this.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-18 19:51:31 +01:00
Bas Nieuwenhuizen 0ef1b4d5b1 ac/debug: Move IB decode to common code.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-01-09 21:43:59 +01:00
Marek Olšák ece6e1f658 radeonsi: add TC L2 prefetch for shaders and VBO descriptors
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-06 21:05:48 +01:00
Marek Olšák a131dacb14 radeonsi: add CP DMA flags for greater control over synchronization
for L2 prefetch

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-06 21:05:48 +01:00
Nicolai Hähnle 2f2e941e2d radeonsi: use a single descriptor for the GSVS ring
We can hardcode all of the fields for swizzling in the geometry shader.

The advantage is that we use fewer descriptor slots and we no longer have to
update any of the (ring) descriptors when the geometry shader changes.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:05:05 +01:00
Marek Olšák 6caa558ca6 radeonsi: check for sampler state CSO corruption
It really happens.

v2: declare "magic" in debug builds only

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
2016-12-07 19:40:03 +01:00
Marek Olšák bf75ef3f92 radeonsi: remove all varyings for depth-only rendering or rasterization off
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák 6d5c2a8b5c radeonsi: split the shader key into 3 logical parts
key->part.*: prolog and epilog flags only
key->as_{ls,es}: special flags
key->mono.*: flags for monolithic compilation only

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák fa476e0566 radeonsi: fast exit si_emit_derived_tess_state early
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Nicolai Hähnle 908f92ad1f radeonsi: generate GS prolog to (partially) fix triangle strip adjacency rotation
Fixes GL45-CTS.geometry_shader.adjacency.adjacency_indiced_triangle_strip and
others.

This leaves the case of triangle strips with adjacency and primitive restarts
open. It seems that the only thing that cares about that is a piglit test.
Fixing this efficiently would be really involved, and I don't want to use the
hammer of degrading to software handling of indices because there may well
be software that uses this draw mode (without caring about the precise
rotation of triangles).

v2:
- skip the GS prolog entirely if workaround is not needed
- only check for TES (TES is always non-null when tessellation is used)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:11:24 +01:00
Nicolai Hähnle 3b2516721b radeonsi: make the GS copy shader owned by the GS selector
The copy shader only depends on the selector. This change avoids creating
separate code paths for monolithic vs. non-monolithic geometry shaders.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:50 +01:00
Marek Olšák ad45dce4a2 radeonsi: remove si_resource_create_custom
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-26 13:02:58 +02:00
Marek Olšák 29144d0f34 gallium/radeon: stop using PIPE_BIND_CUSTOM
it has no effect whatsoever

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-26 13:02:58 +02:00
Marek Olšák a2ea653a49 radeonsi: remove cb0_is_integer handling
st/mesa does this for us.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-19 19:26:30 +02:00
Marek Olšák 8cdce30cc2 radeonsi: implement TC L2 write-back (flush) without cache invalidation
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-12 18:29:40 +02:00
Marek Olšák 71a5cf6f3b radeonsi: don't declare LDS in PS when ds_bpermute is used
I guess this is not needed because dead code elimination removes
the declaration.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:16 +02:00
Marek Olšák 3388f27d84 radeonsi: clean up lucky #include dependencies
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:06 +02:00
Marek Olšák 7ce19d9014 radeonsi: don't set sampler buffer offsets in create_sampler_view
do it at bind time, so that pipe_sampler_view is immutable with regard to
buffer reallocations and we don't have to remember all existing buffer
views.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:01 +02:00
Marek Olšák 3ee9be42ac radeonsi: move VGT_LS_HS_CONFIG to derived tess_state
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:11:53 +02:00
Marek Olšák a67d81580b radeonsi: remove the cache_flush atom
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-09 22:45:06 +02:00
Marek Olšák fe40a65fb6 radeonsi: skip redundant INDEX_TYPE writes
Ported from Vulkan.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-07 11:13:13 +02:00
Marek Olšák 911202817d radeonsi: don't emit CS_PARTIAL_FLUSH if compute is not used
for less noise in the HUD

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-05 18:01:15 +02:00
Nicolai Hähnle b6c71d37c7 radeonsi: program the DRAWID SGPR
Note that for indirect draws, the new MULTI firmware packets are required.

There's also no need to reset last_{start_instance,sh_base_reg}, since
resetting last_base_vertex is sufficient.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-09 15:56:04 +02:00
Nicolai Hähnle 96bbb620a5 radeonsi: add has_draw_indirect_multi flag
Prefer to use DRAW_(INDEX)_INDIRECT_MULTI when available in the firmware.

Versions for SI and CI already added as provided by the firmware team, but
keep in mind that they won't currently be used since the radeon kernel module
has no interface to query the firmware version.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-08 12:53:06 +02:00
Marek Olšák a6bfafa083 gallium/radeon: move last_gfx_fence from radeonsi to common code
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-03 17:46:46 +02:00
Marek Olšák c15a9dec29 radeonsi: skip unnecessary si_update_shaders calls
Small decrease in draw call overhead.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-03 17:46:46 +02:00
Nicolai Hähnle d938b8c0bf radeonsi: explicitly choose center locations for 1xAA on Polaris
Unlike SC, the small primitive filter does not automatically use center
locations in 1xAA mode, so this is needed to avoid artifacts caused by
the small primitive filter discarding triangles that it shouldn't.

As a side effect of how the effective number of samples is now calculated,
this patch also avoids submitting the sample locations for line/poly smoothing
when they're not really needed.

Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-08 10:52:50 +02:00
Marek Olšák 5c92c21369 radeonsi: do compilation from si_create_shader_selector asynchronously
Main shader parts and geometry shaders are compiled asynchronously
by util_queue. si_create_shader_selector doesn't wait and returns.
si_draw_vbo(si_shader_select) waits for completion.

This has the best effect when shaders are compiled at app-loading time.
It doesn't help much for shaders compiled on demand, even though
VS+PS compilation should take as much as time as the bigger one of the two.

If an app creates more shaders, at most 4 threads will be used to compile
them.

Debug output disables this for shader stats to be printed in the correct
order.

(We could go even further and build variants asynchronously too, then emit
draw calls without waiting and emit incomplete shader states, then force IB
chaining to give the compiler more time, then sync the compilation at the IB
flush and patch the IB with correct shader states. This is great for
compilation before draw calls, but there are some difficulties such as
scratch and tess states requiring the compiler output, and an on-disk shader
cache will likely be a much better and simpler solution.)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:13 +02:00
Marek Olšák 027ad71b57 radeonsi: print LLVM IRs to ddebug logs
Getting LLVM IRs of hanging shaders have never been easier.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:13 +02:00
Marek Olšák 28a03be06b radeonsi: enable string markers and record apitrace call numbers
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:13 +02:00
Marek Olšák eff81cbc81 radeonsi: enable distributed tess on multi-SE parts only
ported from Vulkan

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-29 16:34:22 +02:00
Marek Olšák dd56d04568 radeonsi: set optimal VGT_HS_OFFCHIP_PARAM
ported from Vulkan

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-29 16:34:22 +02:00
Marek Olšák 3eacbc52d5 radeonsi: boolean -> bool, TRUE -> true, FALSE -> false
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Vedran Miletić <vedran@miletic.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-25 23:13:42 +02:00
Marek Olšák 28d0d0c5b4 radeonsi: fix fractional odd tessellation spacing for Polaris
ported from Vulkan (and no source explains why this is needed)

Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-24 17:36:43 +02:00
Marek Olšák 8f3ef4e8b8 radeonsi: optimize rendering to linear color buffers
loosely ported from Vulkan

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-24 16:24:53 +02:00
Nicolai Hähnle d46a9db840 radeon: check VM faults from DMA flush
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-24 12:36:03 +02:00
Nicolai Hähnle ad8438403b radeonsi: extract IB and bo list saving into separate functions
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-24 12:36:02 +02:00
Marek Olšák 4140afd04b gallium/radeon: add driver queries for compute/dma call stats and spills
also print the average count per frame

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-14 20:22:16 +02:00
Nicolai Hähnle 8239da28e8 radeonsi: keep track of dirty descriptor sets
Reduces CPU load for draw calls that change none or few of the descriptors.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-07 15:18:10 +02:00
Nicolai Hähnle d152c73712 radeonsi: move si_descriptors into a per-context array
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-07 15:18:07 +02:00
Nicolai Hähnle 031b57bc2f radeonsi: move enabled_mask out of si_descriptors
This mask is irrelevant for the generic descriptor set handling, and having it
outside simplifies subsequent changes slightly.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-07 15:17:23 +02:00