Commit Graph

408 Commits

Author SHA1 Message Date
Marek Olšák 861d7af1cb radeonsi: use a bitmask-based loop in si_decompress_textures
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-18 19:51:31 +01:00
Marek Olšák d93b0eacb7 radeonsi: si_cp_dma_prepare is a no-op for L2 prefetches
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-18 19:51:31 +01:00
Marek Olšák 395c49849d radeonsi: add SI_CPDMA_SKIP_BO_LIST_UPDATE
the next commit will use it in a clever way, because the CP DMA prefetch
doesn't need this.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-18 19:51:31 +01:00
Bas Nieuwenhuizen 0ef1b4d5b1 ac/debug: Move IB decode to common code.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-01-09 21:43:59 +01:00
Marek Olšák ece6e1f658 radeonsi: add TC L2 prefetch for shaders and VBO descriptors
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-06 21:05:48 +01:00
Marek Olšák a131dacb14 radeonsi: add CP DMA flags for greater control over synchronization
for L2 prefetch

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-06 21:05:48 +01:00
Nicolai Hähnle 2f2e941e2d radeonsi: use a single descriptor for the GSVS ring
We can hardcode all of the fields for swizzling in the geometry shader.

The advantage is that we use fewer descriptor slots and we no longer have to
update any of the (ring) descriptors when the geometry shader changes.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:05:05 +01:00
Marek Olšák 6caa558ca6 radeonsi: check for sampler state CSO corruption
It really happens.

v2: declare "magic" in debug builds only

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
2016-12-07 19:40:03 +01:00
Marek Olšák bf75ef3f92 radeonsi: remove all varyings for depth-only rendering or rasterization off
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák 6d5c2a8b5c radeonsi: split the shader key into 3 logical parts
key->part.*: prolog and epilog flags only
key->as_{ls,es}: special flags
key->mono.*: flags for monolithic compilation only

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák fa476e0566 radeonsi: fast exit si_emit_derived_tess_state early
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Nicolai Hähnle 908f92ad1f radeonsi: generate GS prolog to (partially) fix triangle strip adjacency rotation
Fixes GL45-CTS.geometry_shader.adjacency.adjacency_indiced_triangle_strip and
others.

This leaves the case of triangle strips with adjacency and primitive restarts
open. It seems that the only thing that cares about that is a piglit test.
Fixing this efficiently would be really involved, and I don't want to use the
hammer of degrading to software handling of indices because there may well
be software that uses this draw mode (without caring about the precise
rotation of triangles).

v2:
- skip the GS prolog entirely if workaround is not needed
- only check for TES (TES is always non-null when tessellation is used)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:11:24 +01:00
Nicolai Hähnle 3b2516721b radeonsi: make the GS copy shader owned by the GS selector
The copy shader only depends on the selector. This change avoids creating
separate code paths for monolithic vs. non-monolithic geometry shaders.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:50 +01:00
Marek Olšák ad45dce4a2 radeonsi: remove si_resource_create_custom
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-26 13:02:58 +02:00
Marek Olšák 29144d0f34 gallium/radeon: stop using PIPE_BIND_CUSTOM
it has no effect whatsoever

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-26 13:02:58 +02:00
Marek Olšák a2ea653a49 radeonsi: remove cb0_is_integer handling
st/mesa does this for us.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-19 19:26:30 +02:00
Marek Olšák 8cdce30cc2 radeonsi: implement TC L2 write-back (flush) without cache invalidation
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-12 18:29:40 +02:00
Marek Olšák 71a5cf6f3b radeonsi: don't declare LDS in PS when ds_bpermute is used
I guess this is not needed because dead code elimination removes
the declaration.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:16 +02:00
Marek Olšák 3388f27d84 radeonsi: clean up lucky #include dependencies
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:06 +02:00
Marek Olšák 7ce19d9014 radeonsi: don't set sampler buffer offsets in create_sampler_view
do it at bind time, so that pipe_sampler_view is immutable with regard to
buffer reallocations and we don't have to remember all existing buffer
views.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:01 +02:00
Marek Olšák 3ee9be42ac radeonsi: move VGT_LS_HS_CONFIG to derived tess_state
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:11:53 +02:00
Marek Olšák a67d81580b radeonsi: remove the cache_flush atom
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-09 22:45:06 +02:00
Marek Olšák fe40a65fb6 radeonsi: skip redundant INDEX_TYPE writes
Ported from Vulkan.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-07 11:13:13 +02:00
Marek Olšák 911202817d radeonsi: don't emit CS_PARTIAL_FLUSH if compute is not used
for less noise in the HUD

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-05 18:01:15 +02:00
Nicolai Hähnle b6c71d37c7 radeonsi: program the DRAWID SGPR
Note that for indirect draws, the new MULTI firmware packets are required.

There's also no need to reset last_{start_instance,sh_base_reg}, since
resetting last_base_vertex is sufficient.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-09 15:56:04 +02:00
Nicolai Hähnle 96bbb620a5 radeonsi: add has_draw_indirect_multi flag
Prefer to use DRAW_(INDEX)_INDIRECT_MULTI when available in the firmware.

Versions for SI and CI already added as provided by the firmware team, but
keep in mind that they won't currently be used since the radeon kernel module
has no interface to query the firmware version.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-08 12:53:06 +02:00
Marek Olšák a6bfafa083 gallium/radeon: move last_gfx_fence from radeonsi to common code
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-03 17:46:46 +02:00
Marek Olšák c15a9dec29 radeonsi: skip unnecessary si_update_shaders calls
Small decrease in draw call overhead.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-03 17:46:46 +02:00
Nicolai Hähnle d938b8c0bf radeonsi: explicitly choose center locations for 1xAA on Polaris
Unlike SC, the small primitive filter does not automatically use center
locations in 1xAA mode, so this is needed to avoid artifacts caused by
the small primitive filter discarding triangles that it shouldn't.

As a side effect of how the effective number of samples is now calculated,
this patch also avoids submitting the sample locations for line/poly smoothing
when they're not really needed.

Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-08 10:52:50 +02:00
Marek Olšák 5c92c21369 radeonsi: do compilation from si_create_shader_selector asynchronously
Main shader parts and geometry shaders are compiled asynchronously
by util_queue. si_create_shader_selector doesn't wait and returns.
si_draw_vbo(si_shader_select) waits for completion.

This has the best effect when shaders are compiled at app-loading time.
It doesn't help much for shaders compiled on demand, even though
VS+PS compilation should take as much as time as the bigger one of the two.

If an app creates more shaders, at most 4 threads will be used to compile
them.

Debug output disables this for shader stats to be printed in the correct
order.

(We could go even further and build variants asynchronously too, then emit
draw calls without waiting and emit incomplete shader states, then force IB
chaining to give the compiler more time, then sync the compilation at the IB
flush and patch the IB with correct shader states. This is great for
compilation before draw calls, but there are some difficulties such as
scratch and tess states requiring the compiler output, and an on-disk shader
cache will likely be a much better and simpler solution.)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:13 +02:00
Marek Olšák 027ad71b57 radeonsi: print LLVM IRs to ddebug logs
Getting LLVM IRs of hanging shaders have never been easier.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:13 +02:00
Marek Olšák 28a03be06b radeonsi: enable string markers and record apitrace call numbers
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:13 +02:00
Marek Olšák eff81cbc81 radeonsi: enable distributed tess on multi-SE parts only
ported from Vulkan

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-29 16:34:22 +02:00
Marek Olšák dd56d04568 radeonsi: set optimal VGT_HS_OFFCHIP_PARAM
ported from Vulkan

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-29 16:34:22 +02:00
Marek Olšák 3eacbc52d5 radeonsi: boolean -> bool, TRUE -> true, FALSE -> false
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Vedran Miletić <vedran@miletic.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-25 23:13:42 +02:00
Marek Olšák 28d0d0c5b4 radeonsi: fix fractional odd tessellation spacing for Polaris
ported from Vulkan (and no source explains why this is needed)

Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-24 17:36:43 +02:00
Marek Olšák 8f3ef4e8b8 radeonsi: optimize rendering to linear color buffers
loosely ported from Vulkan

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-24 16:24:53 +02:00
Nicolai Hähnle d46a9db840 radeon: check VM faults from DMA flush
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-24 12:36:03 +02:00
Nicolai Hähnle ad8438403b radeonsi: extract IB and bo list saving into separate functions
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-24 12:36:02 +02:00
Marek Olšák 4140afd04b gallium/radeon: add driver queries for compute/dma call stats and spills
also print the average count per frame

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-14 20:22:16 +02:00
Nicolai Hähnle 8239da28e8 radeonsi: keep track of dirty descriptor sets
Reduces CPU load for draw calls that change none or few of the descriptors.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-07 15:18:10 +02:00
Nicolai Hähnle d152c73712 radeonsi: move si_descriptors into a per-context array
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-07 15:18:07 +02:00
Nicolai Hähnle 031b57bc2f radeonsi: move enabled_mask out of si_descriptors
This mask is irrelevant for the generic descriptor set handling, and having it
outside simplifies subsequent changes slightly.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-07 15:17:23 +02:00
Marek Olšák 95c5bbae66 radeonsi: set some image descriptor fields at bind time
mainly the fields that can change by reallocating a texture and changing
the tile mode

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-01 17:35:30 +02:00
Bas Nieuwenhuizen 35818129a6 radeonsi: Decompress DCC textures in a render feedback loop.
By using a counter to quickly reject textures that are not
bound to a framebuffer, the performance impact when binding
sampler_views/images is not too large.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-31 21:43:04 +02:00
Bas Nieuwenhuizen d27ff7d683 radeonsi: Add buffer for offchip storage between TCS and TES.
The buffer is quite large, but should only be allocated if the
application uses tessellation. Most non-games don't.

v2: - Use the correct register for SI.
    - Add define for block size.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-26 22:07:04 +02:00
Marek Olšák 498a40cae8 radeonsi: only expose *_init_*dma_functions from (S)DMA files
just normalizing the interfaces

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-10 17:20:09 +02:00
Marek Olšák f564b61d33 radeonsi: rework clear_buffer flags
Changes:
- don't flush DB for fast color clears
- don't flush any caches for initial clears
- remove the flag from si_copy_buffer, always assume shader coherency

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-28 20:16:56 +02:00
Marek Olšák 3acaefb1bb radeonsi: shorten slot masks to 32 bits
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-22 01:14:14 +02:00
Marek Olšák 698821bda3 radeonsi: rework polygon stippling to use constant buffer instead of texture
add it to the RW_BUFFERS descriptor array

now the slot masks don't have to have 64 bits

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-22 01:14:13 +02:00
Marek Olšák 36261c29cd radeonsi: make RW buffer descriptor array global, not per shader stage
v2: also simplify invalidation of RW buffer bindings (squashed)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-22 01:14:13 +02:00
Bas Nieuwenhuizen 41d79bcbfa radeonsi: clean up compute flush
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19 18:10:32 +02:00
Bas Nieuwenhuizen 061ce9399a radeonsi: split texture decompression for compute shaders
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen e56514f631 radeonsi: update predicate condition for compute dispatches
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen 1349dd16ff radeonsi: only emit compute shader state when switching shaders
v2: - Do check if anything changed earlier
    - Use emitted_program instead of emitted_bo to prevent
      shaders with shader->bo = NULL confusing the check
    - Use radeon_set_sh_reg*

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen ba1f66a73d radeonsi: rework compute scratch buffer
Instead of having a scratch buffer per program, have one per
context.

Also removed the per kernel wave count calculations, but
that only helped if the total number of waves in the dispatch
was smaller than sctx->scratch_waves.

v2: Fix style issue.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen 107f4d3538 radeonsi: do per cs setup for compute shaders once per cs
Also removes PKT3_CONTEXT_CONTROL as that is already being done
by si_begin_new_cs, when emitting init_config.

v2: - Use radeon_set_sh_reg_seq.
    - Also set COMPUTE_STATIC_THREAD_MGMT_SE2 / SE3 for CIK+

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen aabc7d61d6 radeonsi: Add CE uploader.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen 86c71ff989 radeonsi: Add CE synchronization.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen 8fee75d606 radeonsi: Create CE IB.
Based on work by Marek Olšák.

v2: Add preamble IB.

Leaves the load packet in the space calculation as the
radeon winsys might not be able to support a premable.

The added space calculation may look expensive, but
is converted to a constant with (at least) -O2 and -O3.

v3: - Fix code style.
    - Remove needed space for vertex buffer descriptors.
    - Fail when the preamble cannot be created.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:30 +02:00
Nicolai Hähnle c495c0ad37 radeonsi: implement set_shader_buffers
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-12 16:30:26 -05:00
Marek Olšák 2ca5566ed7 radeonsi: move scissor and viewport states into gallium/radeon
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Grigori Goronzy <greg@chown.ath.cx>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-12 17:13:24 +02:00
Marek Olšák cb21f8a97c radeonsi: compute scissor from viewport in set_viewport_states
and clamp it right before emitting. This is a prerequisite for computing
the guard band.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Grigori Goronzy <greg@chown.ath.cx>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-12 14:29:49 +02:00
Marek Olšák b82893f93a gallium/radeon: move pipeline stat context flags to common code
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-12 14:29:47 +02:00
Marek Olšák f3eebb84eb radeonsi: implement and rely on set_active_query_state
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-12 14:29:46 +02:00
Nicolai Hähnle 9d2693f58a radeonsi: expand the compressed color and depth texture masks to 64 bits
This is in preparation of raising the number of exposed sampler views to 32
bits, which will raise the total number of sampler views to 33 for the
polygon stipple texture. That texture should never be compressed (and it's
certainly not a depth texture), but this approach seems cleaner to me than
special-casing the last slot in all affected code paths.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-07 13:15:06 -05:00
Nicolai Hähnle e85cf35a65 radeonsi: implement set_shader_images (v2)
Whether DCC is disabled depends on the access flags with which the image
is bound: image_load supports DCC, but store and atomic don't.

v2: remove an unnecessary masking of images->desc.enabled_mask

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-21 15:34:23 -05:00
Nicolai Hähnle da68a9b215 radeonsi: move si_decompress_textures to si_blit.c
Since it is all about calling into blitter functions, it makes more
sense here. This change also reduces the size of the interfaces between
.c files.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-10 18:22:49 -05:00
Nicolai Hähnle 2bf8ee34b8 radeonsi: remove resource field from si_sampler_view
view->resource is redundant with view->base.texture, so get rid of it.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-03-09 15:02:27 +01:00
Bas Nieuwenhuizen 1e48ec7571 radeonsi: add DCC decompression (v2)
This is currently not needed but will be necessary when we have
features that do not work with DCC enabled, such as image stores
and sharing non-scanout surfaces.

v2: Marek: rebase, remove decompression from si_flush_resource (not needed)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-09 15:02:27 +01:00
Marek Olšák b744ac9f44 radeonsi: allocate DCC in the same backing buffer as the texture
To allow sharing textures with DCC enabled.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-09 15:02:27 +01:00
Marek Olšák ff360a52e6 radeonsi: implement binary shaders & shader cache in memory (v2)
v2: handle _mesa_hash_table_insert failure
    other cosmetic changes

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:58 +01:00
Marek Olšák 4636d9be4a radeonsi: add PS prolog
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:58 +01:00
Marek Olšák e79bb746ab radeonsi: add PS epilog
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:57 +01:00
Marek Olšák eb10919b83 radeonsi: add TCS epilog
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:57 +01:00
Marek Olšák e1b21696a3 radeonsi: add VS epilog
It only exports the primitive ID.
Also used by TES when it's compiled as VS.

The VS input location of the primitive ID input is v2.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:57 +01:00
Marek Olšák 70de433dea radeonsi: add VS prolog
This is disabled with use_monolithic_shaders = true.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:57 +01:00
Marek Olšák 19a92886a8 radeonsi: first bits for non-monolithic shaders
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:57 +01:00
Marek Olšák 7aedbbacae radeonsi: put image, fmask, and sampler descriptors into one array
The texture slot is expanded to 16 dwords containing 2 descriptors.
Those can be:
- Image and fmask, or
- Image and sampler state

By carefully choosing the locations, we can put all three into one slot,
with the fmask and sampler state being mutually exclusive.

This improves shaders in 2 ways:
- 2 user SGPRs are unused, shaders can use them as temporary registers now
- each pair of descriptors is always on the same cache line

v2: cosmetic changes: add back v8i32, don't load a sampler state & fmask
    at the same time

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-10 19:41:49 +01:00
Marek Olšák b9126dcda8 radeonsi: implement forcing per-sample_interpolation using the shader key only
It was partly a state and partly emulated by shader code, but since we want
to do this in a fragment shader prolog, we need to put it into the shader
key, which will be used to generate the prolog.

This also removes the spi_ps_input states and moves the registers
to the PS state.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-09 21:19:51 +01:00
Marek Olšák 066d76c2f4 radeonsi: rename cb_target_mask state to cb_render_state
and rename a variable in the function.

SX_PS_DOWNCONVERT will be emitted here.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-02 21:03:19 +01:00
Marek Olšák f6360de8c0 radeonsi: use all SPI color formats
because not using SPI_SHADER_32_ABGR doubles fill rate.

We should also get optimal performance if alpha isn't needed or blending
isn't enabled.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-22 15:02:40 +01:00
Marek Olšák 8667a1aea2 radeonsi: use SPI_SHADER_COL_FORMAT fields instead of export_16bpc
This does change the behavior slightly:
  If a shader writes COLOR[i] and that color buffer isn't bound,
  the shader will export MRT_NULL instead and discard the IR tree that
  calculates the output. The only exception is alpha-to-coverage, which
  requires an alpha export.

v2: - update a comment about 16BPC
    - account for MRTZ when when fixing alpha-test/kill

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-22 15:02:40 +01:00
Nicolai Hähnle 7b8db37abb radeonsi: add RADEON_REPLACE_SHADERS debug option
This option allows replacing a single shader by a pre-compiled ELF object
as generated by LLVM's llc, for example. This can be useful for debugging a
deterministically occuring error in shaders (and has in fact helped find
the causes of https://bugs.freedesktop.org/show_bug.cgi?id=93264).

v2: drop the debug flag, use DEBUG_GET_ONCE_OPTION instead

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-12-29 09:07:04 -05:00
Marek Olšák 1a24f443b4 radeonsi: implement fast stencil clear
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-12-11 15:25:12 +01:00
Nicolai Hähnle ad22006892 radeonsi: implement AMD_performance_monitor for CIK+
Expose most of the performance counter groups that are exposed by Catalyst.
Ideally, the driver will work with GPUPerfStudio at some point, but we are not
quite there yet. In any case, this is the reason for grouping multiple
instances of hardware blocks in the way it is implemented.

The counters can also be shown using the Gallium HUD. If one is interested to
see how work is distributed across multiple shader engines, one can set the
environment variable RADEON_PC_SEPARATE_SE=1 to obtain finer-grained performance
counter groups.

Part of the implementation is in radeon because an implementation for
older hardware would largely follow along the same lines, but exposing
a different set of blocks which are programmed slightly differently.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-25 15:52:09 +01:00
Marek Olšák b1c5f3faa9 radeonsi: calculate optimal GS ring sizes to fix GS hangs on Tonga
I discovered that increasing the ESGS ring size fixes GS hangs on Tonga,
so let's do it properly.

There is now a separate init_config_gs_rings state that is not immutable,
because GS rings are resized when needed.

This also saves some memory. Most apps won't need more than 1MB
per ring per shader engine.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:42 +01:00
Marek Olšák 3d963abc81 radeonsi: prevent recursion in si_context_gfx_flush
The recursion can only occur if you modify need_cs_space to always flush.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák c6012a6650 radeonsi: rename cache flushing flags once more
KCACHE, TC L1 and TC L2 are renamed to:
- SMEM L1
- VMEM L1
- GLOBAL L2

You can easily tell what they are used for now.
Shaders must deal with coherency issues between both L1s manually,
e.g. by setting GLC=1 or by using s_dcache_*.

BOTH_ICACHE_KCACHE was an unused definition.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Bas Nieuwenhuizen 48b5f104ac radeonsi: Enable DCC.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-10-24 00:42:30 +02:00
Marek Olšák 06083046a4 radeonsi: add another requirement for PARTIAL_ES_WAVE
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák 07b3cc6ecf radeonsi: allow unbinding pixel shaders and remove the dummy shader
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák 9b54ce3362 radeonsi: support thread-safe shaders shared by multiple contexts
The "current" shader pointer is moved from the CSO to the context, so that
the CSO is mostly immutable.

The only drawback is that the "current" pointer isn't saved when unbinding
a shader and it must be looked up when the shader is bound again.

This is also a prerequisite for multithreaded shader compilation.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-20 12:51:51 +02:00
Marek Olšák 13e69805ea radeonsi: fix a GS hang on VI
Broken by one of the cleanups: 0d46c3bc9d
Not applicable to stable.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-10-07 19:18:50 +02:00
Marek Olšák 9652bfcf2d radeonsi: implement the simple case of force_persample_interp
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:09 +02:00
Marek Olšák 214de2d815 radeonsi: move SPI_PS_INPUT_ENA/ADDR registers to a separate state
This will be a derived state used for changing center->sample and
centroid->sample at runtime.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:09 +02:00
Marek Olšák c23c92c965 radeonsi: only do depth-only or stencil-only in-place decompression
instead of always doing both.
Usually, only depth is needed, so stencil decompression is useless.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:08 +02:00
Marek Olšák cc92b90375 radeonsi: dump buffer lists while debugging
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:08 +02:00
Marek Olšák 9bd7928a35 radeonsi: add an option for debugging VM faults
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:07 +02:00
Marek Olšák a9971e85d9 radeonsi: rework uploading border colors
The border colors are uploaded only once when the state is created.

This brings truly immutable sampler descriptors, because they don't have
to be updated every time a sampler state is re-bound.

It also moves the TA_BC_BASE_ADDR registers to init_config, removing one
more state. The catch is there is now a limit: only 4096 border colors can
be used by one context. I don't think that will be a problem.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:15 +02:00
Marek Olšák 228e80123a radeonsi: reorder si_context variables
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:15 +02:00
Marek Olšák 28b34b474e radeonsi: don't send IB dword usage to si_need_cs_space
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:15 +02:00
Marek Olšák ec9d5e181e radeonsi: don't count IB space for states, just use an upper bound
Since we don't put any resource descriptors in IBs, the space used by draw
calls is quite small.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:15 +02:00
Marek Olšák fc95058add radeonsi: convert SPI state to an atom
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:15 +02:00
Marek Olšák 45e549fcbc radeonsi: convert CB_TARGET_MASK setup to an atom
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák e21418f221 radeonsi: convert stencil ref state into an atom
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák c44de30979 radeonsi: convert blend color state into an atom
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák 74aa64876b radeonsi: convert sample mask state into an atom
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák 12b205341a radeonsi: convert clip state into an atom
Reducing calloc overhead.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák 0c2eed0ede radeonsi: avoid redundant CB and DB register updates
The main idea is to avoid setting CB_COLORi_INFO = 0 for i>0 repeatedly
when those colorbuffers aren't used. This is mainly for glamor.

Same for DB. Z_INFO and STENCIL_INFO need to be cleared only once.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák c2a42d1f9f radeonsi: don't rebind GSVS ring buffers every draw call using GS
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák a2c6ae07b4 radeonsi: remove the tf_ring state, add the registers to init_config
One less state to worry about.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák 0d46c3bc9d radeonsi: remove the gs_rings state, add the registers to init_config
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák 87c1e9e19c radeonsi: use a bitmask for tracking dirty atoms
This mainly removes the cache misses when checking the dirty flags.
Not much else though.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák ba7a6cf626 radeonsi: define the state atom array separately
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:13 +02:00
Marek Olšák 8a97528b3a radeonsi: optimize viewport states
same as scissors

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:13 +02:00
Marek Olšák f6a10f60b7 radeonsi: optimize scissor states
- convert 16 states to 1 atom
- only emit 1 scissor if VIEWPORT_INDEX isn't written
- use only one packet when emitting consecutive scissors

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:13 +02:00
Marek Olšák 2c14a6d3b1 radeonsi: add IB tracing support for debug contexts
This adds trace points to all IBs and the parser prints them and also
prints which trace points were reached (executed) by the CP.
This can help pinpoint a problematic packet, draw call, etc.

Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:19 +02:00
Marek Olšák 189953ee13 radeonsi: remove old CS tracing code
Some of it is left there and it will be re-used in the next commit.

Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:19 +02:00
Marek Olšák be6dc87776 radeonsi: save the contents of indirect buffers for debug contexts
This will be used by the IB parser.

Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:19 +02:00
Marek Olšák 110873ed11 radeonsi: add an initial dump_debug_state implementation dumping shaders
This is usually called after a draw call.

Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:18 +02:00
Grazvydas Ignotas 3206d4ed44 gallium/radeon: use helper functions to mark atoms dirty
This is analogous to r300_mark_atom_dirty() used by r300, and will
be used by later patches. For common radeon code, appropriate helper
is called through a function pointer.

No functional changes.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-08-11 14:46:53 +02:00
Marek Olšák 2d3ae154ba radeonsi: move CP DMA functions to their own file
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-07-31 16:49:17 +02:00
Marek Olšák b0528118df radeonsi: completely rework updating descriptors without CP DMA
The patch has a better explanation. Just a summary here:
- The CPU always uploads a whole descriptor array to previously-unused memory.
- CP DMA isn't used.
- No caches need to be flushed.
- All descriptors are always up-to-date in memory even after a hang, because
  CP DMA doesn't serve as a middle man to update them.

This should bring:
- better hang recovery (descriptors are always up-to-date)
- better GPU performance (no KCACHE and TC flushes)
- worse CPU performance for partial updates (only whole arrays are uploaded)
- less used IB space (no CP_DMA and WRITE_DATA packets)
- simpler code
- hopefully, some of the corruption issues with SI cards will go away.
  If not, we'll know the issue is not here.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-07-31 16:49:16 +02:00
Marek Olšák 3344699243 radeonsi: set VGT_LS_HS_CONFIG for tessellation
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-07-23 00:59:33 +02:00
Marek Olšák 74c1001d13 radeonsi: add derived tessellation state
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-07-23 00:59:33 +02:00
Marek Olšák db267a04ce radeonsi: implement a fixed-function tessellation control shader and its state
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-07-23 00:59:32 +02:00
Marek Olšák b6f4fdf6a9 radeonsi: set up a ring buffer for tessellation factors
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-07-23 00:59:32 +02:00
Marek Olšák 59b3556f4c radeonsi: program VGT_SHADER_STAGES_EN for tessellation
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-07-23 00:59:32 +02:00
Marek Olšák d1f43a7e5b radeonsi: add code for creating, binding and destroying tessellation shaders
This doesn't do anything yet.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-07-23 00:59:31 +02:00
Marek Olšák 3ce91c727f radeonsi: rework how shader pointers to descriptors are set
This is mainly needed for tessellation where a VS can be bound as VS, ES,
or LS, and TES (tess. evaluationshader) can be bound as VS or ES or neither.
Therefore we need the ability to move pointers to descriptors between
shaders arbitrarily.

The idea is that the context has a mapping from PIPE_SHADER_x to
SPI_SHADER_USER_DATA_x. After a shader is enabled or disabled,
si_shader_change_notify should be called to update this mapping accordingly.

There is a dirty flag for each shader pointer, but only one emit function
for all pointers in the whole context, whose code and logic is separated
from descriptors.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-07-23 00:59:31 +02:00
Ilia Mirkin a2a1a5805f gallium: replace INLINE with inline
Generated by running:
git grep -l INLINE src/gallium/ | xargs sed -i 's/\bINLINE\b/inline/g'
git grep -l INLINE src/mesa/state_tracker/ | xargs sed -i 's/\bINLINE\b/inline/g'
git checkout src/gallium/state_trackers/clover/Doxyfile

and manual edits to
src/gallium/include/pipe/p_compiler.h
src/gallium/README.portability

to remove mentions of the inline define.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Marek Olšák <marek.olsak@amd.com>
2015-07-21 17:52:16 -04:00
Marek Olšák f1be3d8cdd radeonsi: don't flush an empty IB if the only thing we need is a fence
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-07-05 15:08:59 +02:00
Michel Dänzer 56e38edc96 radeonsi: Add CIK SDMA support
Based on the corresponding SI support. Same as that, this is currently
only enabled for one-dimensional buffer copies due to issues with
multi-dimensional SDMA copies.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-06-08 18:13:22 +09:00
Michel Dänzer d64adc3a79 radeonsi: Cache LLVMTargetMachineRef in context instead of in screen
Fixes a crash in genymotion with several threads compiling shaders
concurrently.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89746

Cc: 10.5 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2015-03-30 15:15:10 +09:00
Marek Olšák dc39413640 radeonsi: move scratch reloc state setup
- move it to its own function
- do it after all states are emitted
- bump SI_MAX_DRAW_CS_DWORDS

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-03-16 12:54:19 +01:00
Marek Olšák 1f4bb38264 radeonsi: don't emit PA_SC_LINE_STIPPLE after every rasterizer state change
Do it only when the line stipple state is changed.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-03-16 12:54:19 +01:00
Marek Olšák f5832f3f9d radeonsi: move PA_SU_SC_MODE_CNTL to rasterizer state
This requires enabling the optional GL provoking vertex behavior for quads.

+ some cosmetic changes, so that the register is set exactly the same as
on r600.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-03-16 12:54:19 +01:00
Marek Olšák 98a2398222 radeonsi: implement line and polygon smoothing
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-03-16 12:54:19 +01:00
Marek Olšák 303d23e10d radeonsi: add shader code for smoothing
The fragment shader multiplies the alpha channel with gl_SampleMaskIn.
If blending is enabled, it looks like MSAA.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-03-16 12:54:19 +01:00
Marek Olšák 4f20a8f278 radeonsi: split sample locations into its own state atom
Sample locations are not updated as often as framebuffers.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-03-16 12:54:18 +01:00
Marek Olšák 6c5af1dc4e radeonsi: implement polygon stippling
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-02-04 14:34:13 +01:00
Marek Olšák 1fe7ba8c69 radeonsi: deduce rasterizer primitive type at the beginning of draw_vbo
I will need this for polygon stippling.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-02-04 14:34:13 +01:00
Marek Olšák b142dd2f24 radeonsi: move the buffer descriptor to the end of the image descriptor
This will allow supporting NULL textures.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-02-04 14:34:13 +01:00
Tom Stellard 2397a72129 radeonsi: Enable VGPR spilling for all shader types v5
v2:
  - Only emit write SPI_TMPRING_SIZE once per packet.
  - Use context global scratch buffer.

v3:
  - Patch shaders using WRITE_DATA packet instead of map/unmap.
  - Emit ICACHE_FLUSH, CS_PARTIAL_FLUSH, PS_PARTIAL_FLUSH, and
    VS_PARTIAL_FLUSH when patching shaders.

v4:
  - Code cleanups.
  - Remove unnecessary multiplies.

v5:
  - Patch shaders in system memory and re-upload to vram.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-28 21:03:47 +00:00
Michel Dänzer 82b7ee62fc Revert "radeonsi: only set BC_OPTIMIZE_DISABLE when necessary"
This reverts commit 0543630d0b.

It caused flickering artifacts in Steam games such as Team Fortress 2 or
Left 4 Dead 2.

We could probably only enable this optimization by also making sure the
shader code only uses either SI_PARAM_LINEAR_CENTROID or
SI_PARAM_LINEAR_CENTER, not both. This would probably require a shader
variant.

Sorry I didn't remember this when reviewing the reverted change.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-01-15 15:09:48 +09:00
Marek Olšák ca9c5b2be5 radeonsi: improve and fix streamout flushing
- we don't usually need to flush TC L2
- we should flush KCACHE
  (not really an issue now since we always flush KCACHE when updating
   descriptors, but it could be a problem if we used CE, which doesn't
   require flushing KCACHE)
- add an explicit VS_PARTIAL_FLUSH flag

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 12:06:43 +01:00
Marek Olšák 0aecf9e2d1 radeonsi: add a combined flag for flushing a framebuffer
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 12:06:43 +01:00
Marek Olšák 2bfe9d4538 radeonsi: rename flush flags, split the TC flag into L1 and L2
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 12:06:43 +01:00
Marek Olšák d217819e78 r600g,radeonsi: separate cache flush flags
I will rename them for radeonsi.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 12:06:43 +01:00
Marek Olšák 0543630d0b radeonsi: only set BC_OPTIMIZE_DISABLE when necessary
SPI_PS_IN_CONTROL is moved into the SPI mapping state.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 12:06:43 +01:00
Marek Olšák 15a7fff69a radeonsi: remove flatshade from the shader key
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 12:06:43 +01:00
Marek Olšák a38e8de643 radeonsi: remove unused and not useful variables
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 12:06:43 +01:00
Marek Olšák 638fa8016a radeonsi: remove init config from states
It really doesn't do anything there.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 12:06:43 +01:00
Tom Stellard 761e36b4ca radeonsi: Cache LLVMTargetMachine object in si_screen
Rather than building a new one every compile.  This should reduce some
of the overhead of compiling shaders.

One consequence of this change is that we lose the MachineInstrs dumps
when dumping the shaders via R600_DEBUG.  The LLVM IR and assembly is
still dumped, and if you still want to see the MachineInstr dump, you
can run the dumped LLVM IR through llc.
2015-01-06 12:53:21 -08:00
Marek Olšák 3291eedfe6 radeonsi: only emit line stippling and provoking vertex state when it changes
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-12-10 21:59:37 +01:00
Marek Olšák 834bee42ed radeonsi: emit DRAW_PREAMBLE only if it changes
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-12-10 21:59:37 +01:00
Marek Olšák 6fde194910 radeonsi: emit GS_OUT_PRIM_TYPE only if it changes
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-12-10 21:59:37 +01:00
Marek Olšák 34350131de radeonsi: emit primitive restart only if it changes
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-12-10 21:59:37 +01:00
Marek Olšák 3382036946 radeonsi: emit base vertex and start instance only if they change
v2: added a helper function for invalidation of the sh constants

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-12-10 21:59:37 +01:00
Marek Olšák b472709090 radeonsi: emit clip registers only if VS, GS, or rasterizer is changed
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-12-10 21:59:37 +01:00
Marek Olšák ca7f1cf8b5 radeonsi: generate derived and draw-related registers directly in the CS
The big function is split into 3 smaller functions.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-12-10 21:59:37 +01:00
Marek Olšák c6546cfb03 radeonsi: remove useless variable si_context::pm4_dirty_cdwords
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-12-10 21:59:37 +01:00
Marek Olšák 384213cb51 radeonsi: emit draw packets directly into the CS
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-12-10 21:59:37 +01:00
Marek Olšák 2833dc4e45 radeonsi: don't use pipe_constant_buffer for GS rings
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-09-24 14:48:02 +02:00
Marek Olšák 2774abd4ce radeonsi: shorten si_pipe_* prefixes to si_*
This was the original naming convention in r600g and it somehow crept
into radeonsi.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-09-24 14:48:02 +02:00
Marek Olšák dba4c5baf4 radeonsi: move DB_SHADER_CONTROL into db_render_state
I will need this for fixing sample shading with 1 sample.

The good news is that all shader pm4 states no longer use the current context
state, so we can generate the pm4 states outside of draw_vbo if needed.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-09-24 14:48:02 +02:00
Marek Olšák 884f1654e2 radeonsi: move DB registers from draw_vbo into new db_render_state
It's called db_misc_state in r600g.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-09-24 14:48:02 +02:00
Marek Olšák d13d2fd161 r600g,radeonsi: add debug option which forces DMA for copy_region and blit 2014-09-12 22:51:28 +02:00
Marek Olšák a10c8db715 radeonsi: implement EXPCLEAR optimization for depth
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-09-01 21:18:52 +02:00
Marek Olšák 573313c94e radeonsi: implement fast depth clear
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-09-01 21:18:51 +02:00
Marek Olšák 63cb4077e6 radeonsi: move DB_RENDER_CONTROL into draw_vbo
So that I can add fast depth clear.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-09-01 21:18:51 +02:00
Marek Olšák 87a8ed9389 radeonsi: fix buffer invalidation of unbound texture buffer objects
This maintains a list of all TBOs in a pipe_context.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-08-14 20:45:03 +02:00
Marek Olšák 6210d6fdc2 radeonsi: remove nr_vertex_buffers
Unused.

Also inline util_set_vertex_buffers_count and simplify it.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-07-18 01:58:59 +02:00
Marek Olšák 0ed0bf0696 radeonsi: move vertex buffer descriptors from IB to memory
This removes the intermediate storage (pm4 state) and generates descriptors
directly in a staging buffer.

It also reduces the number of flushes, because the descriptors no longer
take CS space.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-07-18 01:58:59 +02:00
Marek Olšák 1635ded828 radeonsi: add support for fine-grained sampler view updates
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-07-18 01:58:59 +02:00
Marek Olšák dd46841bc9 radeonsi: move sampler descriptors from IB to memory
Sampler descriptors are now represented by si_descriptors.
This also adds support for fine-grained sampler state updates and
the border color update is now isolated in a separate function.

Border colors have been broken if texturing from multiple shader stages is
used. This patch doesn't change that.

BTW, blitting already makes use of fine-grained state updates.
u_blitter uses 2 textures at most, so we only have to save 2.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-07-18 01:58:59 +02:00
Marek Olšák a66d934139 radeonsi: assume LLVM 3.4.2 is always present
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-07-18 01:58:58 +02:00
Marek Olšák ee2a818d33 radeonsi: rename definitions of shader limits
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2014-07-11 19:36:29 +02:00
Marek Olšák 501fee2511 radeonsi: implement set_min_samples
This is how per-sample shading is enabled.
2014-06-02 12:58:22 +02:00
Tom Stellard 93c2ebbd83 radeonsi: Enable geometry shaders with LLVM 3.4.1
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>

CC: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
2014-05-09 12:16:05 -04:00
Adam Jackson 74388dd24b radeonsi: Don't use anonymous struct trick in atom tracking
I'm somewhat impressed that current gccs will let you do this, but
sufficiently old ones (including 4.4.7 in RHEL6) won't.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
2014-05-08 12:05:58 -04:00
Marek Olšák d4edc60767 radeonsi: merge si_flush with si_context_flush
This also removes si_flush_gfx_ring.

Reviewed-by: Christian König <christian.koenig@amd.com>
2014-04-16 14:02:51 +02:00
Marek Olšák 70cf6639c3 gallium/radeon: create and return a fence in the flush function
All flush functions get a fence parameter. cs_create_fence is removed.

Reviewed-by: Christian König <christian.koenig@amd.com>
2014-04-16 14:02:51 +02:00
Niels Ole Salscheider 71254732db radeonsi: Implement DMA blit
This code is a slightly modified version of evergreen_dma_blit (and
evergreen_dma_copy as well as evergreen_dma_copy_tile).
It would be nice to share some of the code in the long term.

I have reused some "cik"-prefixed functions that also return the right
value for SI. I am not sure if they should be renamed.

v2: Marek> removed gfx.flush in si_dma_copy_tile

Signed-off-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2014-03-20 17:21:16 +01:00
Marek Olšák f549129564 r600g, radeonsi: fix primitives-generated query with disabled streamout
Buffers are disabled by VGT_STRMOUT_BUFFER_CONFIG, but the query only works
if VGT_STRMOUT_CONFIG.STREAMOUT_0_EN is enabled.

This moves VGT_STRMOUT_CONFIG to its own state. The register is set to 1
if either streamout or the primitives-generated query is enabled.

However, the primitives-emitted query is also incremented, so it's disabled
by setting VGT_STRMOUT_BUFFER_SIZE to 0 when there is no buffer bound.

This fixes piglit:
  ARB_transform_feedback2/counting with pause
  EXT_transform_feedback/primgen-query transform-feedback-disabled

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-03-11 18:51:20 +01:00
Marek Olšák a38e1fd78b radeonsi: implement fast color clear
This works for both multi-sample and single-sample color buffers.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-03-11 18:51:20 +01:00
Marek Olšák 61a2fac199 radeonsi: convert the framebuffer state to atom-based
This looks like r600g. The shared Cayman MSAA code is used here.

The real motivation for this is that I need the ability to change values
of color registers after the framebuffer state is set. The PM4 state cannot
be modified easily after it's generated. With this, I can just change
r600_surface::cb_color_xxx and set framebuffer.atom.dirty=true and it's done.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-03-11 18:51:20 +01:00
Marek Olšák 6a5499b9d9 radeonsi: move framebuffer-related state to a new struct si_framebuffer
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-03-11 18:51:20 +01:00
Marek Olšák 40b9812a76 r600g,radeonsi: share r600_surface
I'm gonna use this in radeonsi.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-02-25 16:08:08 +01:00
Michel Dänzer f8e16010e5 radeonsi: Put GS ring buffer descriptors with streamout buffer descriptors
And mark the constant buffers as read only for the GPU again.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2014-01-29 11:09:26 +09:00
Michel Dänzer b4e14931a9 radeonsi: Pass VS resource descriptors to the HW ES shader stage as well
This makes sure constants and samplers work in the vertex shader even
when a geometry shader is active.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2014-01-29 11:08:43 +09:00
Michel Dänzer 404b29d765 radeonsi: Initial geometry shader support
Partly based on the corresponding r600g work by Vadim Girlin and Dave
Airlie.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2014-01-29 11:06:28 +09:00
Marek Olšák 7209703432 radeonsi: cleanup includes, add missing license
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2014-01-28 01:40:13 +01:00
Marek Olšák 8a4d7c296f radeonsi: move some inline functions from si_pipe.h to si_state.c
And si_tex_aniso_filter is unused.

v2: remove INLINE occurences

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2014-01-28 01:40:05 +01:00
Marek Olšák 530348680a radeonsi: remove si_resource.h
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2014-01-28 01:40:04 +01:00
Marek Olšák 6e38a3de8a radeonsi: remove si.h
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2014-01-28 01:40:02 +01:00
Marek Olšák 9f5c037ab9 radeonsi: inline si_translate_index_buffer
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2014-01-28 01:39:57 +01:00
Marek Olšák 0932f0ff14 radeonsi: inline si_upload_index_buffer
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2014-01-28 01:39:53 +01:00
Marek Olšák 65dc588bfd r600g,radeonsi: consolidate get_compute_param
v2: added fprintf to r600_get_llvm_processor_name

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2014-01-28 01:39:48 +01:00
Marek Olšák a4c218f398 r600g,radeonsi: consolidate variables for CS tracing
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2014-01-28 01:39:42 +01:00
Marek Olšák a9ae7635b7 r600g,radeonsi: consolidate the contents of r600_resource.c
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2014-01-28 01:39:25 +01:00
Marek Olšák 62d55c0a2d radeonsi: use queries from r600g
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2014-01-28 01:39:10 +01:00
Andreas Hartmetz 8662e66bf2 radeonsi: Rename the commonly occurring rctx/r600 variables.
The "r" stands for R600.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2014-01-14 00:07:14 +01:00
Andreas Hartmetz 0b57fc15e1 radeonsi: Rename R600->SI in some remaining defines.
I had previously considered that unsafe.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2014-01-14 00:07:13 +01:00
Andreas Hartmetz 45578def71 radeonsi: Rename r600->si for functions in si_pipe.h.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2014-01-14 00:07:13 +01:00
Andreas Hartmetz 238aeabce0 radeonsi: Rename r600->si for structs in si_pipe.h.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2014-01-14 00:07:13 +01:00
Andreas Hartmetz 786af2f963 radeonsi: Apply si_* file naming scheme.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2014-01-14 00:07:13 +01:00
Renamed from src/gallium/drivers/radeonsi/radeonsi_pipe.h (Browse further)