Commit Graph

124 Commits

Author SHA1 Message Date
Marek Olšák 1a24f443b4 radeonsi: implement fast stencil clear
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-12-11 15:25:12 +01:00
Nicolai Hähnle ad22006892 radeonsi: implement AMD_performance_monitor for CIK+
Expose most of the performance counter groups that are exposed by Catalyst.
Ideally, the driver will work with GPUPerfStudio at some point, but we are not
quite there yet. In any case, this is the reason for grouping multiple
instances of hardware blocks in the way it is implemented.

The counters can also be shown using the Gallium HUD. If one is interested to
see how work is distributed across multiple shader engines, one can set the
environment variable RADEON_PC_SEPARATE_SE=1 to obtain finer-grained performance
counter groups.

Part of the implementation is in radeon because an implementation for
older hardware would largely follow along the same lines, but exposing
a different set of blocks which are programmed slightly differently.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-25 15:52:09 +01:00
Marek Olšák b1c5f3faa9 radeonsi: calculate optimal GS ring sizes to fix GS hangs on Tonga
I discovered that increasing the ESGS ring size fixes GS hangs on Tonga,
so let's do it properly.

There is now a separate init_config_gs_rings state that is not immutable,
because GS rings are resized when needed.

This also saves some memory. Most apps won't need more than 1MB
per ring per shader engine.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:42 +01:00
Marek Olšák 3d963abc81 radeonsi: prevent recursion in si_context_gfx_flush
The recursion can only occur if you modify need_cs_space to always flush.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák c6012a6650 radeonsi: rename cache flushing flags once more
KCACHE, TC L1 and TC L2 are renamed to:
- SMEM L1
- VMEM L1
- GLOBAL L2

You can easily tell what they are used for now.
Shaders must deal with coherency issues between both L1s manually,
e.g. by setting GLC=1 or by using s_dcache_*.

BOTH_ICACHE_KCACHE was an unused definition.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Bas Nieuwenhuizen 48b5f104ac radeonsi: Enable DCC.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-10-24 00:42:30 +02:00
Marek Olšák 06083046a4 radeonsi: add another requirement for PARTIAL_ES_WAVE
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák 07b3cc6ecf radeonsi: allow unbinding pixel shaders and remove the dummy shader
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák 9b54ce3362 radeonsi: support thread-safe shaders shared by multiple contexts
The "current" shader pointer is moved from the CSO to the context, so that
the CSO is mostly immutable.

The only drawback is that the "current" pointer isn't saved when unbinding
a shader and it must be looked up when the shader is bound again.

This is also a prerequisite for multithreaded shader compilation.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-20 12:51:51 +02:00
Marek Olšák 13e69805ea radeonsi: fix a GS hang on VI
Broken by one of the cleanups: 0d46c3bc9d
Not applicable to stable.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-10-07 19:18:50 +02:00
Marek Olšák 9652bfcf2d radeonsi: implement the simple case of force_persample_interp
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:09 +02:00
Marek Olšák 214de2d815 radeonsi: move SPI_PS_INPUT_ENA/ADDR registers to a separate state
This will be a derived state used for changing center->sample and
centroid->sample at runtime.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:09 +02:00
Marek Olšák c23c92c965 radeonsi: only do depth-only or stencil-only in-place decompression
instead of always doing both.
Usually, only depth is needed, so stencil decompression is useless.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:08 +02:00
Marek Olšák cc92b90375 radeonsi: dump buffer lists while debugging
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:08 +02:00
Marek Olšák 9bd7928a35 radeonsi: add an option for debugging VM faults
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:07 +02:00
Marek Olšák a9971e85d9 radeonsi: rework uploading border colors
The border colors are uploaded only once when the state is created.

This brings truly immutable sampler descriptors, because they don't have
to be updated every time a sampler state is re-bound.

It also moves the TA_BC_BASE_ADDR registers to init_config, removing one
more state. The catch is there is now a limit: only 4096 border colors can
be used by one context. I don't think that will be a problem.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:15 +02:00
Marek Olšák 228e80123a radeonsi: reorder si_context variables
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:15 +02:00
Marek Olšák 28b34b474e radeonsi: don't send IB dword usage to si_need_cs_space
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:15 +02:00
Marek Olšák ec9d5e181e radeonsi: don't count IB space for states, just use an upper bound
Since we don't put any resource descriptors in IBs, the space used by draw
calls is quite small.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:15 +02:00
Marek Olšák fc95058add radeonsi: convert SPI state to an atom
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:15 +02:00
Marek Olšák 45e549fcbc radeonsi: convert CB_TARGET_MASK setup to an atom
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák e21418f221 radeonsi: convert stencil ref state into an atom
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák c44de30979 radeonsi: convert blend color state into an atom
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák 74aa64876b radeonsi: convert sample mask state into an atom
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák 12b205341a radeonsi: convert clip state into an atom
Reducing calloc overhead.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák 0c2eed0ede radeonsi: avoid redundant CB and DB register updates
The main idea is to avoid setting CB_COLORi_INFO = 0 for i>0 repeatedly
when those colorbuffers aren't used. This is mainly for glamor.

Same for DB. Z_INFO and STENCIL_INFO need to be cleared only once.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák c2a42d1f9f radeonsi: don't rebind GSVS ring buffers every draw call using GS
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák a2c6ae07b4 radeonsi: remove the tf_ring state, add the registers to init_config
One less state to worry about.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák 0d46c3bc9d radeonsi: remove the gs_rings state, add the registers to init_config
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák 87c1e9e19c radeonsi: use a bitmask for tracking dirty atoms
This mainly removes the cache misses when checking the dirty flags.
Not much else though.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák ba7a6cf626 radeonsi: define the state atom array separately
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:13 +02:00
Marek Olšák 8a97528b3a radeonsi: optimize viewport states
same as scissors

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:13 +02:00
Marek Olšák f6a10f60b7 radeonsi: optimize scissor states
- convert 16 states to 1 atom
- only emit 1 scissor if VIEWPORT_INDEX isn't written
- use only one packet when emitting consecutive scissors

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:13 +02:00
Marek Olšák 2c14a6d3b1 radeonsi: add IB tracing support for debug contexts
This adds trace points to all IBs and the parser prints them and also
prints which trace points were reached (executed) by the CP.
This can help pinpoint a problematic packet, draw call, etc.

Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:19 +02:00
Marek Olšák 189953ee13 radeonsi: remove old CS tracing code
Some of it is left there and it will be re-used in the next commit.

Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:19 +02:00
Marek Olšák be6dc87776 radeonsi: save the contents of indirect buffers for debug contexts
This will be used by the IB parser.

Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:19 +02:00
Marek Olšák 110873ed11 radeonsi: add an initial dump_debug_state implementation dumping shaders
This is usually called after a draw call.

Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:18 +02:00
Grazvydas Ignotas 3206d4ed44 gallium/radeon: use helper functions to mark atoms dirty
This is analogous to r300_mark_atom_dirty() used by r300, and will
be used by later patches. For common radeon code, appropriate helper
is called through a function pointer.

No functional changes.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-08-11 14:46:53 +02:00
Marek Olšák 2d3ae154ba radeonsi: move CP DMA functions to their own file
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-07-31 16:49:17 +02:00
Marek Olšák b0528118df radeonsi: completely rework updating descriptors without CP DMA
The patch has a better explanation. Just a summary here:
- The CPU always uploads a whole descriptor array to previously-unused memory.
- CP DMA isn't used.
- No caches need to be flushed.
- All descriptors are always up-to-date in memory even after a hang, because
  CP DMA doesn't serve as a middle man to update them.

This should bring:
- better hang recovery (descriptors are always up-to-date)
- better GPU performance (no KCACHE and TC flushes)
- worse CPU performance for partial updates (only whole arrays are uploaded)
- less used IB space (no CP_DMA and WRITE_DATA packets)
- simpler code
- hopefully, some of the corruption issues with SI cards will go away.
  If not, we'll know the issue is not here.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-07-31 16:49:16 +02:00
Marek Olšák 3344699243 radeonsi: set VGT_LS_HS_CONFIG for tessellation
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-07-23 00:59:33 +02:00
Marek Olšák 74c1001d13 radeonsi: add derived tessellation state
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-07-23 00:59:33 +02:00
Marek Olšák db267a04ce radeonsi: implement a fixed-function tessellation control shader and its state
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-07-23 00:59:32 +02:00
Marek Olšák b6f4fdf6a9 radeonsi: set up a ring buffer for tessellation factors
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-07-23 00:59:32 +02:00
Marek Olšák 59b3556f4c radeonsi: program VGT_SHADER_STAGES_EN for tessellation
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-07-23 00:59:32 +02:00
Marek Olšák d1f43a7e5b radeonsi: add code for creating, binding and destroying tessellation shaders
This doesn't do anything yet.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-07-23 00:59:31 +02:00
Marek Olšák 3ce91c727f radeonsi: rework how shader pointers to descriptors are set
This is mainly needed for tessellation where a VS can be bound as VS, ES,
or LS, and TES (tess. evaluationshader) can be bound as VS or ES or neither.
Therefore we need the ability to move pointers to descriptors between
shaders arbitrarily.

The idea is that the context has a mapping from PIPE_SHADER_x to
SPI_SHADER_USER_DATA_x. After a shader is enabled or disabled,
si_shader_change_notify should be called to update this mapping accordingly.

There is a dirty flag for each shader pointer, but only one emit function
for all pointers in the whole context, whose code and logic is separated
from descriptors.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-07-23 00:59:31 +02:00
Ilia Mirkin a2a1a5805f gallium: replace INLINE with inline
Generated by running:
git grep -l INLINE src/gallium/ | xargs sed -i 's/\bINLINE\b/inline/g'
git grep -l INLINE src/mesa/state_tracker/ | xargs sed -i 's/\bINLINE\b/inline/g'
git checkout src/gallium/state_trackers/clover/Doxyfile

and manual edits to
src/gallium/include/pipe/p_compiler.h
src/gallium/README.portability

to remove mentions of the inline define.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Marek Olšák <marek.olsak@amd.com>
2015-07-21 17:52:16 -04:00
Marek Olšák f1be3d8cdd radeonsi: don't flush an empty IB if the only thing we need is a fence
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-07-05 15:08:59 +02:00
Michel Dänzer 56e38edc96 radeonsi: Add CIK SDMA support
Based on the corresponding SI support. Same as that, this is currently
only enabled for one-dimensional buffer copies due to issues with
multi-dimensional SDMA copies.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-06-08 18:13:22 +09:00