Commit Graph

159 Commits

Author SHA1 Message Date
Bas Nieuwenhuizen fee3160af9 radeonsi: Enable dynamic HS.
This allows running the TES on different CU's than the
TCS which results in performance improvements.

v2: Only write the control word from one invocation.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen 6217716e8f radeonsi: Store inputs to memory when not using a TCS.
We need to copy the VS outputs to memory. I decided to do this
using a shader key, as the value depends on other shaders.

I also switch the fixed function TCS over to monolithic, as
otherwisze many of the user SGPR's need to be passed to the
epilog, which increases register pressure, or complexity to
avoid that. The main body of the fixed function TCS is not
that interesting to precompile anyway, since we do it on
demand and it is very small.

v2: Use u_bit_scan64.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen 5c34562d7c radeonsi: Add offchip tessellation parameters.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen d27ff7d683 radeonsi: Add buffer for offchip storage between TCS and TES.
The buffer is quite large, but should only be allocated if the
application uses tessellation. Most non-games don't.

v2: - Use the correct register for SI.
    - Add define for block size.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-26 22:07:04 +02:00
Axel Davy fc3533c088 radeonsi: Change default behaviour for undefined COLOR0
d3d 9 needs COLOR0 to be 1.0 on all channels when
undefined. 0.0 for the others is fine.
GL behaviour is undefined.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-18 23:37:14 +02:00
Nicolai Hähnle d8f3e8e626 radeonsi: always allocate export memory for pixel shaders
Experiments with framebuffer-no-attachments type draw calls have shown that
NULL exports stall terribly unless we ensure that export memory is allocated
by the SPI.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-09 11:52:46 -05:00
Nicolai Hähnle b9e6e8e7d4 radeonsi: fix undefined behavior (memcpy arguments must be non-NULL)
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-07 16:46:59 -05:00
Marek Olšák c9e5a7df61 gallium: remove helpers converting to/from TGSI_PROCESSOR_*
Acked-by: Jose Fonseca <jfonseca@vmware.com>
2016-04-22 01:30:39 +02:00
Marek Olšák af249a7da9 gallium: use PIPE_SHADER_* everywhere, remove TGSI_PROCESSOR_*
Acked-by: Jose Fonseca <jfonseca@vmware.com>
2016-04-22 01:30:39 +02:00
Marek Olšák 8cfc4cf76d radeonsi: remove the shader parameter from si_set_ring_buffer
not used anymore

this is a follow-up to the RW buffer cleanup.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-04-22 01:14:14 +02:00
Marek Olšák 3138a28ff2 radeonsi: move default tess level constant buffer to RW buffers
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-22 01:14:14 +02:00
Marek Olšák 1378487fb4 radeonsi: rename and rearrange RW buffer slots
- use an enum
- use a unique slot number regardless of the shader stage
  (the per-stage slots will go away for RW buffers)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-22 01:14:13 +02:00
Bas Nieuwenhuizen 38f4cee3ff radeonsi: Add config parameter to si_shader_apply_scratch_relocs.
shader->config is not updated for compute kernels.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2016-04-21 19:36:19 +02:00
Marek Olšák ffe44d0283 radeonsi: fold num_user_sgprs where it is possible
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-04-14 17:00:14 +02:00
Marek Olšák 51c4034f9b radeonsi: fix SGPRS calculation once more
This fixes GS piglit failures after adding SI_PARAM_SHADER_BUFFERS,
which bumped NUM_USER_SGPRS and uncovered this bug on SI.

If this was fixed in LLVM, these workarounds wouldn't be needed.

LLVM would have to look at the calling convention to know how many SGPR
inputs are declared, and add VCC and the scratch wave offset (which is
enabled even if we spill SGPRs but not VGPRs, oh well).

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-04-14 17:00:14 +02:00
Marek Olšák 2ca5566ed7 radeonsi: move scissor and viewport states into gallium/radeon
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Grigori Goronzy <greg@chown.ath.cx>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-12 17:13:24 +02:00
Nicolai Hähnle 6f942ac5ee radeonsi: disable early Z if the fragment shader writes to memory
Empirically, both the EXEC_ON_* flags and LATE_Z are necessary.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-21 15:34:25 -05:00
Marek Olšák a73a657def radeonsi: process TGSI property NEXT_SHADER
This allows compiling the main shader part as ES or LS.

If we get the correct hint, non-separable GLSL shaders no longer have to be
compiled as VS first, followed by LS or ES compiled on demand.

The result is that fewer shaders are compiled by piglit, but it doesn't
improve piglit running time.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-19 23:20:01 +01:00
Nicolai Hähnle 4de25fa7b0 radeonsi: set DEPTH_BEFORE_SHADER based on FS_EARLY_DEPTH_STENCIL
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-14 17:24:59 -05:00
Marek Olšák d0f3b524cd radeonsi: use re-Z
This can increase perf for shaders that kill pixels (kill, alpha-test,
alpha-to-coverage).

v2: add comments

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-03-01 00:18:19 +01:00
Marek Olšák ff360a52e6 radeonsi: implement binary shaders & shader cache in memory (v2)
v2: handle _mesa_hash_table_insert failure
    other cosmetic changes

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:58 +01:00
Marek Olšák 1fe73d55e3 radeonsi: move some struct si_shader members to new struct si_shader_info
This will be part of shader binaries.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:58 +01:00
Marek Olšák 3c98e0b369 radeonsi: compile non-GS middle parts of shaders immediately if enabled
Still disabled.

Only prologs & epilogs are compiled in draw calls, but each variant of those
is compiled only once per process.

VS is always compiled as hw VS.
TES is always compiled as hw VS.

LS and ES stages are always compiled on demand.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:58 +01:00
Marek Olšák 4636d9be4a radeonsi: add PS prolog
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:58 +01:00
Marek Olšák dc27456194 radeonsi: separate out shader key bits for prologs & epilogs
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:57 +01:00
Marek Olšák 329181ae33 radeonsi: enable denorms for 64-bit and 16-bit floats
This fixes FP16 conversion instructions for VI, which has 16-bit floats,
but not SI & CI, which can't disable denorms for those instructions.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-09 21:19:51 +01:00
Marek Olšák de2e28366a radeonsi: compile geometry shaders immediately
they have only 1 variant

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-09 21:19:51 +01:00
Marek Olšák f7a8b6fff5 radeonsi: split out code for deleting si_shader
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-09 21:19:51 +01:00
Marek Olšák b6d5666fbf radeonsi: remove useless code that handles dx10_clamp_mode
"enable-no-nans-fp-math" is a wrong string and there was a disagreement
about fixing it.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-09 21:19:51 +01:00
Marek Olšák 5a53628f45 radeonsi: read SPI_PS_INPUT_ADDR from LLVM if it returns it
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-09 21:19:51 +01:00
Marek Olšák b9126dcda8 radeonsi: implement forcing per-sample_interpolation using the shader key only
It was partly a state and partly emulated by shader code, but since we want
to do this in a fragment shader prolog, we need to put it into the shader
key, which will be used to generate the prolog.

This also removes the spi_ps_input states and moves the registers
to the PS state.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-09 21:19:51 +01:00
Marek Olšák 6dda2455c8 radeonsi: move BCOLOR PS input locations after all other inputs
BCOLOR inputs were immediately after COLOR inputs. Thus, all following inputs
were offset by 1 if color_two_side was enabled, and not offset if it was not
enabled, which is a variation that's problematic if we want to have 1 variant
per shader and the variant doesn't care about color_two_side (that should be
handled by other bytecode attached at the beginning).

Instead, move BCOLOR inputs after all other inputs, so BCOLOR0 is at location
"num_inputs" if it's present. BCOLOR1 is next.

This also allows removing si_shader::nparam and
si_shader::ps_input_param_offset, which are useless now.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-09 21:19:51 +01:00
Marek Olšák 606e4185f3 radeonsi: move SPI_PS_INPUT_CNTL value computation to a separate function
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-09 21:19:51 +01:00
Marek Olšák 90cbbe1c12 radeonsi: generate a color_two_side variant only if the shader reads colors
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-09 21:19:51 +01:00
Marek Olšák 0d68b91220 radeonsi: rework RB+ for Stoney
This fixes it.

States which also need to be taken into account:
- SPI color formats - each down-conversion format supports only a limited set
  of SPI formats
- whether MSAA resolving and logic op are enabled

These need special handling:
- blending
- disabled channels

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-02 21:03:19 +01:00
Marek Olšák 066d76c2f4 radeonsi: rename cb_target_mask state to cb_render_state
and rename a variable in the function.

SX_PS_DOWNCONVERT will be emitted here.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-02 21:03:19 +01:00
Marek Olšák af57507e4f radeonsi: fix shader precompilation for shader-db
The addition of spi_shader_col_format killed all color outputs
in precompiled shaders.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> (v1)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)

v2: also set the alpha func (trivial)
2016-01-26 18:49:50 +01:00
Nicolai Hähnle a7754ffd31 radeonsi: replace use of is_gs_copy_shader in si_shader_vs
We now have an explicit parameter that contains the same information, and
this will allow us to get rid of is_gs_copy_shader in the si_shader struct.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-25 10:15:55 -05:00
Nicolai Hähnle 004fcd4230 radeonsi: ensure that VGT_GS_MODE is sent when necessary
Specifically, when the API switches from using a GS to not using a GS and then
back to using the same GS again, we do not have to re-send all the GS state,
but we do have to send VGT_GS_MODE. So make VGT_GS_MODE consistently be a part
of the VS state.

This fixes a rendering bug in Dolphin, but surely other applications are
affected as well.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93648
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-25 10:15:31 -05:00
Nicolai Hähnle 9f89bd69df radeonsi: extract the VGT_GS_MODE calculation into its own function
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-25 10:15:08 -05:00
Nicolai Hähnle d76bd85c35 Revert "radeonsi: fix discard-only fragment shaders (v2)"
This reverts commit 843855bbf0.

It became redundant due to Marek's earlier pushed 8667a1ae which achieves
the same thing.
2016-01-22 12:40:26 -05:00
Nicolai Hähnle 843855bbf0 radeonsi: fix discard-only fragment shaders (v2)
When a fragment shader is used that has no outputs but does conditional
discard (KILL_IF), all fragments are killed without this patch.

By comparing various register settings, my conclusion is that the exec mask
is either not properly forwarded to the DB by NULL exports or ends up being
unused, at least when there is _only_ a NULL export (the ISA documentation
claims that NULL exports can be used to override a previously exported exec
mask).

Of the various approaches I have tried to work around the problem, this one
seems to be the least invasive one.

v2: take discard by alpha test into account as well

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93761
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-22 11:59:50 -05:00
Marek Olšák 99dfeb01bd radeonsi: disable SPI color outputs the shader doesn't write
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-22 15:02:40 +01:00
Marek Olšák f6360de8c0 radeonsi: use all SPI color formats
because not using SPI_SHADER_32_ABGR doubles fill rate.

We should also get optimal performance if alpha isn't needed or blending
isn't enabled.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-22 15:02:40 +01:00
Marek Olšák 933e3c4145 radeonsi: use 32_AR for alpha-to-coverage without a color buffer
This avoids the fp16 packing instructions.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-22 15:02:40 +01:00
Marek Olšák e28b8530b9 radeonsi: set CB_SHADER_MASK according to SPI color formats
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-22 15:02:40 +01:00
Marek Olšák 8667a1aea2 radeonsi: use SPI_SHADER_COL_FORMAT fields instead of export_16bpc
This does change the behavior slightly:
  If a shader writes COLOR[i] and that color buffer isn't bound,
  the shader will export MRT_NULL instead and discard the IR tree that
  calculates the output. The only exception is alpha-to-coverage, which
  requires an alpha export.

v2: - update a comment about 16BPC
    - account for MRTZ when when fixing alpha-test/kill

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-22 15:02:40 +01:00
Marek Olšák dc96a18d24 radeonsi: don't miss changes to SPI_TMPRING_SIZE
I'm not sure about the consequences of this bug, but it's definitely
dangerous.

This applies to SI, CIK, VI.

Cc: 11.0 11.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-14 19:55:41 +01:00
Marek Olšák 4ea0febcb0 radeonsi: move POSITION and FACE fragment shader inputs to system values
And FACE becomes integer instead of float.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-13 12:27:28 +01:00
Marek Olšák caf3c2abea radeonsi: simplify gl_FragCoord behavior
It will become a system value, not an input.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-13 12:27:28 +01:00