Commit Graph

469 Commits

Author SHA1 Message Date
Sonny Jiang ce1d72609d radeonsi:optimizing SET_CONTEXT_REG for shaders Tessellation
Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2018-10-05 19:04:13 -04:00
Sonny Jiang 4de328da07 radeonsi:optimizing SET_CONTEXT_REG for shaders PS
Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2018-10-05 19:04:13 -04:00
Sonny Jiang f243980f2c radeonsi:optimizing SET_CONTEXT_REG for shaders VS
Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2018-10-05 19:04:13 -04:00
Sonny Jiang 4052624398 radeonsi:optimizing SET_CONTEXT_REG for shaders GS
Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2018-10-05 19:04:13 -04:00
Sonny Jiang eeb9170599 radeonsi: optimizing SET_CONTEXT_REG for shaders ES
Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2018-10-05 17:53:52 -04:00
Marek Olšák 0b062f0419 radeonsi: don't set the VS prolog key for the blit VS 2018-10-02 12:21:49 -04:00
Marek Olšák 5693ca865d radeonsi: bump MAX_GS_INVOCATIONS
same as the closed driver

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-08-23 16:56:17 -04:00
Marek Olšák ac72a6bd0b radeonsi: move internal TGSI shaders into si_shaderlib_tgsi.c
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-08-14 21:20:31 -04:00
Marek Olšák 86b52d4236 radeonsi: reduce LDS stalls by 40% for tessellation
40% is the decrease in the LGKM counter (which includes SMEM too)
for the GFX9 LSHS stage.

This will make the LDS size slightly larger, but I wasn't able to increase
the patch stride without corruption, so I'm increasing the vertex stride.
2018-07-23 20:23:52 -04:00
Sonny Jiang c6737756ad radeonsi: emit_spi_map packets optimization
v2: marek: remove an empty line before break;
    rename reg_val_seq -> spi_ps_input_cntl
    "type * x" -> "type *x"

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2018-07-20 13:50:26 -04:00
Dave Airlie 0eb65b4944 radeonsi: rename si_compiler -> ac_llvm_compiler
As precursor to moving init to common code, just rename the struct
and move it.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-07-04 05:31:32 +10:00
Marek Olšák 1542169a4a radeonsi: enable shader caching for compute shaders
Compute shaders were not using the shader cache.
2018-06-28 22:27:25 -04:00
Marek Olšák d13f240269 radeonsi: unify duplicated code for initial shader compilation 2018-06-28 22:27:25 -04:00
Marek Olšák f154555733 radeonsi: clean up passing the is_monolithic flag for compilation
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-06-25 18:33:58 -04:00
Marek Olšák 6703fec58c amd,radeonsi: rename radeon_winsys_cs -> radeon_cmdbuf
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-19 13:08:50 -04:00
Marek Olšák 22e994bb75 radeonsi: assume that rasterizer state is non-NULL in draw_vbo
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-06-13 22:00:36 -04:00
Marek Olšák f3b3ee6974 radeonsi: micro-optimize prim checking and fix guardband with lines+adjacency
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-06-13 22:00:34 -04:00
Marek Olšák 28ee825e19 radeonsi: move VGT_GS_OUT_PRIM_TYPE into si_shader_gs
same as amdvlk.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-06-13 22:00:23 -04:00
Marek Olšák 99e0ba6868 radeonsi: record CLIPVERTEX output usage properly for compatibility profiles
This was missed when adding CLIPVERTEX support into GS & tess.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-06-13 22:00:20 -04:00
Marek Olšák 2f65c67043 radeonsi: fix passing gl_ClipVertex for GS and tess
Also add the fprintf call.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-25 16:46:00 -04:00
Marek Olšák a7d61c0753 radeonsi: fix color inputs/outputs for GS and tess
GS is tested, tessellation is untested.

Have outputs_written_before_ps for HW VS and outputs_written for other
stages. The reason is that COLOR and BCOLOR alias for HW VS, which
drives elimination of VS outputs based on PS inputs.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-25 16:46:00 -04:00
Marek Olšák 92ea9329e5 radeonsi: fix incorrect parentheses around VS-PS varying elimination
I don't know if it caused issues.

Cc: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-25 16:46:00 -04:00
Marek Olšák 07e02c8617 radeonsi: round ps_iter_samples in set_min_samples
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-24 13:41:57 -04:00
Marek Olšák 87eb597758 radeonsi: add struct si_compiler containing LLVMTargetMachineRef
It will contain more variables.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Benedikt Schemmer <ben at besd.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák 6fadfc01c6 radeonsi: use r600_resource() typecast helper
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák 3160ee876a radeonsi: remove unused atom parameter from si_atom::emit
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák e395475096 radeonsi: remove function si_init_atom
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák 639b673fc3 radeonsi: don't use an indirect table for state atoms
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák 9054799b39 radeonsi: rename r600_atom -> si_atom
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák 60299e9abe radeonsi: don't emit partial flushes for internal CS flushes only
Tested-by: Benedikt Schemmer <ben@besd.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-16 16:58:10 -04:00
Marek Olšák 6a93441295 radeonsi: remove r600_common_context
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-04-05 15:34:58 -04:00
Marek Olšák 5777488406 radeonsi: move r600_cs.h contents into si_pipe.h, si_build_pm4.h
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-04-05 15:34:58 -04:00
Marek Olšák 72e9e98076 radeonsi: move and rename R600_ERR out of r600_pipe_common.h
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-04-05 15:34:58 -04:00
Marek Olšák 5f1cddde78 radeonsi: move definitions out of r600_pipe_common.h
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-04-05 15:34:58 -04:00
Marek Olšák c424f86180 radeonsi: use si_context instead of pipe_context in parameters pt1
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-04-05 15:34:58 -04:00
Marek Olšák 4c5efc40f4 radeonsi: update copyrights
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-04-05 15:34:58 -04:00
Marek Olšák 95bc30275b radeonsi: switch radeon_add_to_buffer_list parameter to si_context
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-04-05 15:34:58 -04:00
Marek Olšák 2b70dd8c8a radeonsi: flatten / remove struct r600_ring
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-04-05 15:34:58 -04:00
Marek Olšák 17e8f1608e radeonsi: call CS flush functions directly whenever possible
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-04-05 15:34:58 -04:00
Marek Olšák 0669dca9c0 radeonsi: skip DCC render feedback checking if color writes are disabled 2018-04-05 15:34:58 -04:00
Marek Olšák 2be6143032 radeonsi: implement GL_KHR_blend_equation_advanced
MSAA is supported using sample shading. Layered rendering and all texture
targets are also supported.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-04-02 13:55:25 -04:00
Marek Olšák 9b7db12815 radeonsi: remove chip_class parameter from si_lower_nir
We can get it from si_screen.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2018-03-08 14:58:16 -05:00
Marek Olšák 2e30268877 radeonsi: mask out high VM address bits in registers where needed 2018-03-07 13:55:35 -05:00
Timothy Arceri 70190a6567 radeonsi/nir: call ac_lower_indirect_derefs()
Fixes piglit tests:
tests/spec/glsl-1.50/execution/variable-indexing/gs-input-array-vec3-index-rd.shader_test
tests/spec/glsl-1.50/execution/geometry/max-input-components.shader_test

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-03-05 14:09:23 +11:00
Timothy Arceri 561503e3bd radeonsi: add chip class to compiler_ctx_state
This will be used in the following patch.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-03-05 14:09:23 +11:00
Marek Olšák 8799eaed99 radeonsi: remove 2 unused user SGPRs from merged TES-GS with 32-bit pointers
The effect of the last 13 commits on user SGPR counts:

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-02-26 12:01:19 +01:00
Marek Olšák 3fa7a59d69 radeonsi: make SI_SGPR_VERTEX_BUFFERS the last user SGPR input
so that it can be removed and replaced with inline VBO descriptors,
and the pointer can be packed in unused bits of VBO descriptors.
This also removes the pointer from merged TES-GS where it's useless.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-02-26 12:01:08 +01:00
Marek Olšák 2d03c4cac8 radeonsi: move tess ring address into TCS_OUT_LAYOUT, removes 2 TCS user SGPRs
TCS_OUT_LAYOUT has 13 unused bits. That's enough for a 32-bit address
aligned to 512KB. Hey, it's a 13-bit pointer!

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-02-24 23:08:29 +01:00
Marek Olšák fca7dee9c6 radeonsi: put both tessellation rings into 1 buffer
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-02-24 23:08:28 +01:00
Marek Olšák d2963d8b5f radeonsi: move tessellation ring info into si_screen
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-02-24 23:08:28 +01:00
Timothy Arceri 691c320de0 radeonsi: add nir shader cache support
In future we might want to try avoid calling nir_serialize() but
this works for now.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-02-20 13:15:02 +11:00
Timothy Arceri 2b431808ab radeonsi: rename variables tgsi_binary -> ir_binary
This better represents that the ir could be either tgsi or nir.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-02-20 13:15:02 +11:00
Marek Olšák fdf01d0244 radeonsi: remove DBG_PRECOMPILE
it's useless and shader-db stats only report the main shader part.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-01-31 03:21:20 +01:00
Marek Olšák 148b48646b radeonsi: print shader-db stats for main parts, not final binaries
This is needed to get shader-db stats for LS,HS,ES,GS stages on gfx9.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-01-31 03:21:20 +01:00
Timothy Arceri 452586b56a radeonsi: add dummy implementation of si_nir_scan_tess_ctrl()
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-05 11:58:55 +11:00
Józef Kucia f222cf3c6d radeonsi: fix alpha-to-coverage if color writes are disabled
If alpha-to-coverage is enabled, we have to compute alpha
even if color writes are disabled.

Signed-off-by: Józef Kucia <joseph.kucia@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2018-01-04 01:58:33 +01:00
Samuel Pitoiset 79b34d0832 amd/common: add ac_vgt_gs_mode() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-12-18 11:50:50 +01:00
Samuel Pitoiset 55f8431c76 amd/common: add ac_get_cb_shader_mask() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-12-18 11:50:48 +01:00
Samuel Pitoiset 45872a0a6d radeonsi: make use of ac_get_spi_shader_z_format()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-12-14 22:23:25 +01:00
Marek Olšák 2c5f2936af r300,r600,radeonsi: replace RADEON_FLUSH_* with PIPE_FLUSH_*
and handle PIPE_FLUSH_HINT_FINISH in r300.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-29 18:21:30 +01:00
Marek Olšák 950221f923 radeonsi: remove r600_common_screen
Most files in gallium/radeon now include si_pipe.h.

chip_class and family are now here:
    sscreen->info.family
    sscreen->info.chip_class

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-29 18:21:30 +01:00
Marek Olšák 2208b760f3 radeonsi: move shader debug helpers out of r600_pipe_common.c
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-29 18:21:30 +01:00
Marek Olšák c63e225bff radeonsi: remove some definitions and helpers from r600_pipe_common.h
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-29 18:21:30 +01:00
Nicolai Hähnle f76a6cb337 radeonsi: always use async compiles when creating shader/compute states
With Gallium threaded contexts, creating shader/compute states is
effectively a screen operation, so we should not use context state.

In particular, this allows us to avoid using the context's LLVM
TargetMachine.

This isn't an issue yet because u_threaded_context filters out non-async
debug callbacks, and we disable threaded contexts for debug contexts.
However, we may want to change that in the future.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 11:53:20 +01:00
Nicolai Hähnle dd7c273e87 radeonsi: move pipe debug callback to si_context
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 11:53:19 +01:00
Nicolai Hähnle 0f54ee6072 radeonsi: reduce the scope of sel->mutex in si_shader_select_with_key
We only need the lock to guard changes in the variant linked list. The
actual compilation can happen outside the lock, since we use the ready
fence as a guard.

v2: fix double-unlock

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 11:37:51 +01:00
Nicolai Hähnle 4f493c79ee radeonsi: use ready fences on all shaders, not just optimized ones
There's a race condition between si_shader_select_with_key and
si_bind_XX_shader:

  Thread 1                         Thread 2
  --------                         --------
  si_shader_select_with_key
    begin compiling the first
    variant
    (guarded by sel->mutex)
                                   si_bind_XX_shader
                                     select first_variant by default
                                     as state->current
                                   si_shader_select_with_key
                                     match state->current and early-out

Since thread 2 never takes sel->mutex, it may go on rendering without a
PM4 for that shader, for example.

The solution taken by this patch is to broaden the scope of
shader->optimized_ready to a fence shader->ready that applies to
all shaders. This does not hurt the fast path (if anything it makes
it faster, because we don't explicitly check is_optimized).

It will also allow reducing the scope of sel->mutex locks, but this is
deferred to a later commit for better bisectability.

Fixes dEQP-EGL.functional.sharing.gles2.multithread.simple.buffers.bufferdata_render

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 11:37:51 +01:00
Marek Olšák 529cdce799 radeonsi: remove 'Authors:' comments
It's inaccurate. Instead, see the copyright and use "git log" and
"git blame" to know the authorship.

Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-02 18:19:03 +01:00
Marek Olšák da0083f123 radeonsi: use postponed KILL only when derivatives are used
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-24 14:56:34 +02:00
Marek Olšák 65f2e33500 radeonsi: import r600_streamout from drivers/radeon
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-09 16:26:55 +02:00
Marek Olšák 3784ce9782 radeonsi: enumerize DBG flags
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-09 16:20:16 +02:00
Marek Olšák 5a47abb63e radeonsi: don't change viewport for blits, use window-space positions
The viewport state was an identity anyway.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-07 18:26:35 +02:00
Marek Olšák 13b6c1c031 radeonsi: minor cleanup of si_update_vs_writes_viewport_index
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-07 18:26:35 +02:00
Marek Olšák 69ccb9dae7 radeonsi: use new VS blit shaders (VS inputs in SGPRs)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-07 18:26:35 +02:00
Marek Olšák 6a8401a94e radeonsi: add VS blit shader creation
no users yet

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-07 18:26:35 +02:00
Nicolai Hähnle 12f3155e28 radeonsi: simplify the signature of si_update_vs_writes_viewport_index
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-10-02 15:07:45 +02:00
Nicolai Hähnle 7bbcb6ac6c radeonsi: move current_rast_prim into si_context
v2: rebase fixes

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-10-02 15:07:45 +02:00
Nicolai Hähnle 6b416ec3d6 radeonsi: move and rename scissor and viewport state and functions
v2: change GET_MAX_SCISSOR to SI_MAX_SCISSOR

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-10-02 15:07:45 +02:00
Nicolai Hähnle f86a112b07 radeonsi: move current_rast_prim to r600_common_context
We'll use it in the scissors / clip / guardband state.

v2: avoid a performance regression on r600 when applied to
    (pre-fork) stable branches

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-10-02 15:07:43 +02:00
Nicolai Hähnle 7dfa891f32 radeonsi/gfx9: fix geometry shaders without output vertices
Not that those are super common or useful, but hey! Fun corner cases
of the API...

Fixes dEQP-GLES31.functional.geometry_shading.emit.*

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-09-29 11:43:09 +02:00
Marek Olšák 06bfb2d28f r600: fork and import gallium/radeon
This marks the end of code sharing between r600 and radeonsi.
It's getting difficult to work on radeonsi without breaking r600.

A lot of functions had to be renamed to prevent linker conflicts.
There are also minor cleanups.

Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-26 04:21:14 +02:00
Nicolai Hähnle aab134cfa5 radeonsi: enable out-of-order rasterization when possible on VI and GFX9 dGPUs
This does not take commutative blending into account yet.

R600_DEBUG=nooutoforder disables it.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-09-18 11:25:19 +02:00
Nicolai Hähnle e4af4433fc radeonsi: hard-code pixel center for interpolateAtSample without multisample buffers
The GLSL rules for interpolateAtSample are unfortunate:

   "Returns the value of the input interpolant variable at
    the location of sample number sample. If
    multisample buffers are not available, the input
    variable will be evaluated at the center of the pixel.
    If sample sample does not exist, the position used to
    interpolate the input variable is undefined."

This fix will fallback to monolithic shader compilation when
interpolateAtSample is used without multisampling.

One alternative would be to always upload 16 sample positions,
filling the buffer up with repetition when the actual number of
samples is less, and then ANDing the sample ID with 0xf. However,
that punishes all well-behaving users of interpolateAtSample,
when in reality, only conformance tests should be affected by
the issue.

Fixes
dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_sample.non_multisample_buffer.*

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13 18:25:45 +02:00
Nicolai Hähnle 92c4277990 radeonsi: apply a mask to gl_SampleMaskIn in the PS prolog
gl_SampleMaskIn is supposed to contain set bits only for the samples that
are covered by the current fragment shader invocation, but the VGPR
initialization hardware loads the set of all bits that are covered at the
current pixel.

Fixes various tests in
dEQP-GLES31.functional.shaders.sample_variables.sample_mask_in.*

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13 18:25:41 +02:00
Nicolai Hähnle 48b3364b5b radeonsi: make si_init_shader_selector_async static
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13 18:24:18 +02:00
Marek Olšák 6eade342eb radeonsi: optimize TCS epilog when invocation 0 writes tess factors
This removes the barrier and LDS stores and loads for tess factors
when it's possible. The removal of the barrier seems more important
to me though.

In one shader, it removes 17 * 4 bytes from the shader binary.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-11 19:02:02 +02:00
Marek Olšák 89bf8668c2 radeonsi/gfx9: don't read LS out vertex stride from an SGPR in monolithic HS
-44 bytes in a monolithic LS-HS binary.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-07 13:00:07 +02:00
Nicolai Hähnle 45c5c44451 radeonsi/gfx9: proper workaround for LS/HS VGPR initialization bug
When the HS wave is empty, the hardware writes the LS VGPRs starting at
v0 instead of v2. Workaround by shifting them back into place when
necessary. For simplicity, this is always done in the LS prolog.

According to the hardware team, this will be fixed in future chips,
so take that into account already.

Note that this is not a bug fix, as the bug was already worked
around by commit 166823bfd2 ("radeonsi/gfx9: add a temporary workaround
for a tessellation driver bug"). This change merely replaces the
workaround by one that should be better.

v2: add workaround code to shader only when necessary
v3: clarify the prefer_mono comment

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-06 10:02:49 +02:00
Marek Olšák c3ebac6890 radeonsi/gfx9: implement primitive binning
This increases performance, but it was tuned for Raven, not Vega.
We don't know yet how Vega will perform, hopefully not worse.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-05 12:09:02 +02:00
Marek Olšák fb7ba68f6c radeonsi: eliminate PS color outputs when colormask kills them
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-04 15:10:39 +02:00
Timothy Arceri 0168d1f449 radeonsi: stop leaking nir
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-29 09:46:29 +10:00
Timothy Arceri ea2515d780 glsl: pass shader source keys to the disk cache
We don't actually write them to disk here. That will happen in the
following commit.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-25 13:20:29 +10:00
Marek Olšák 8dadb07790 radeonsi: emit VGT_REUSE_OFF in the right place
clip_regs aren't marked dirty when writes_viewport_index is changed.

Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-22 13:29:47 +02:00
Marek Olšák 54c2c771bd radeonsi/gfx9: don't use GS scenario A for VS writing ViewportIndex
Vulkan doesn't do it anymore.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-22 13:29:47 +02:00
Marek Olšák a65afda768 radeonsi/gfx9: prevent shader-db crashes
- don't precompile LS and ES (they don't exist on GFX9), compile as VS instead
- don't precompile HS and GS (we don't have LS and ES parts)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-22 13:29:47 +02:00
Nicolai Hähnle 40697e8678 radeonsi: make si_shader_selector_reference globally visible
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-22 09:50:55 +02:00
Marek Olšák e887c68bd2 radeonsi: add a separate dirty mask for prefetches
so that we don't rely on si_pm4_state_enabled_and_changed, allowing us
to move prefetches after draw calls.

v2: ckear the dirty mask after unbinding shaders

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> (v1)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
2017-08-07 21:12:24 +02:00
Marek Olšák a7b0014d1a radeonsi: add and use si_pm4_state_enabled_and_changed
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-07 21:12:24 +02:00
Marek Olšák 58d062b87d radeonsi: de-atomize L2 prefetch
I'd like to be able to move the prefetch call site around.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-07 21:12:24 +02:00
Nicolai Hähnle 25ff22e390 radeonsi: tweak next-shader assumptions when streamout is used
VS with streamout is always a HW VS.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 14:55:43 +02:00