KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Marek Olšák	99e0ba6868	radeonsi: record CLIPVERTEX output usage properly for compatibility profiles This was missed when adding CLIPVERTEX support into GS & tess. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-06-13 22:00:20 -04:00
Marek Olšák	2f65c67043	radeonsi: fix passing gl_ClipVertex for GS and tess Also add the fprintf call. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-05-25 16:46:00 -04:00
Marek Olšák	a7d61c0753	radeonsi: fix color inputs/outputs for GS and tess GS is tested, tessellation is untested. Have outputs_written_before_ps for HW VS and outputs_written for other stages. The reason is that COLOR and BCOLOR alias for HW VS, which drives elimination of VS outputs based on PS inputs. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-05-25 16:46:00 -04:00
Marek Olšák	92ea9329e5	radeonsi: fix incorrect parentheses around VS-PS varying elimination I don't know if it caused issues. Cc: 18.0 18.1 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-05-25 16:46:00 -04:00
Marek Olšák	07e02c8617	radeonsi: round ps_iter_samples in set_min_samples Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-05-24 13:41:57 -04:00
Marek Olšák	87eb597758	radeonsi: add struct si_compiler containing LLVMTargetMachineRef It will contain more variables. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Tested-by: Benedikt Schemmer <ben at besd.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-04-27 17:56:04 -04:00
Marek Olšák	6fadfc01c6	radeonsi: use r600_resource() typecast helper Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-04-27 17:56:04 -04:00
Marek Olšák	3160ee876a	radeonsi: remove unused atom parameter from si_atom::emit Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-04-27 17:56:04 -04:00
Marek Olšák	e395475096	radeonsi: remove function si_init_atom Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-04-27 17:56:04 -04:00
Marek Olšák	639b673fc3	radeonsi: don't use an indirect table for state atoms Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-04-27 17:56:04 -04:00
Marek Olšák	9054799b39	radeonsi: rename r600_atom -> si_atom Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-04-27 17:56:04 -04:00
Marek Olšák	60299e9abe	radeonsi: don't emit partial flushes for internal CS flushes only Tested-by: Benedikt Schemmer <ben@besd.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-04-16 16:58:10 -04:00
Marek Olšák	6a93441295	radeonsi: remove r600_common_context Acked-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-04-05 15:34:58 -04:00
Marek Olšák	5777488406	radeonsi: move r600_cs.h contents into si_pipe.h, si_build_pm4.h Acked-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-04-05 15:34:58 -04:00
Marek Olšák	72e9e98076	radeonsi: move and rename R600_ERR out of r600_pipe_common.h Acked-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-04-05 15:34:58 -04:00
Marek Olšák	5f1cddde78	radeonsi: move definitions out of r600_pipe_common.h Acked-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-04-05 15:34:58 -04:00
Marek Olšák	c424f86180	radeonsi: use si_context instead of pipe_context in parameters pt1 Acked-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-04-05 15:34:58 -04:00
Marek Olšák	4c5efc40f4	radeonsi: update copyrights Acked-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-04-05 15:34:58 -04:00
Marek Olšák	95bc30275b	radeonsi: switch radeon_add_to_buffer_list parameter to si_context Acked-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-04-05 15:34:58 -04:00
Marek Olšák	2b70dd8c8a	radeonsi: flatten / remove struct r600_ring Acked-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-04-05 15:34:58 -04:00
Marek Olšák	17e8f1608e	radeonsi: call CS flush functions directly whenever possible Acked-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-04-05 15:34:58 -04:00
Marek Olšák	0669dca9c0	radeonsi: skip DCC render feedback checking if color writes are disabled	2018-04-05 15:34:58 -04:00
Marek Olšák	2be6143032	radeonsi: implement GL_KHR_blend_equation_advanced MSAA is supported using sample shading. Layered rendering and all texture targets are also supported. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-04-02 13:55:25 -04:00
Marek Olšák	9b7db12815	radeonsi: remove chip_class parameter from si_lower_nir We can get it from si_screen. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Acked-by: Alex Deucher <alexander.deucher@amd.com>	2018-03-08 14:58:16 -05:00
Marek Olšák	2e30268877	radeonsi: mask out high VM address bits in registers where needed	2018-03-07 13:55:35 -05:00
Timothy Arceri	70190a6567	radeonsi/nir: call ac_lower_indirect_derefs() Fixes piglit tests: tests/spec/glsl-1.50/execution/variable-indexing/gs-input-array-vec3-index-rd.shader_test tests/spec/glsl-1.50/execution/geometry/max-input-components.shader_test Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-03-05 14:09:23 +11:00
Timothy Arceri	561503e3bd	radeonsi: add chip class to compiler_ctx_state This will be used in the following patch. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-03-05 14:09:23 +11:00
Marek Olšák	8799eaed99	radeonsi: remove 2 unused user SGPRs from merged TES-GS with 32-bit pointers The effect of the last 13 commits on user SGPR counts: Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-02-26 12:01:19 +01:00
Marek Olšák	3fa7a59d69	radeonsi: make SI_SGPR_VERTEX_BUFFERS the last user SGPR input so that it can be removed and replaced with inline VBO descriptors, and the pointer can be packed in unused bits of VBO descriptors. This also removes the pointer from merged TES-GS where it's useless. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-02-26 12:01:08 +01:00
Marek Olšák	2d03c4cac8	radeonsi: move tess ring address into TCS_OUT_LAYOUT, removes 2 TCS user SGPRs TCS_OUT_LAYOUT has 13 unused bits. That's enough for a 32-bit address aligned to 512KB. Hey, it's a 13-bit pointer! Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-02-24 23:08:29 +01:00
Marek Olšák	fca7dee9c6	radeonsi: put both tessellation rings into 1 buffer Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-02-24 23:08:28 +01:00
Marek Olšák	d2963d8b5f	radeonsi: move tessellation ring info into si_screen Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-02-24 23:08:28 +01:00
Timothy Arceri	691c320de0	radeonsi: add nir shader cache support In future we might want to try to avoid calling nir_serialize() but this works for now. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-02-20 13:15:02 +11:00
Timothy Arceri	2b431808ab	radeonsi: rename variables tgsi_binary -> ir_binary This better represents that the ir could be either tgsi or nir. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-02-20 13:15:02 +11:00
Marek Olšák	fdf01d0244	radeonsi: remove DBG_PRECOMPILE it's useless and shader-db stats only report the main shader part. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-01-31 03:21:20 +01:00
Marek Olšák	148b48646b	radeonsi: print shader-db stats for main parts, not final binaries This is needed to get shader-db stats for LS,HS,ES,GS stages on gfx9. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-01-31 03:21:20 +01:00
Timothy Arceri	452586b56a	radeonsi: add dummy implementation of si_nir_scan_tess_ctrl() Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-01-05 11:58:55 +11:00
Józef Kucia	f222cf3c6d	radeonsi: fix alpha-to-coverage if color writes are disabled If alpha-to-coverage is enabled, we have to compute alpha even if color writes are disabled. Signed-off-by: Józef Kucia <joseph.kucia@gmail.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2018-01-04 01:58:33 +01:00
Samuel Pitoiset	79b34d0832	amd/common: add ac_vgt_gs_mode() helper Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2017-12-18 11:50:50 +01:00
Samuel Pitoiset	55f8431c76	amd/common: add ac_get_cb_shader_mask() helper Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2017-12-18 11:50:48 +01:00
Samuel Pitoiset	45872a0a6d	radeonsi: make use of ac_get_spi_shader_z_format() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2017-12-14 22:23:25 +01:00
Marek Olšák	2c5f2936af	r300,r600,radeonsi: replace RADEON_FLUSH_* with PIPE_FLUSH_* and handle PIPE_FLUSH_HINT_FINISH in r300. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-11-29 18:21:30 +01:00
Marek Olšák	950221f923	radeonsi: remove r600_common_screen Most files in gallium/radeon now include si_pipe.h. chip_class and family are now here: sscreen->info.family sscreen->info.chip_class Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-11-29 18:21:30 +01:00
Marek Olšák	2208b760f3	radeonsi: move shader debug helpers out of r600_pipe_common.c Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-11-29 18:21:30 +01:00
Marek Olšák	c63e225bff	radeonsi: remove some definitions and helpers from r600_pipe_common.h Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-11-29 18:21:30 +01:00
Nicolai Hähnle	f76a6cb337	radeonsi: always use async compiles when creating shader/compute states With Gallium threaded contexts, creating shader/compute states is effectively a screen operation, so we should not use context state. In particular, this allows us to avoid using the context's LLVM TargetMachine. This isn't an issue yet because u_threaded_context filters out non-async debug callbacks, and we disable threaded contexts for debug contexts. However, we may want to change that in the future. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-11-09 11:53:20 +01:00
Nicolai Hähnle	dd7c273e87	radeonsi: move pipe debug callback to si_context Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-11-09 11:53:19 +01:00
Nicolai Hähnle	0f54ee6072	radeonsi: reduce the scope of sel->mutex in si_shader_select_with_key We only need the lock to guard changes in the variant linked list. The actual compilation can happen outside the lock, since we use the ready fence as a guard. v2: fix double-unlock Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-11-09 11:37:51 +01:00
Nicolai Hähnle	4f493c79ee	radeonsi: use ready fences on all shaders, not just optimized ones There's a race condition between si_shader_select_with_key and si_bind_XX_shader: Thread 1 Thread 2 -------- -------- si_shader_select_with_key begin compiling the first variant (guarded by sel->mutex) si_bind_XX_shader select first_variant by default as state->current si_shader_select_with_key match state->current and early-out Since thread 2 never takes sel->mutex, it may go on rendering without a PM4 for that shader, for example. The solution taken by this patch is to broaden the scope of shader->optimized_ready to a fence shader->ready that applies to all shaders. This does not hurt the fast path (if anything it makes it faster, because we don't explicitly check is_optimized). It will also allow reducing the scope of sel->mutex locks, but this is deferred to a later commit for better bisectability. Fixes dEQP-EGL.functional.sharing.gles2.multithread.simple.buffers.bufferdata_render Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-11-09 11:37:51 +01:00
Marek Olšák	529cdce799	radeonsi: remove 'Authors:' comments It's inaccurate. Instead, see the copyright and use "git log" and "git blame" to know the authorship. Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-11-02 18:19:03 +01:00
Marek Olšák	da0083f123	radeonsi: use postponed KILL only when derivatives are used Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-10-24 14:56:34 +02:00
Marek Olšák	65f2e33500	radeonsi: import r600_streamout from drivers/radeon Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-10-09 16:26:55 +02:00
Marek Olšák	3784ce9782	radeonsi: enumerize DBG flags Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-10-09 16:20:16 +02:00
Marek Olšák	5a47abb63e	radeonsi: don't change viewport for blits, use window-space positions The viewport state was an identity anyway. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-10-07 18:26:35 +02:00
Marek Olšák	13b6c1c031	radeonsi: minor cleanup of si_update_vs_writes_viewport_index Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-10-07 18:26:35 +02:00
Marek Olšák	69ccb9dae7	radeonsi: use new VS blit shaders (VS inputs in SGPRs) Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-10-07 18:26:35 +02:00
Marek Olšák	6a8401a94e	radeonsi: add VS blit shader creation no users yet Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-10-07 18:26:35 +02:00
Nicolai Hähnle	12f3155e28	radeonsi: simplify the signature of si_update_vs_writes_viewport_index Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-10-02 15:07:45 +02:00
Nicolai Hähnle	7bbcb6ac6c	radeonsi: move current_rast_prim into si_context v2: rebase fixes Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-10-02 15:07:45 +02:00
Nicolai Hähnle	6b416ec3d6	radeonsi: move and rename scissor and viewport state and functions v2: change GET_MAX_SCISSOR to SI_MAX_SCISSOR Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-10-02 15:07:45 +02:00
Nicolai Hähnle	f86a112b07	radeonsi: move current_rast_prim to r600_common_context We'll use it in the scissors / clip / guardband state. v2: avoid a performance regression on r600 when applied to (pre-fork) stable branches Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-10-02 15:07:43 +02:00
Nicolai Hähnle	7dfa891f32	radeonsi/gfx9: fix geometry shaders without output vertices Not that those are super common or useful, but hey! Fun corner cases of the API... Fixes dEQP-GLES31.functional.geometry_shading.emit.* Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Marek Olšák <marek.olsak@amd.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2017-09-29 11:43:09 +02:00
Marek Olšák	06bfb2d28f	r600: fork and import gallium/radeon This marks the end of code sharing between r600 and radeonsi. It's getting difficult to work on radeonsi without breaking r600. A lot of functions had to be renamed to prevent linker conflicts. There are also minor cleanups. Acked-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-09-26 04:21:14 +02:00
Nicolai Hähnle	aab134cfa5	radeonsi: enable out-of-order rasterization when possible on VI and GFX9 dGPUs This does not take commutative blending into account yet. R600_DEBUG=nooutoforder disables it. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2017-09-18 11:25:19 +02:00
Nicolai Hähnle	e4af4433fc	radeonsi: hard-code pixel center for interpolateAtSample without multisample buffers The GLSL rules for interpolateAtSample are unfortunate: "Returns the value of the input interpolant variable at the location of sample number sample. If multisample buffers are not available, the input variable will be evaluated at the center of the pixel. If sample sample does not exist, the position used to interpolate the input variable is undefined." This fix will fallback to monolithic shader compilation when interpolateAtSample is used without multisampling. One alternative would be to always upload 16 sample positions, filling the buffer up with repetition when the actual number of samples is less, and then ANDing the sample ID with 0xf. However, that punishes all well-behaving users of interpolateAtSample, when in reality, only conformance tests should be affected by the issue. Fixes dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_sample.non_multisample_buffer.* Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-09-13 18:25:45 +02:00
Nicolai Hähnle	92c4277990	radeonsi: apply a mask to gl_SampleMaskIn in the PS prolog gl_SampleMaskIn is supposed to contain set bits only for the samples that are covered by the current fragment shader invocation, but the VGPR initialization hardware loads the set of all bits that are covered at the current pixel. Fixes various tests in dEQP-GLES31.functional.shaders.sample_variables.sample_mask_in.* Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-09-13 18:25:41 +02:00
Nicolai Hähnle	48b3364b5b	radeonsi: make si_init_shader_selector_async static Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-09-13 18:24:18 +02:00
Marek Olšák	6eade342eb	radeonsi: optimize TCS epilog when invocation 0 writes tess factors This removes the barrier and LDS stores and loads for tess factors when it's possible. The removal of the barrier seems more important to me though. In one shader, it removes 17 * 4 bytes from the shader binary. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-09-11 19:02:02 +02:00
Marek Olšák	89bf8668c2	radeonsi/gfx9: don't read LS out vertex stride from an SGPR in monolithic HS -44 bytes in a monolithic LS-HS binary. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-09-07 13:00:07 +02:00
Nicolai Hähnle	45c5c44451	radeonsi/gfx9: proper workaround for LS/HS VGPR initialization bug When the HS wave is empty, the hardware writes the LS VGPRs starting at v0 instead of v2. Workaround by shifting them back into place when necessary. For simplicity, this is always done in the LS prolog. According to the hardware team, this will be fixed in future chips, so take that into account already. Note that this is not a bug fix, as the bug was already worked around by commit `166823bfd2` ("radeonsi/gfx9: add a temporary workaround for a tessellation driver bug"). This change merely replaces the workaround by one that should be better. v2: add workaround code to shader only when necessary v3: clarify the prefer_mono comment Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-09-06 10:02:49 +02:00
Marek Olšák	c3ebac6890	radeonsi/gfx9: implement primitive binning This increases performance, but it was tuned for Raven, not Vega. We don't know yet how Vega will perform, hopefully not worse. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-09-05 12:09:02 +02:00
Marek Olšák	fb7ba68f6c	radeonsi: eliminate PS color outputs when colormask kills them Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-09-04 15:10:39 +02:00
Timothy Arceri	0168d1f449	radeonsi: stop leaking nir Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-08-29 09:46:29 +10:00
Timothy Arceri	ea2515d780	glsl: pass shader source keys to the disk cache We don't actually write them to disk here. That will happen in the following commit. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-08-25 13:20:29 +10:00
Marek Olšák	8dadb07790	radeonsi: emit VGT_REUSE_OFF in the right place clip_regs aren't marked dirty when writes_viewport_index is changed. Cc: 17.2 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-08-22 13:29:47 +02:00
Marek Olšák	54c2c771bd	radeonsi/gfx9: don't use GS scenario A for VS writing ViewportIndex Vulkan doesn't do it anymore. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-08-22 13:29:47 +02:00
Marek Olšák	a65afda768	radeonsi/gfx9: prevent shader-db crashes - don't precompile LS and ES (they don't exist on GFX9), compile as VS instead - don't precompile HS and GS (we don't have LS and ES parts) Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-08-22 13:29:47 +02:00
Nicolai Hähnle	40697e8678	radeonsi: make si_shader_selector_reference globally visible Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-08-22 09:50:55 +02:00
Marek Olšák	e887c68bd2	radeonsi: add a separate dirty mask for prefetches so that we don't rely on si_pm4_state_enabled_and_changed, allowing us to move prefetches after draw calls. v2: clear the dirty mask after unbinding shaders Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> (v1) Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)	2017-08-07 21:12:24 +02:00
Marek Olšák	a7b0014d1a	radeonsi: add and use si_pm4_state_enabled_and_changed Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-08-07 21:12:24 +02:00
Marek Olšák	58d062b87d	radeonsi: de-atomize L2 prefetch I'd like to be able to move the prefetch call site around. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-08-07 21:12:24 +02:00
Nicolai Hähnle	25ff22e390	radeonsi: tweak next-shader assumptions when streamout is used VS with streamout is always a HW VS. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:43 +02:00
Nicolai Hähnle	b49c2c9fa3	radeonsi/nir: perform lowering of input/output driver locations Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:40 +02:00
Nicolai Hähnle	c5f70a5174	radeonsi: bypass the shader cache for NIR shaders Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:33 +02:00
Nicolai Hähnle	29d7bdd179	radeonsi: scan NIR shaders to obtain required info v2: set num_instruction to 2, i.e. 1 + END (Marek) Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:32 +02:00
Marek Olšák	ffa7ec9e22	radeonsi: simplify computation of tessellation offchip buffers This is overly cautious, but better safe than sorry. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-07-17 10:55:07 -04:00
Marek Olšák	aaee0d1bbf	gallium: use "ull" number suffix to keep the QtCreator parser happy It can't parse "llu". Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>	2017-07-10 22:44:48 +02:00
Marek Olšák	4a10d6154e	radeonsi: move instance divisors into a constant buffer Shader key size: 107 -> 47 Divisors of 0 and 1 are encoded in the shader key. Greater instance divisors are loaded from a constant buffer. The shader code doing the division is huge. Is it something we need to worry about? Does any app use instance divisors >= 2? VS prolog disassembly: s_load_dwordx4 s[12:15], s[0:1], 0x80 ; C00A0300 00000080 s_nop 0 ; BF800000 s_waitcnt lgkmcnt(0) ; BF8C007F s_buffer_load_dword s14, s[12:15], 0x4 ; C0220386 00000004 s_waitcnt lgkmcnt(0) ; BF8C007F v_cvt_f32_u32_e32 v4, s14 ; 7E080C0E v_rcp_iflag_f32_e32 v4, v4 ; 7E084704 v_mul_f32_e32 v4, 0x4f800000, v4 ; 0A0808FF 4F800000 v_cvt_u32_f32_e32 v4, v4 ; 7E080F04 v_mul_hi_u32 v5, v4, s14 ; D2860005 00001D04 v_mul_lo_i32 v6, v4, s14 ; D2850006 00001D04 v_cmp_eq_u32_e64 s[12:13], 0, v5 ; D0CA000C 00020A80 v_sub_i32_e32 v5, vcc, 0, v6 ; 340A0C80 v_cndmask_b32_e64 v5, v6, v5, s[12:13] ; D1000005 00320B06 v_mul_hi_u32 v5, v5, v4 ; D2860005 00020905 v_add_i32_e32 v6, vcc, v5, v4 ; 320C0905 v_subrev_i32_e32 v4, vcc, v5, v4 ; 36080905 v_cndmask_b32_e64 v4, v4, v6, s[12:13] ; D1000004 00320D04 v_mul_hi_u32 v5, v4, v1 ; D2860005 00020304 v_add_i32_e32 v4, vcc, s8, v0 ; 32080008 v_mul_lo_i32 v6, v5, s14 ; D2850006 00001D05 v_add_i32_e32 v7, vcc, 1, v5 ; 320E0A81 v_cmp_ge_u32_e64 s[12:13], v1, v6 ; D0CE000C 00020D01 v_sub_i32_e32 v6, vcc, v1, v6 ; 340C0D01 v_cmp_le_u32_e32 vcc, s14, v6 ; 7D960C0E v_cndmask_b32_e64 v8, 0, -1, s[12:13] ; D1000008 00318280 v_cndmask_b32_e64 v6, 0, -1, vcc ; D1000006 01A98280 v_and_b32_e32 v6, v8, v6 ; 260C0D08 v_cmp_eq_u32_e32 vcc, 0, v6 ; 7D940C80 v_cndmask_b32_e32 v6, v7, v5, vcc ; 000C0B07 v_add_i32_e32 v5, vcc, -1, v5 ; 320A0AC1 v_cmp_eq_u32_e32 vcc, 0, v8 ; 7D941080 v_cndmask_b32_e32 v5, v6, v5, vcc ; 000A0B06 v_add_i32_e32 v5, vcc, s9, v5 ; 320A0A09 v2: set prefer_mono for fetched instance divisors Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-27 19:55:09 +02:00
Marek Olšák	77d2a98353	Revert "radeonsi: use uint32_t to declare si_shader_key.opt.kill_outputs" This reverts commit `7b2240ac9c`. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-27 18:45:07 +02:00
Marek Olšák	dbe45e1180	Revert "radeonsi: remove 8 bytes from si_shader_key with uint32_t ff_tcs_inputs_to_copy" This reverts commit `6b6fed3a3c`. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-27 18:45:07 +02:00
Nicolai Hähnle	da2e52b382	radeonsi: use the correct LLVMTargetMachineRef in si_build_shader_variant si_build_shader_variant can actually be called directly from one of normal-priority compiler threads. In that case, the thread_index is only valid for the normal tm array. v2: - use the correct sel/shader->compiler_ctx_state Fixes: `86cc809726` ("radeonsi: use a compiler queue with a low priority for optimized shaders") Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-06-22 09:45:23 +02:00
Samuel Pitoiset	2c3a7d5840	radeonsi: track use of bindless samplers/images from tgsi_shader_info This adds some new helper functions to know if the current draw call (or dispatch compute) is using bindless samplers/images, based on TGSI analysis. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-06-14 10:04:36 +02:00
Marek Olšák	e80a056ff9	radeonsi: replace si_vertex_elements::elements with separate fields It makes si_vertex_elements a little smaller. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-12 18:24:37 +02:00
Marek Olšák	6b6fed3a3c	radeonsi: remove 8 bytes from si_shader_key with uint32_t ff_tcs_inputs_to_copy The previous patch helps with this. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-12 18:24:37 +02:00
Marek Olšák	7b2240ac9c	radeonsi: use uint32_t to declare si_shader_key.opt.kill_outputs the next patch will benefit from this Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-12 18:24:37 +02:00
Marek Olšák	1621b33d73	radeonsi: remove 8 bytes from si_shader_key by flattening opt.hw_vs Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-12 18:24:37 +02:00
Marek Olšák	4b8d0c2b1d	radeonsi: don't update dependent states if it has no effect (v2) This and the previous clip_regs commit decrease IB sizes and the number of si_update_shaders invocations as follows: IB size si_update_shaders calls Borderlands 2 -10% -27% Deus Ex: MD -5% -11% Talos Principle -8% -30% v2: always dirty cb_render_state in set_framebuffer_state Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-08 23:29:07 +02:00
Marek Olšák	bacaceb78a	radeonsi: update clip_regs on shader state changes only when it's needed Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 20:17:20 +02:00
Marek Olšák	2b7fd9df9a	radeonsi: precompute some fields for PA_CL_VS_OUT_CNTL in si_shader_selector Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 20:17:18 +02:00
Marek Olšák	140b3c5019	radeonsi: add a new helper si_get_vs Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 20:17:16 +02:00
Marek Olšák	e9409c86e7	radeonsi: remove 8 bytes from si_shader_key We can use a union in si_shader_key::mono. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 20:17:06 +02:00
Marek Olšák	2b8b9a56ef	radeonsi: move PSIZE and CLIPDIST unique IO indices after GENERIC Heaven LDS usage for LS+HS is below. The masks are "outputs_written" for LS and HS. Note that 32K is the maximum size. Before: heaven_x64: ls=1f1 tcs=1f1, lds=32K heaven_x64: ls=31 tcs=31, lds=24K heaven_x64: ls=71 tcs=71, lds=28K After: heaven_x64: ls=3f tcs=3f, lds=24K heaven_x64: ls=7 tcs=7, lds=13K heaven_x64: ls=f tcs=f, lds=17K All other apps have a similar decrease in LDS usage, because the "outputs_written" masks are similar. Also, most apps don't write POSITION in these shader stages, so there is room for improvement. (tight per-component input/output packing might help even more) It's unknown whether this improves performance. Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 20:14:15 +02:00
Marek Olšák	3effce4fb0	radeonsi/gfx9: prevent a race when the previous shader's main part is missing Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:43:42 +02:00
Marek Olšák	b5bc826ead	radeonsi/gfx9: wait for main part compilation of 1st shaders of merged shaders Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:43:42 +02:00
Marek Olšák	ffbaba6072	radeonsi/gfx9: fix LS scratch buffer support without TCS for GFX9 LS is merged into TCS. If there is no TCS, LS is merged into fixed-func TCS. The problem is the fixed-func TCS was ignored by scratch update functions, so LS didn't have the scratch buffer set up. Note that Mesa 17.1 doesn't have merged shaders. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:43:42 +02:00
Marek Olšák	6e2c07749b	radeonsi: move streamout state update out of si_update_shaders Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:43:42 +02:00
Marek Olšák	8147c4a4a5	radeonsi: move handling of DBG_NO_OPT_VARIANT into si_shader_selector_key Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:43:42 +02:00
Marek Olšák	86cc809726	radeonsi: use a compiler queue with a low priority for optimized shaders Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:43:42 +02:00
Marek Olšák	38bd468a78	radeonsi: drop unfinished shader compilations when destroying shaders If we enqueue too many jobs and destroy the GL context, it may take several seconds before the jobs finish. Just drop them instead. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:43:42 +02:00
Marek Olšák	a7f098fb76	radeonsi: only upload (dump to L2) those descriptors that are used by shaders This decreases the size of CE RAM dumps to L2, or the size of descriptor uploads without CE. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-05-18 22:15:02 +02:00
Marek Olšák	53c2ef36da	radeonsi: record which descriptor slots are used by shaders Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-05-18 22:15:02 +02:00
Nicolai Hähnle	4ea67c1751	radeonsi: rename tcs_tes_uses_prim_id for clarity What we care about is whether PrimID is used while tessellation is enabled; whether it's used in TCS/TES or further down the pipeline is irrelevant. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-05-16 16:11:54 +02:00
Nicolai Hähnle	f4dbe2efb7	radeonsi: fix gl_PrimitiveIDIn in geometry shader when using tessellation This builds on commit `0549ea15ec` ("radeonsi: fix primitive ID in fragment shader when using tessellation"). Fixes piglit arb_tessellation_shader/execution/gs-primitiveid-instanced.shader_test Cc: 17.1 <mesa-stable@lists.freedesktop.org> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-05-16 16:11:53 +02:00
Nicolai Hähnle	a16ae77185	radeonsi: get rid of secondary input/output word By keeping track of fewer generics, everything can fit into 64 bits. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-05-12 10:46:06 +02:00
Nicolai Hähnle	cfe6e30f1b	radeonsi: skip generic out/in indices without a shader IO index OpenGL uses at most 32 generic outputs/inputs in any stage, and they always have a shader IO index and therefore fit into the outputs_written/ inputs_read/kill_outputs fields. However, Nine uses semantic indices more liberally. We support that in VS-PS pipelines, except that the optimization of killing outputs must be skipped. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-05-12 10:46:05 +02:00
Nicolai Hähnle	7091fe887b	radeonsi: use SI_MAX_IO_GENERIC instead of magic values Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-05-12 10:46:04 +02:00
Nicolai Hähnle	cb2ac69628	radeonsi: split per-patch from per-vertex indices Make it a bit clearer that the index spaces are logically separate by having them defined in different functions. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-05-08 17:42:17 +02:00
Nicolai Hähnle	b84b631c63	radeonsi: load patch_id for TES-as-ES when exporting for PS For some reason, this change is only necessary on SI. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-05-08 17:42:17 +02:00
Nicolai Hähnle	0549ea15ec	radeonsi: fix primitive ID in fragment shader when using tessellation In a VS->TCS->TES->PS pipeline, the primitive ID is read from TES exports, so it is as if TES were using the primitive ID. Specifically, this fixes a bug where the primitive ID is not reset at the start of a new instance. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-05-08 17:42:17 +02:00
Marek Olšák	194d9b27cc	radeonsi/gfx9: allow the scratch buffer in HS and GS It works now. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-05-05 00:23:44 +02:00
Marek Olšák	8ac4923a67	radeonsi: prevent race conditions when doing scratch patching Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-05-05 00:23:44 +02:00
Marek Olšák	9dfc030b48	radeonsi: separate scratch state patching code into its own function Picked from a different branch. When we stop using the scratch patching, this function will not be called. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-05-05 00:23:44 +02:00
Marek Olšák	1b01014cbf	radeonsi/gfx9: also apply scratch relocations to the 1st shader of merged shaders Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-05-05 00:23:44 +02:00
Marek Olšák	a47289f8fc	radeonsi: remove unused parameters from si_shader_apply_scratch_relocs Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-05-05 00:23:44 +02:00
Marek Olšák	f466683cb0	radeonsi/gfx9: fix gl_ViewportIndex v2: remove unnecessary LLVMBuildAnd calls Cc: 17.1 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-05-03 22:58:27 +02:00
Marek Olšák	4e50062028	radeonsi: pass tessellation ring addresses via user SGPRs This removes s_load_dword latency for tess rings. We need just 1 SGPR for the address if we use 64K alignment. The final asm for recreating the descriptor is: // s2 is (address >> 16) s_mov_b32 s3, 0 s_lshl_b64 s[4:5], s[2:3], 16 s_mov_b32 s6, -1 s_mov_b32 s7, 0x27fac v2: bitcast the descriptor type from v2i64 to v4i32 Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-28 21:47:35 +02:00
Marek Olšák	9fd9a7d0ba	radeonsi: remove VS epilog code, compile VS with PrimID export on demand The use of PrimID in the pixel shader is too rare to deserve such a sizable support code. The initial idea of the VS epilog was to move the clipping code there and remove it based on states, but optimized variants are now used to do that and are easier to support, so the VS epilog has turned out to be not so useful. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-28 21:47:35 +02:00
Marek Olšák	3b2e93e472	radeonsi: get InstanceID from VGPR1 (or VGPR2 for tess) instead of VGPR3 VGPR1 = InstanceID / StepRate0; // StepRate0 can be set to 1 Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-28 21:47:35 +02:00
Marek Olšák	678d568c7b	radeonsi: don't load PrimID in TES if it's not used Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-28 21:47:35 +02:00
Marek Olšák	ed9a51cd3b	radeonsi/gfx9: 2nd shader of merged shaders should hold a reference of the 1st Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-28 21:47:35 +02:00
Marek Olšák	ef40937854	radeonsi: add reference counting for shader selectors The 2nd shader of merged shaders should take a reference of the 1st shader. The next commit will do that. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-28 21:47:35 +02:00
Marek Olšák	6c15e15af4	radeonsi/gfx9: set VGT_VERTEX_REUSE for ES in ES-GS Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-28 21:47:35 +02:00
Marek Olšák	887ef1de34	radeonsi/gfx9: set TES registers for merged ES-GS Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-28 21:47:35 +02:00
Marek Olšák	49cd0cbfd5	radeonsi/gfx9: disallow scratch buffer for LS-HS and ES-GS not implemented yet Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-28 21:47:35 +02:00
Marek Olšák	2857b14bba	radeonsi/gfx9: always compile monolithic ES-GS (asynchronously) In addition to the non-monolithic variant. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-28 21:47:35 +02:00
Marek Olšák	6a9c20fdd5	radeonsi/gfx9: make sure the 1st shader's main part exists for merged shaders Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-28 21:47:35 +02:00
Marek Olšák	8b220877ad	radeonsi/gfx9: set registers and shader key for merged ES-GS Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-28 21:47:35 +02:00
Marek Olšák	ab197ad8d1	radeonsi/gfx9: add GS user SGPRs Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-28 21:47:35 +02:00
Marek Olšák	067dacd1b1	radeonsi/gfx9: define and set LS-HS user SGPRs Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-28 21:47:35 +02:00
Marek Olšák	0588146cb0	radeonsi/gfx9: set up shader registers for merged LS-HS Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-28 21:47:35 +02:00
Marek Olšák	62abdb17bb	radeonsi/gfx9: add initial code generation for non-monolithic merged LS-HS Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-28 21:47:35 +02:00
Marek Olšák	b1ed3ffc56	radeonsi: separate out VS prolog key generation Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-28 21:47:35 +02:00
Dave Airlie	e2659176ce	radeonsi/ac: move vertex export remove to common code. This code can be shared by radv, we bump the max to VARYING_SLOT_MAX here, but that shouldn't have too much fallout. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-04-27 05:17:47 +01:00
Marek Olšák	96b0cfc82e	radeonsi: turn si_shader_key::mono into a non-union A merged LS-HS shader needs both fix_fetch and inputs_to_copy for compilation. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-26 13:08:05 +02:00
Marek Olšák	3f2a0649ab	radeonsi: adjust ESGS ring buffer size computation on VI Cc: 17.0 17.1 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-26 13:08:05 +02:00
Marek Olšák	60a20e6879	radeonsi/gfx9: set MAX_PRIMGRP_IN_WAVE in the correct register Cc: 17.1 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-26 13:08:05 +02:00
Marek Olšák	bd2cde0c25	radeonsi: add si_shader_selector::vs_needs_prolog cleanup Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-17 01:22:11 +02:00
Marek Olšák	777f305840	radeonsi: don't set VGT_GS_MODE as part of the GS state The VS state sets it. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-17 01:22:11 +02:00
Nicolai Hähnle	d6588d9962	radeonsi: cope with missing disassembly For robustness and testing purposes. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-04-14 22:51:07 +02:00
Nicolai Hähnle	472c84d1ad	radeonsi: provide VS_STATE input to all VS variants v2: fix incorrect change in get_tcs_out_patch_stride Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-04-13 17:30:20 +02:00
Marek Olšák	283c31afa1	radeonsi: unify HS max_offchip_buffers workarounds Vulkan doesn't set more than 508. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-03-31 21:41:57 +02:00
Marek Olšák	172b05a37e	radeonsi/gfx9: don't generate LS and ES states these shaders don't exist on GFX9 Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-03-30 14:44:33 +02:00
Marek Olšák	5271d12a6e	radeonsi/gfx9: trivial shader and ring changes Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-03-30 14:44:33 +02:00
Marek Olšák	6d21fd51b6	radeonsi/gfx9: disable RB+ on Vega10 Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-03-30 14:44:33 +02:00
Marek Olšák	c9b004af58	radeonsi/gfx9: handle GFX9 in a few places Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-03-30 14:44:33 +02:00
Marek Olšák	518d834162	radeonsi: don't hang on shader compile failure Cc: 17.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-03-24 18:25:05 +01:00
Grazvydas Ignotas	529a767041	util/disk_cache: use a helper to compute cache keys This will allow us to hash additional data into the cache keys or even change the hashing algorithm easily, should we decide to do so. v2: don't try to compute key (and crash) if cache is disabled Signed-off-by: Grazvydas Ignotas <notasas@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2017-03-21 11:15:52 +11:00
Marek Olšák	e9c6953ddb	radeonsi: require that compiler threads are enabled threaded gallium can't use pipe_context's LLVM target machine, because create_shader_selector can be called from a non-driver thread. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2017-03-17 18:30:21 +01:00
Timothy Arceri	628e84a58f	gallium/util: replace pipe_mutex_unlock() with mtx_unlock() pipe_mutex_unlock() was made unnecessary with `fd33a6bcd7`. Replaced using: find ./src -type f -exec sed -i -- \ 's:pipe_mutex_unlock(\([^)]*\)):mtx_unlock(\&\1):g' {} \; Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-03-07 08:53:05 +11:00
Timothy Arceri	ba72554f3e	gallium/util: replace pipe_mutex_lock() with mtx_lock() pipe_mutex_lock() was made unnecessary with `fd33a6bcd7`. Replaced using: find ./src -type f -exec sed -i -- \ 's:pipe_mutex_lock(\([^)]*\)):mtx_lock(\&\1):g' {} \; Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-03-07 08:52:38 +11:00
Timothy Arceri	be188289e1	gallium/util: replace pipe_mutex_destroy() with mtx_destroy() pipe_mutex_destroy() was made unnecessary with `fd33a6bcd7`. Replace was done with: find ./src -type f -exec sed -i -- \ 's:pipe_mutex_destroy(\([^)]*\)):mtx_destroy(\&\1):g' {} \; Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-03-07 08:52:16 +11:00
Timothy Arceri	75b47dda0c	gallium/util: replace pipe_mutex_init() with mtx_init() pipe_mutex_init() was made unnecessary with `fd33a6bcd7`. Replace was done using: find ./src -type f -exec sed -i -- \ 's:pipe_mutex_init(\([^)]*\)):(void) mtx_init(\&\1, mtx_plain):g' {} \; Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-03-07 08:52:07 +11:00
Timothy Arceri	6084855528	radeonsi: add support for an on-disk shader cache V2: - when loading from disk cache also binary insert into memory cache. - check that the binary loaded from disk is the correct size. If not delete the cache item and skip loading from cache. V3: - remove unrequired variable Reviewed-by: Grigori Goronzy <greg@chown.ath.cx> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-03-03 12:09:08 +11:00
Marek Olšák	35915af6c9	radeonsi: fix broken tessellation on Carrizo and Stoney Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99850 Cc: 13.0 17.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Alex Deucher <alexander.deucher@amd.com>	2017-02-25 00:03:09 +01:00
Marek Olšák	24847dd1b5	gallium/u_queue: isolate util_queue_fence implementation it's cleaner this way. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-02-22 20:26:39 +01:00
Marek Olšák	63c462226e	radeonsi: fix issues with monolithic shaders R600_DEBUG=mono has had no effect since: commit `1fabb29717` Author: Marek Olšák <marek.olsak@amd.com> Date: Tue Feb 14 22:08:32 2017 +0100 radeonsi: have separate LS and ES main shader parts in the shader selector Also, this assertion was failing: si_state_shaders.c:1307: si_shader_select_with_key: Assertion `!shader->is_optimized' failed. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-02-21 21:27:23 +01:00
Marek Olšák	84e72f2962	radeonsi: skip TESSINNER/OUTER offchip stores if TES doesn't read them We were unconditionally storing these outputs, sometimes even one component at a time, but apps never read them in TES. Move the TESSINNER/OUTER buffer stores into the TCS epilog where we can easily disable them on demand. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-02-21 21:27:23 +01:00
Nicolai Hähnle	066a117be7	radeonsi: fix UINT/SINT clamping for 10-bit formats on <= CIK The same PS epilog workaround as for 8-bit integer formats is required, since the CB doesn't do clamping. Fixes GL45-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels*. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-02-21 10:45:13 +01:00
Marek Olšák	45240ce598	radeonsi: use R600_RESOURCE_FLAG_UNMAPPABLE where it's desirable Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-02-18 01:22:08 +01:00
Marek Olšák	1fabb29717	radeonsi: have separate LS and ES main shader parts in the shader selector This might reduce the on-demand compilation if the initial VS/LS/ES determination is wrong. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-02-18 01:22:08 +01:00
Marek Olšák	a02117ba6e	radeonsi: don't compile pure monolithic shaders asynchronously there is no point, we have to wait anyway. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-02-18 01:22:08 +01:00
Marek Olšák	41a2157a68	radeonsi: make fix_fetch an array of uint8_t so that we can add 3-component fallbacks. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-02-18 01:22:08 +01:00
Marek Olšák	408f9a1584	radeonsi: atomize the scratch buffer state The update frequency is very low. Difference: Only account for the size when allocating a new one and when starting a new IB, and check for NULL. (v3) Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-30 17:29:36 +01:00
Marek Olšák	5f99c49008	radeonsi: precompute IA_MULTI_VGT_PARAM values into a table The perf difference is very small: 0.99% -> 0.40% for the time spent in si_get_ia_multi_vgt_param when si_draw_vbo is 20%. Pretty much nothing. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-30 13:27:14 +01:00
Marek Olšák	c78177fc64	radeonsi: move VGT_VERTEX_REUSE_BLOCK_CNTL into shader states for Polaris Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-30 13:27:14 +01:00
Marek Olšák	802fcdc0d2	radeonsi: atomize L2 prefetches to move the big conditional statement out of draw_vbo Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-30 13:27:14 +01:00
Marek Olšák	c99ba3eb47	radeonsi: unbind disabled shader stages to prevent useless L2 prefetches Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-30 13:27:14 +01:00
Marek Olšák	35cd7551a4	radeonsi: use the correct target machine when building shader variants If the shader selector is created with a different context than the shader variant, we should use the calling context's target machine for the shader variant. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99419 Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-18 19:51:31 +01:00
Marek Olšák	3ae3be6dd4	radeonsi: move shader pipe context state into a separate structure Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-18 19:51:31 +01:00
Nicolai Hähnle	5e94e5bb9b	radeonsi: fix R600_DEBUG=nooptvariant Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Vedran Miletić <vedran@miletic.net>	2017-01-16 20:16:18 +01:00
Marek Olšák	44e9b67229	radeonsi: make fix_fetch 64-bit v2: add u_bit_consecutive64 Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-16 18:07:08 +01:00
Marek Olšák	6f356d15be	radeonsi: cleanly communicate whether si_shader_dump should check R600_DEBUG Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-09 12:01:30 +01:00
Marek Olšák	4b93ba542c	radeonsi: assume that a TES without POSITION precedes GS Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-06 21:05:48 +01:00
Marek Olšák	3753dc896d	radeonsi: update clip_regs if clip_disable changes to fix a hang This seems to fix the GPU hangs caused by: commit `ed3190b3f3` Author: Marek Olšák <marek.olsak@amd.com> Date: Sun Nov 13 18:41:43 2016 +0100 radeonsi: don't export ClipVertex and ClipDistance[] if clipping is disabled Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99219 Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2017-01-05 14:01:18 +01:00
Nicolai Hähnle	ec0a0a60cc	radeonsi: shrink the GSVS ring to account for the reduced item sizes Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-12-12 09:05:17 +01:00
Nicolai Hähnle	6fdef7d265	radeonsi: shrink each vertex stream to the actually required size Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-12-12 09:05:13 +01:00
Nicolai Hähnle	2f2e941e2d	radeonsi: use a single descriptor for the GSVS ring We can hardcode all of the fields for swizzling in the geometry shader. The advantage is that we use fewer descriptor slots and we no longer have to update any of the (ring) descriptors when the geometry shader changes. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-12-12 09:05:05 +01:00
Nicolai Hähnle	7b5b3d63c5	radeonsi: update all GSVS ring descriptors for new buffer allocations Fixes GL45-CTS.gtf40.GL3Tests.transform_feedback3.transform_feedback3_geometry_instanced. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-12-12 09:04:06 +01:00
Marek Olšák	e5302ad936	radeonsi: add a debug flag that disables optimized shader variants Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-23 18:49:10 +01:00
Marek Olšák	86514d84e0	util: import CRC32 implementation from gallium Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>	2016-11-22 18:05:51 +01:00
Marek Olšák	bf75ef3f92	radeonsi: remove all varyings for depth-only rendering or rasterization off Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-21 21:44:35 +01:00
Marek Olšák	ef6c84b301	radeonsi: eliminate VS outputs that aren't used by PS at runtime A past commit added the ability to compile "optimized" shader variants asynchronously (not stalling the app). This commit builds upon that and adds what is basically a runtime shader linker. If a VS output isn't used by the currently-bound PS, a new VS compilation is started without that output. The new shader variant is used when it's ready. All apps using separate shader objects I've seen had unused VS outputs. Eliminating unused/useless VS outputs also eliminates the corresponding vertex attribute loads. Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-21 21:44:35 +01:00
Marek Olšák	7e76f9a7a8	radeonsi: record information about all written and read varyings It's just tgsi_shader_info with DEFAULT_VAL varyings removed. Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-21 21:44:35 +01:00
Marek Olšák	c7f3e5c647	radeonsi: make si_shader_io_get_unique_index stricter Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-21 21:44:35 +01:00
Marek Olšák	ed3190b3f3	radeonsi: don't export ClipVertex and ClipDistance[] if clipping is disabled This is the first user of optimized monolithic shader variants. Cull distances can't be disabled by states. Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-21 21:44:35 +01:00
Marek Olšák	d984a324bf	radeonsi: add infrastr. for compiling optimized shader variants asynchronously Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-21 21:44:35 +01:00
Marek Olšák	d2a56985d7	radeonsi: don't set vs.epilog.export_prim_id if TES is bound there is no VS epilog in this case Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-21 21:44:35 +01:00
Marek Olšák	fee71fec25	radeonsi: simplify checking for monolithic compilation Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-21 21:44:35 +01:00
Marek Olšák	6d5c2a8b5c	radeonsi: split the shader key into 3 logical parts key->part.: prolog and epilog flags only key->as_{ls,es}: special flags key->mono.: flags for monolithic compilation only Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-21 21:44:35 +01:00
Marek Olšák	e59389d738	radeonsi: assume that a VS without POSITION is LS Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-21 21:44:35 +01:00
Nicolai Hähnle	2c875158e2	radeonsi: fix vertex fetches for 2_10_10_10 formats The hardware always treats the alpha channel as unsigned, so add a shader workaround. This is rare enough that we'll just build a monolithic vertex shader. The SINT case cannot actually happen in OpenGL, but I've included it for completeness since it's just a mix of the other cases. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-04 21:30:18 +01:00
Nicolai Hähnle	908f92ad1f	radeonsi: generate GS prolog to (partially) fix triangle strip adjacency rotation Fixes GL45-CTS.geometry_shader.adjacency.adjacency_indiced_triangle_strip and others. This leaves the case of triangle strips with adjacency and primitive restarts open. It seems that the only thing that cares about that is a piglit test. Fixing this efficiently would be really involved, and I don't want to use the hammer of degrading to software handling of indices because there may well be software that uses this draw mode (without caring about the precise rotation of triangles). v2: - skip the GS prolog entirely if workaround is not needed - only check for TES (TES is always non-null when tessellation is used) Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-03 10:11:24 +01:00
Nicolai Hähnle	3b2516721b	radeonsi: make the GS copy shader owned by the GS selector The copy shader only depends on the selector. This change avoids creating separate code paths for monolithic vs. non-monolithic geometry shaders. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-03 10:07:50 +01:00
Nicolai Hähnle	9c6f7d66dc	radeonsi: si_shader_vs only depends on the GS selector Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-03 10:07:48 +01:00
Nicolai Hähnle	693435d846	radeonsi: si_vgt_gs_mode only depends on the selector Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-03 10:07:45 +01:00
Marek Olšák	d268b7f95e	radeonsi: add a driver query for shader cache hits This is an 8-month old patch. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-01 22:33:13 +01:00
Marek Olšák	ad45dce4a2	radeonsi: remove si_resource_create_custom Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-26 13:02:58 +02:00
Marek Olšák	29144d0f34	gallium/radeon: stop using PIPE_BIND_CUSTOM it has no effect whatsoever Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-26 13:02:58 +02:00
Marek Olšák	3ec9975555	radeonsi: eliminate trivial constant VS outputs These constant value VS PARAM exports: - 0,0,0,0 - 0,0,0,1 - 1,1,1,0 - 1,1,1,1 can be loaded into PS inputs using the DEFAULT_VAL field, and the VS exports can be removed from the IR to save export & parameter memory. After LLVM optimizations, analyze the IR to see which exports are equal to the ones listed above (or undef) and remove them if they are. Targeted use cases: - All DX9 eON ports always clear 10 VS outputs to 0.0 even if most of them are unused by PS (such as Witcher 2 below). - VS output arrays with unused elements that the GLSL compiler can't eliminate (such as Batman below). The shader-db deltas are quite interesting: (not from upstream si-report.py, it won't be upstreamed) PERCENTAGE DELTAS Shaders PARAM exports (affected only) batman_arkham_origins 589 -67.17 % bioshock-infinite 1769 -0.47 % dirt-showdown 548 -2.68 % dota2 1747 -3.36 % f1-2015 776 -4.94 % left_4_dead_2 1762 -0.07 % metro_2033_redux 2670 -0.43 % portal 474 -0.22 % talos_principle 324 -3.63 % warsow 176 -2.20 % witcher2 1040 -73.78 % ---------------------------------------- All affected 991 -65.37 % ... 9681 -> 3353 ---------------------------------------- Total 26725 -10.82 % ... 58490 -> 52162 v2: treat Undef as both 0 and 1 Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1) Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> (v1)	2016-10-19 22:21:46 +02:00
Marek Olšák	a2ea653a49	radeonsi: remove cb0_is_integer handling st/mesa does this for us. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-19 19:26:30 +02:00
Marek Olšák	7dddf0b7ab	radeonsi: adjust and clean up Z_ORDER and EXEC_ON_x settings The table was copied from the Vulkan driver. The comment lines are as long as the table for cosmetic reasons. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-13 19:00:51 +02:00
Marek Olšák	e12c1cab5d	radeonsi: disable ReZ This is a serious performance fix. Discovered by luck. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94354 Cc: 12.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-13 19:00:51 +02:00
Marek Olšák	e4bbab9022	radeonsi: fix R600_DEBUG=precompile for shader-db radeonsi no longer supports pixel shaders without interpolation optimizations, which led to assertion failures in si_shader_ps when running shader-db. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-12 18:29:40 +02:00
Marek Olšák	300a8221e9	radeonsi: add assertions to validate interpolation flags Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-05 21:03:23 +02:00
Marek Olšák	8c6ea5a6ff	radeonsi: remove unnecessary #includes Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>	2016-10-04 16:12:07 +02:00
Marek Olšák	53d2c8f00f	radeonsi: don't re-create shader PM4 states after scratch buffer update Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>	2016-10-04 16:12:05 +02:00
Marek Olšák	275c073c6a	radeonsi: export SampleMask from pixel shaders at full rate Heaven and Valley write gl_SampleMask and not Z. Use 16_ABGR instead of 32_ABGR if Z isn't written. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-09-13 20:38:25 +02:00
Marek Olšák	6c8b76263d	radeonsi: also do VS_PARTIAL_FLUSH before updating VGT ring pointers ported from Vulkan Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-09-05 18:01:15 +02:00
Marek Olšák	c3f716fe67	gallium/radeon: merge USER_SHADER and INTERNAL_SHADER priority flags there's no reason to separate these Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2016-08-26 15:50:10 +02:00
Marek Olšák	e722b90bc9	radeonsi: eliminate PS OUT[1] if dual src blending is off and CB1 is not bound All VP DX9 ports benefit from this. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-08-17 12:24:35 +02:00
Marek Olšák	c15a9dec29	radeonsi: skip unnecessary si_update_shaders calls Small decrease in draw call overhead. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-08-03 17:46:46 +02:00
Marek Olšák	1e5f00f9d5	radeonsi: pre-generate shader logs for ddebug This cuts down the overhead of si_dump_shader when ddebug is capturing shader logs, which is done for every draw call unconditionally (that's quite a lot of work for a draw call). Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-26 23:06:46 +02:00
Nicolai Hähnle	3d69357da9	radeonsi: ensure sample locations are set for line and polygon smoothing Since commit `d938b8c`, the sample locations are no longer set unconditionally, so we need to set the atom to dirty on all chips, not just Polaris. Cc: 12.0 <mesa-stable@lists.freedesktop.org>	2016-07-23 15:36:39 +02:00
Rob Clark	44bbfedbd9	gallium/u_queue: add optional cleanup callback Adds a second optional cleanup callback, called after the fence is signaled. This is needed if, for example, the queue has the last reference to the object that embeds the util_queue_fence. In this case we cannot drop the ref in the main callback, since that would result in the fence being destroyed before it is signaled. Signed-off-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-16 10:00:04 -04:00
Nicolai Hähnle	04d93ea619	radeonsi: disable multi-threading when shader dumps are enabled Otherwise, shader dumps can become interleaved and unusable. Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-07-08 10:59:36 +02:00
Nicolai Hähnle	7ffc832ab8	radeonsi: use multi-threaded compilation in debug contexts We only have to stay single-threaded when debug output must be synchronous. This yields better parallelism in shader-db runs for me. Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-07-08 10:59:32 +02:00
Nicolai Hähnle	d938b8c0bf	radeonsi: explicitly choose center locations for 1xAA on Polaris Unlike SC, the small primitive filter does not automatically use center locations in 1xAA mode, so this is needed to avoid artifacts caused by the small primitive filter discarding triangles that it shouldn't. As a side effect of how the effective number of samples is now calculated, this patch also avoids submitting the sample locations for line/poly smoothing when they're not really needed. Cc: 12.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-07-08 10:52:50 +02:00
Marek Olšák	5c92c21369	radeonsi: do compilation from si_create_shader_selector asynchronously Main shader parts and geometry shaders are compiled asynchronously by util_queue. si_create_shader_selector doesn't wait and returns. si_draw_vbo(si_shader_select) waits for completion. This has the best effect when shaders are compiled at app-loading time. It doesn't help much for shaders compiled on demand, even though VS+PS compilation should take as much time as the bigger one of the two. If an app creates more shaders, at most 4 threads will be used to compile them. Debug output disables this for shader stats to be printed in the correct order. (We could go even further and build variants asynchronously too, then emit draw calls without waiting and emit incomplete shader states, then force IB chaining to give the compiler more time, then sync the compilation at the IB flush and patch the IB with correct shader states. This is great for compilation before draw calls, but there are some difficulties such as scratch and tess states requiring the compiler output, and an on-disk shader cache will likely be a much better and simpler solution.) Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:13 +02:00
Marek Olšák	84824935cf	radeonsi: don't lock shader cache mutex during compilation to allow multiple shaders to be compiled simultaneously. Also, shader-db can again use all 4 cores. v2: Remove the pipe_mutex_unlock call in the error path. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)	2016-07-05 00:47:13 +02:00
Marek Olšák	850cd953b1	radeonsi: separate the compilation chunk of si_create_shader_selector The function interface is ready to be used by util_queue. Also, si_shader_select_with_key can no longer accept si_context. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:13 +02:00
Marek Olšák	027ad71b57	radeonsi: print LLVM IRs to ddebug logs Getting LLVM IRs of hanging shaders has never been easier. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:13 +02:00
Marek Olšák	4d1f32376d	radeonsi: don't interpolate colors if flatshading is enabled use v_interp_mov for those Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:12 +02:00
Marek Olšák	4accb02d7a	radeonsi: enable the barycentric optimization in all cases Handle the bc_optimize SGPR bit if both CENTER and CENTROID are enabled. This should increase the PS launch rate for big primitives with MSAA. Based on discussion with SPI guys. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:12 +02:00
Marek Olšák	476e9cee1d	radeonsi: compute only one set of interpolation (i,j) when MSAA is disabled This should increase the PS launch rate for shaders using at least 2 pairs of perspective (i,j) and same for linear. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:12 +02:00
Marek Olšák	a675c6a000	radeonsi: split ps.prolog.force_persample_interp into persp and linear bits This reduces the number of v_mov's in the prolog. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:12 +02:00
Marek Olšák	eff81cbc81	radeonsi: enable distributed tess on multi-SE parts only ported from Vulkan Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-06-29 16:34:22 +02:00
Marek Olšák	dd56d04568	radeonsi: set optimal VGT_HS_OFFCHIP_PARAM ported from Vulkan Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-06-29 16:34:22 +02:00
Marek Olšák	d5383a7d31	gallium/radeon: use r600_resource_reference Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Vedran Miletić <vedran@miletic.net> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-06-25 23:13:42 +02:00
Nicolai Hähnle	1167905c41	radeonsi: use trapezoid distribution for tess on Fiji and Polaris This yields a small performance improvement in Unigine Heaven. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-06-20 18:29:55 +02:00
Bas Nieuwenhuizen	e9d3246a7a	radeonsi: Don't offset OFFCHIP_BUFFERING on pre-VI cards. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96239 Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-05-30 09:59:50 +02:00
Marek Olšák	43550f25ed	radeonsi: always reserve output space for tess factors Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Dave Airlie <airlied@redhat.com>	2016-05-27 21:40:43 +02:00
Bas Nieuwenhuizen	43d7305a40	radeonsi: Allow TES distribution between shader engines. The R_028B50_VGT_TESS_DISTRIBUTION value is copied from amdgpu-pro. Smaller values in the ACCUM fields seem to decrease the performance advantage from this patch, higher values don't seem to matter. v2: Add distribution mode field enums. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen	fee3160af9	radeonsi: Enable dynamic HS. This allows running the TES on different CU's than the TCS which results in performance improvements. v2: Only write the control word from one invocation. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen	6217716e8f	radeonsi: Store inputs to memory when not using a TCS. We need to copy the VS outputs to memory. I decided to do this using a shader key, as the value depends on other shaders. I also switch the fixed function TCS over to monolithic, as otherwise many of the user SGPR's need to be passed to the epilog, which increases register pressure, or complexity to avoid that. The main body of the fixed function TCS is not that interesting to precompile anyway, since we do it on demand and it is very small. v2: Use u_bit_scan64. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen	5c34562d7c	radeonsi: Add offchip tessellation parameters. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen	d27ff7d683	radeonsi: Add buffer for offchip storage between TCS and TES. The buffer is quite large, but should only be allocated if the application uses tessellation. Most non-games don't. v2: - Use the correct register for SI. - Add define for block size. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-05-26 22:07:04 +02:00
Axel Davy	fc3533c088	radeonsi: Change default behaviour for undefined COLOR0 d3d 9 needs COLOR0 to be 1.0 on all channels when undefined. 0.0 for the others is fine. GL behaviour is undefined. Signed-off-by: Axel Davy <axel.davy@ens.fr> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-05-18 23:37:14 +02:00
Nicolai Hähnle	d8f3e8e626	radeonsi: always allocate export memory for pixel shaders Experiments with framebuffer-no-attachments type draw calls have shown that NULL exports stall terribly unless we ensure that export memory is allocated by the SPI. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-05-09 11:52:46 -05:00
Nicolai Hähnle	b9e6e8e7d4	radeonsi: fix undefined behavior (memcpy arguments must be non-NULL) Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-05-07 16:46:59 -05:00
Marek Olšák	c9e5a7df61	gallium: remove helpers converting to/from TGSI_PROCESSOR_* Acked-by: Jose Fonseca <jfonseca@vmware.com>	2016-04-22 01:30:39 +02:00

601 Commits