KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Marek Olšák	4eb377d1c3	radeonsi: add si_vs_prolog_bits::unpack_instance_id_from_vertex_id:1 The prim discard compute shader bakes InstanceID into the output index buffer. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-05-16 13:10:07 -04:00
Marek Olšák	ccfcb9d818	ac: rename SI-CIK-VI to GFX6-GFX7-GFX8 Acked-by: Dave Airlie <airlied@redhat.com> We already use GFX9 and I don't want us to have confusing naming in the driver. GFXn naming is better from the driver perspective, because it's the real version of the gfx portion of the hw. Also, CIK means Bonaire-Kaveri-Kabini, it doesn't mean CI. It shouldn't confuse our SDMA, UVD, VCE etc. code much. Those have nothing to do with GFXn and they have their own version numbers.	2019-05-15 20:54:10 -04:00
Nicolai Hähnle	d814c21b1b	radeonsi: overhaul the vertex fetch fixup mechanism The overall goal is to support unaligned loads from vertex buffers natively on SI. In the unaligned case, we fall back to the general case implementation in ac_build_opencoded_load_format. Since this function is fully general, we will also use it going forward for cases requiring fully manual format conversions of dwords anyway. This requires a different encoding of the fix_fetch array, which will now contain the entire format information if a fixup is required. Having to check the alignment of vertex buffers is awkward. To keep the impact on the fast path minimal, the si_context will keep track of which vertex buffers are (not) at least dword-aligned, while the si_vertex_elements will note which vertex buffers have some (at most dword) alignment requirement. Vertex buffers should be dword-aligned most of the time, which allows a fast early-out in almost all cases. Add the radeonsi_vs_fetch_always_opencode configuration variable for testing purposes. Note that it can only be used reliably on LLVM >= 9, because support for byte and short load is required. v2: - add a missing check to si_bind_vertex_elements Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-05-13 17:07:23 +02:00
Timothy Arceri	a004e95dd7	radeonsi/nir: create si_nir_opts() helper We will make use of this in the following commit. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-05-01 09:41:07 +10:00
Marek Olšák	501ff90a95	radeonsi: rename r600_resource -> si_resource Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-22 13:32:18 -05:00
Timothy Arceri	2817a4ec0b	radeonsi: remove unrequired param in si_nir_scan_tess_ctrl() Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-01-02 10:01:15 +11:00
Samuel Pitoiset	3fbdcd942f	amd: remove support for LLVM 6.0 Users are encouraged to switch to LLVM 7.0 released in September 2018. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-06 14:02:56 +01:00
Sonny Jiang	084cf3b966	radeonsi:optimizing SET_CONTEXT_REG for shaders vgt_vertex_reuse Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2018-10-05 19:04:13 -04:00
Sonny Jiang	ce1d72609d	radeonsi:optimizing SET_CONTEXT_REG for shaders Tessellation Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2018-10-05 19:04:13 -04:00
Sonny Jiang	4052624398	radeonsi:optimizing SET_CONTEXT_REG for shaders GS Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2018-10-05 19:04:13 -04:00
Marek Olšák	c5442c1165	radeonsi: add TGSI_SEMANTIC_CS_USER_DATA for reading up to 4 SGPRs with TGSI	2018-08-29 15:31:42 -04:00
Marek Olšák	86b52d4236	radeonsi: reduce LDS stalls by 40% for tessellation 40% is the decrease in the LGKM counter (which includes SMEM too) for the GFX9 LSHS stage. This will make the LDS size slightly larger, but I wasn't able to increase the patch stride without corruption, so I'm increasing the vertex stride.	2018-07-23 20:23:52 -04:00
Timothy Pearson	e1621fda84	radeonsi: Use signed char for color_interp_vgpr_index color_interp_vgpr_index was declared as a generic char value. Because signed values are used in this variable, the result was not safe across architectures and crashed on ppc64[el] and arm. Declare color_interp_vgpr_index as a signed type. Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2018-07-18 13:31:29 -04:00
Dave Airlie	0eb65b4944	radeonsi: rename si_compiler -> ac_llvm_compiler As precursor to moving init to common code, just rename the struct and move it. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-07-04 05:31:32 +10:00
Marek Olšák	32e413ca59	ac: move all LLVM module initialization into ac_create_module This removes some ugly code around module initialization. Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-07-02 14:34:39 -04:00
Marek Olšák	5a6414f135	radeonsi: implement vertex color clamping for tess and GS	2018-06-28 22:41:12 -04:00
Marek Olšák	034b385fc2	radeonsi: move VS_STATE_SGPR before draw SGPRs for vertex color clamping.	2018-06-28 22:27:25 -04:00
Marek Olšák	d77557c9db	radeonsi: store compute local_size into tgsi_shader_info This is kinda a hack, but it's enough for the shader cache.	2018-06-28 22:27:25 -04:00
Marek Olšák	f154555733	radeonsi: clean up passing the is_monolithic flag for compilation Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-06-25 18:33:58 -04:00
Marek Olšák	2f65c67043	radeonsi: fix passing gl_ClipVertex for GS and tess Also add the fprintf call. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-05-25 16:46:00 -04:00
Marek Olšák	a7d61c0753	radeonsi: fix color inputs/outputs for GS and tess GS is tested, tessellation is untested. Have outputs_written_before_ps for HW VS and outputs_written for other stages. The reason is that COLOR and BCOLOR alias for HW VS, which drives elimination of VS outputs based on PS inputs. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-05-25 16:46:00 -04:00
Marek Olšák	e75fc8d033	radeonsi: move data_layout into si_compiler Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Tested-by: Benedikt Schemmer <ben at besd.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-04-27 17:56:04 -04:00
Marek Olšák	797d673c9a	radeonsi: move passmgr into si_compiler Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Tested-by: Benedikt Schemmer <ben at besd.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-04-27 17:56:04 -04:00
Marek Olšák	c1823ff661	radeonsi: move target_library_info into si_compiler Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Tested-by: Benedikt Schemmer <ben at besd.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-04-27 17:56:04 -04:00
Marek Olšák	43f0a10051	radeonsi: add triple into si_compiler Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Tested-by: Benedikt Schemmer <ben at besd.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-04-27 17:56:04 -04:00
Marek Olšák	87eb597758	radeonsi: add struct si_compiler containing LLVMTargetMachineRef It will contain more variables. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Tested-by: Benedikt Schemmer <ben at besd.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-04-27 17:56:04 -04:00
Marek Olšák	a8abbbb172	radeonsi: remove r600_pipe_common.h Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-04-27 17:56:04 -04:00
Marek Olšák	4c5efc40f4	radeonsi: update copyrights Acked-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-04-05 15:34:58 -04:00
Marek Olšák	2be6143032	radeonsi: implement GL_KHR_blend_equation_advanced MSAA is supported using sample shading. Layered rendering and all texture targets are also supported. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-04-02 13:55:25 -04:00
Marek Olšák	9b7db12815	radeonsi: remove chip_class parameter from si_lower_nir We can get it from si_screen. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Acked-by: Alex Deucher <alexander.deucher@amd.com>	2018-03-08 14:58:16 -05:00
Timothy Arceri	70190a6567	radeonsi/nir: call ac_lower_indirect_derefs() Fixes piglit tests: tests/spec/glsl-1.50/execution/variable-indexing/gs-input-array-vec3-index-rd.shader_test tests/spec/glsl-1.50/execution/geometry/max-input-components.shader_test Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-03-05 14:09:23 +11:00
Timothy Arceri	561503e3bd	radeonsi: add chip class to compiler_ctx_state This will be used in the following patch. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-03-05 14:09:23 +11:00
Marek Olšák	8799eaed99	radeonsi: remove 2 unused user SGPRs from merged TES-GS with 32-bit pointers The effect of the last 13 commits on user SGPR counts: Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-02-26 12:01:19 +01:00
Marek Olšák	3fa7a59d69	radeonsi: make SI_SGPR_VERTEX_BUFFERS the last user SGPR input so that it can be removed and replaced with inline VBO descriptors, and the pointer can be packed in unused bits of VBO descriptors. This also removes the pointer from merged TES-GS where it's useless. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-02-26 12:01:08 +01:00
Marek Olšák	2d03c4cac8	radeonsi: move tess ring address into TCS_OUT_LAYOUT, removes 2 TCS user SGPRs TCS_OUT_LAYOUT has 13 unused bits. That's enough for a 32-bit address aligned to 512KB. Hey, it's a 13-bit pointer! Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-02-24 23:08:29 +01:00
Marek Olšák	190e064e63	radeonsi: move 2nd-shader descriptor pointers into s[0:1] If 32-bit pointers are supported, both pointers can be moved into s[0:1] and then ESGS has exactly the same user data SGPR declarations as VS. If 32-bit pointers are not supported, only one pointer can be moved into s[0:1]. In that case, the 2nd pointer is moved before TCS constants, so that the location is the same in HS and GS. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-02-24 23:08:29 +01:00
Marek Olšák	931ec80eeb	radeonsi: implement 32-bit pointers in user data SGPRs (v2) User SGPRs changes: VS: 14 -> 9 TCS: 14 -> 10 TES: 10 -> 6 GS: 8 -> 4 GSCOPY: 2 -> 1 PS: 9 -> 5 Merged VS-TCS: 24 -> 16 Merged VS-GS: 18 -> 11 Merged TES-GS: 18 -> 11 SGPRS: 2170102 -> 2158430 (-0.54 %) VGPRS: 1645656 -> 1641516 (-0.25 %) Spilled SGPRs: 9078 -> 8810 (-2.95 %) Spilled VGPRs: 130 -> 114 (-12.31 %) Scratch size: 1508 -> 1492 (-1.06 %) dwords per thread Code Size: 52094872 -> 52692540 (1.15 %) bytes Max Waves: 371848 -> 372723 (0.24 %) v2: - the shader cache needs to take address32_hi into account - set amdgpu-32bit-address-high-bits Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (v1)	2018-02-17 04:52:17 +01:00
Marek Olšák	148b48646b	radeonsi: print shader-db stats for main parts, not final binaries This is needed to get shader-db stats for LS,HS,ES,GS stages on gfx9. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-01-31 03:21:20 +01:00
Marek Olšák	c02c9ee550	radeonsi: move max_simd_waves computation into a separate function Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-01-31 03:21:20 +01:00
Timothy Arceri	452586b56a	radeonsi: add dummy implementation of si_nir_scan_tess_ctrl() Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-01-05 11:58:55 +11:00
Samuel Pitoiset	45872a0a6d	radeonsi: make use of ac_get_spi_shader_z_format() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2017-12-14 22:23:25 +01:00
Nicolai Hähnle	239d2b5809	radeonsi: clarify that si_shader_selector::esgs_itemsize is set for the ES part Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-11-28 09:34:43 +01:00
Nicolai Hähnle	df5ebe0c26	radeonsi/gfx9: fix VM fault with fetched instance divisors We need to account for SGPR locations in merged shaders. This case is exercised by KHR-GL45.enhanced_layouts.vertex_attrib_locations Fixes: `79c2e7388c` ("radeonsi/gfx9: use SPI_SHADER_USER_DATA_COMMON") Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-11-20 16:26:10 +01:00
Nicolai Hähnle	4f493c79ee	radeonsi: use ready fences on all shaders, not just optimized ones There's a race condition between si_shader_select_with_key and si_bind_XX_shader: Thread 1 Thread 2 -------- -------- si_shader_select_with_key begin compiling the first variant (guarded by sel->mutex) si_bind_XX_shader select first_variant by default as state->current si_shader_select_with_key match state->current and early-out Since thread 2 never takes sel->mutex, it may go on rendering without a PM4 for that shader, for example. The solution taken by this patch is to broaden the scope of shader->optimized_ready to a fence shader->ready that applies to all shaders. This does not hurt the fast path (if anything it makes it faster, because we don't explicitly check is_optimized). It will also allow reducing the scope of sel->mutex locks, but this is deferred to a later commit for better bisectability. Fixes dEQP-EGL.functional.sharing.gles2.multithread.simple.buffers.bufferdata_render Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-11-09 11:37:51 +01:00
Marek Olšák	529cdce799	radeonsi: remove 'Authors:' comments It's inaccurate. Instead, see the copyright and use "git log" and "git blame" to know the authorship. Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-11-02 18:19:03 +01:00
Marek Olšák	da0083f123	radeonsi: use postponed KILL only when derivatives are used Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-10-24 14:56:34 +02:00
Marek Olšák	2f4705afde	radeonsi: if there's just const buffer 0, set it in place of CONST/SSBO pointer SI_SGPR_CONST_AND_SHADER_BUFFERS now contains the pointer to const buffer 0 if there is no other buffer there. Benefits: - there is no constbuf descriptor upload and shader load It's assumed that all constant addresses are within bounds. Non-constant addresses are clamped against the last declared CONST variable. This only works if the state tracker ensures the bound constant buffer matches what the shader needs. Once we get 32-bit pointers, we can only do this for user constant buffers where the driver is in charge of the upload so that it can guarantee a 32-bit address. The real performance benefit might not be measurable. These apps get 100% theoretical benefit in all shaders (except where noted): - antichamber - batman arkham origins - borderlands 2 - borderlands pre-sequel - brutal legend - civilization BE - CS:GO - deadcore - dota 2 -- most shaders - europa universalis - grid autosport -- most shaders - left 4 dead 2 - legend of grimrock - life is strange - payday 2 - portal - rocket league - serious sam 3 bfe - talos principle - team fortress 2 - thea - unigine heaven - unigine valley -- also sanctuary and tropics - wasteland 2 - xcom: enemy unknown & enemy within - tesseract - unity (engine) Changed stats only: SGPRS: 2059998 -> 2086238 (1.27 %) VGPRS: 1626888 -> 1626904 (0.00 %) Spilled SGPRs: 7902 -> 7865 (-0.47 %) Code Size: 60924520 -> 60982660 (0.10 %) bytes Max Waves: 374539 -> 374526 (-0.00 %) Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-10-17 22:03:03 +02:00
Marek Olšák	6a8401a94e	radeonsi: add VS blit shader creation no users yet Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-10-07 18:26:35 +02:00
Nicolai Hähnle	e4af4433fc	radeonsi: hard-code pixel center for interpolateAtSample without multisample buffers The GLSL rules for interpolateAtSample are unfortunate: "Returns the value of the input interpolant variable at the location of sample number sample. If multisample buffers are not available, the input variable will be evaluated at the center of the pixel. If sample sample does not exist, the position used to interpolate the input variable is undefined." This fix will fall back to monolithic shader compilation when interpolateAtSample is used without multisampling. One alternative would be to always upload 16 sample positions, filling the buffer up with repetition when the actual number of samples is less, and then ANDing the sample ID with 0xf. However, that punishes all well-behaving users of interpolateAtSample, when in reality, only conformance tests should be affected by the issue. Fixes dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_sample.non_multisample_buffer.* Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-09-13 18:25:45 +02:00
Nicolai Hähnle	92c4277990	radeonsi: apply a mask to gl_SampleMaskIn in the PS prolog gl_SampleMaskIn is supposed to contain set bits only for the samples that are covered by the current fragment shader invocation, but the VGPR initialization hardware loads the set of all bits that are covered at the current pixel. Fixes various tests in dEQP-GLES31.functional.shaders.sample_variables.sample_mask_in.* Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-09-13 18:25:41 +02:00
Marek Olšák	6eade342eb	radeonsi: optimize TCS epilog when invocation 0 writes tess factors This removes the barrier and LDS stores and loads for tess factors when it's possible. The removal of the barrier seems more important to me though. In one shader, it removes 17 * 4 bytes from the shader binary. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-09-11 19:02:02 +02:00
Nicolai Hähnle	45c5c44451	radeonsi/gfx9: proper workaround for LS/HS VGPR initialization bug When the HS wave is empty, the hardware writes the LS VGPRs starting at v0 instead of v2. Workaround by shifting them back into place when necessary. For simplicity, this is always done in the LS prolog. According to the hardware team, this will be fixed in future chips, so take that into account already. Note that this is not a bug fix, as the bug was already worked around by commit `166823bfd2` ("radeonsi/gfx9: add a temporary workaround for a tessellation driver bug"). This change merely replaces the workaround by one that should be better. v2: add workaround code to shader only when necessary v3: clarify the prefer_mono comment Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-09-06 10:02:49 +02:00
Samuel Pitoiset	781a13c475	radeonsi: declare new user SGPR indices for bindless samplers/images A new pair of user SGPR is needed for loading the bindless descriptors from shaders. Because the descriptors are global for all stages, there is no need to add separate indices for GFX9. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-08-22 11:34:15 +02:00
Nicolai Hähnle	40697e8678	radeonsi: make si_shader_selector_reference globally visible Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-08-22 09:50:55 +02:00
Nicolai Hähnle	b49c2c9fa3	radeonsi/nir: perform lowering of input/output driver locations Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:40 +02:00
Nicolai Hähnle	29d7bdd179	radeonsi: scan NIR shaders to obtain required info v2: set num_instruction to 2, i.e. 1 + END (Marek) Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:32 +02:00
Nicolai Hähnle	90b3ba8970	radeonsi: add si_shader_selector::nir Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:32 +02:00
Marek Olšák	4a10d6154e	radeonsi: move instance divisors into a constant buffer Shader key size: 107 -> 47 Divisors of 0 and 1 are encoded in the shader key. Greater instance divisors are loaded from a constant buffer. The shader code doing the division is huge. Is it something we need to worry about? Does any app use instance divisors >= 2? VS prolog disassembly: s_load_dwordx4 s[12:15], s[0:1], 0x80 ; C00A0300 00000080 s_nop 0 ; BF800000 s_waitcnt lgkmcnt(0) ; BF8C007F s_buffer_load_dword s14, s[12:15], 0x4 ; C0220386 00000004 s_waitcnt lgkmcnt(0) ; BF8C007F v_cvt_f32_u32_e32 v4, s14 ; 7E080C0E v_rcp_iflag_f32_e32 v4, v4 ; 7E084704 v_mul_f32_e32 v4, 0x4f800000, v4 ; 0A0808FF 4F800000 v_cvt_u32_f32_e32 v4, v4 ; 7E080F04 v_mul_hi_u32 v5, v4, s14 ; D2860005 00001D04 v_mul_lo_i32 v6, v4, s14 ; D2850006 00001D04 v_cmp_eq_u32_e64 s[12:13], 0, v5 ; D0CA000C 00020A80 v_sub_i32_e32 v5, vcc, 0, v6 ; 340A0C80 v_cndmask_b32_e64 v5, v6, v5, s[12:13] ; D1000005 00320B06 v_mul_hi_u32 v5, v5, v4 ; D2860005 00020905 v_add_i32_e32 v6, vcc, v5, v4 ; 320C0905 v_subrev_i32_e32 v4, vcc, v5, v4 ; 36080905 v_cndmask_b32_e64 v4, v4, v6, s[12:13] ; D1000004 00320D04 v_mul_hi_u32 v5, v4, v1 ; D2860005 00020304 v_add_i32_e32 v4, vcc, s8, v0 ; 32080008 v_mul_lo_i32 v6, v5, s14 ; D2850006 00001D05 v_add_i32_e32 v7, vcc, 1, v5 ; 320E0A81 v_cmp_ge_u32_e64 s[12:13], v1, v6 ; D0CE000C 00020D01 v_sub_i32_e32 v6, vcc, v1, v6 ; 340C0D01 v_cmp_le_u32_e32 vcc, s14, v6 ; 7D960C0E v_cndmask_b32_e64 v8, 0, -1, s[12:13] ; D1000008 00318280 v_cndmask_b32_e64 v6, 0, -1, vcc ; D1000006 01A98280 v_and_b32_e32 v6, v8, v6 ; 260C0D08 v_cmp_eq_u32_e32 vcc, 0, v6 ; 7D940C80 v_cndmask_b32_e32 v6, v7, v5, vcc ; 000C0B07 v_add_i32_e32 v5, vcc, -1, v5 ; 320A0AC1 v_cmp_eq_u32_e32 vcc, 0, v8 ; 7D941080 v_cndmask_b32_e32 v5, v6, v5, vcc ; 000A0B06 v_add_i32_e32 v5, vcc, s9, v5 ; 320A0A09 v2: set prefer_mono for fetched instance divisors Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-27 19:55:09 +02:00
Marek Olšák	f9a7e7fe14	radeonsi: use #pragma pack to pack si_shader_key sizeof(struct si_shader_key): Before reverting the 2 commits: 120 bytes After reverting the 2 commits: 128 bytes With #pragma pack: 107 bytes Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-27 18:45:07 +02:00
Marek Olšák	77d2a98353	Revert "radeonsi: use uint32_t to declare si_shader_key.opt.kill_outputs" This reverts commit `7b2240ac9c`. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-27 18:45:07 +02:00
Marek Olšák	dbe45e1180	Revert "radeonsi: remove 8 bytes from si_shader_key with uint32_t ff_tcs_inputs_to_copy" This reverts commit `6b6fed3a3c`. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-27 18:45:07 +02:00
Emil Velikov	1f958c1337	radeonsi: include ac_binary.h for struct ac_shader_binary The header embeds the struct so it needs the header inclusion instead of the dummy forward declaration. Cc: Nicolai Hähnle <nicolai.haehnle@amd.com> Cc: Marek Olšák <marek.olsak@amd.com> Cc: Tom Stellard <tstellar@redhat.com> Fixes: `32206c5e56` ("radeonsi: Add radeon_shader_binary member to struct si_shader") Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2017-06-17 11:38:02 +01:00
Samuel Pitoiset	2c3a7d5840	radeonsi: track use of bindless samplers/images from tgsi_shader_info This adds some new helper functions to know if the current draw call (or dispatch compute) is using bindless samplers/images, based on TGSI analysis. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-06-14 10:04:36 +02:00
Marek Olšák	6b6fed3a3c	radeonsi: remove 8 bytes from si_shader_key with uint32_t ff_tcs_inputs_to_copy The previous patch helps with this. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-12 18:24:37 +02:00
Marek Olšák	7b2240ac9c	radeonsi: use uint32_t to declare si_shader_key.opt.kill_outputs the next patch will benefit from this Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-12 18:24:37 +02:00
Marek Olšák	1621b33d73	radeonsi: remove 8 bytes from si_shader_key by flattening opt.hw_vs Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-12 18:24:37 +02:00
Marek Olšák	2b7fd9df9a	radeonsi: precompute some fields for PA_CL_VS_OUT_CNTL in si_shader_selector Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 20:17:18 +02:00
Marek Olšák	e9409c86e7	radeonsi: remove 8 bytes from si_shader_key We can use a union in si_shader_key::mono. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 20:17:06 +02:00
Marek Olšák	6e2c07749b	radeonsi: move streamout state update out of si_update_shaders Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:43:42 +02:00
Marek Olšák	53c2ef36da	radeonsi: record which descriptor slots are used by shaders Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-05-18 22:15:02 +02:00
Marek Olšák	f07c15ef80	radeonsi: merge sampler and image descriptor lists into one Sampler slots: slot[8], .. slot[39] (ascending) Image slots: slot[7], .. slot[0] (descending) Each image occupies 1/2 of each slot, so there are 16 images in total, therefore the layout is: slot[15], .. slot[0]. (in 1/2 slot increments) Updating image slot 2n+i (i <= 1) also dirties and re-uploads slot 2n+!i. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-05-18 22:15:02 +02:00
Marek Olšák	5df24c3fa6	radeonsi: merge constant and shader buffers descriptor lists into one Constant buffers: slot[16], .. slot[31] (ascending) Shader buffers: slot[15], .. slot[0] (descending) The idea is that if we have 4 constant buffers and 2 shader buffers, we only have to upload 6 slots. That optimization is left for a later commit. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-05-18 22:15:02 +02:00
Nicolai Hähnle	a16ae77185	radeonsi: get rid of secondary input/output word By keeping track of fewer generics, everything can fit into 64 bits. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-05-12 10:46:06 +02:00
Nicolai Hähnle	0dd8aa44b3	radeonsi: reduce the number of generics for shader IO unique indices This is as high as possible while still allowing to merge the bitfields with the next commit. For OpenGL, 32 would be sufficient. Nine apparently uses (much!) higher indices. Indices that are out of bounds don't hurt for VS-PS pipelines, except that the VS output kill optimization is not applied. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-05-12 10:46:06 +02:00
Nicolai Hähnle	7091fe887b	radeonsi: use SI_MAX_IO_GENERIC instead of magic values Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-05-12 10:46:04 +02:00
Nicolai Hähnle	0282214c72	radeonsi: more const qualifiers in shader dump functions Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-05-10 08:58:39 +02:00
Nicolai Hähnle	cb2ac69628	radeonsi: split per-patch from per-vertex indices Make it a bit clearer that the index spaces are logically separate by having them defined in different functions. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-05-08 17:42:17 +02:00
Marek Olšák	a47289f8fc	radeonsi: remove unused parameters from si_shader_apply_scratch_relocs Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-05-05 00:23:44 +02:00
Marek Olšák	7660c9ee4e	radeonsi: make si_compile_llvm static Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-05-05 00:23:44 +02:00
Marek Olšák	7a515a607c	radeonsi: don't load unused compute shader input SGPRs and VGPRs Basically, don't load GRID_SIZE or BLOCK_SIZE if they are unused, determine whether to load BLOCK_ID for each component separately, and set the number of THREAD_ID VGPRs to load. Now we should get the maximum CS launch wave rate in most cases. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-28 21:57:44 +02:00
Marek Olšák	4e50062028	radeonsi: pass tessellation ring addresses via user SGPRs This removes s_load_dword latency for tess rings. We need just 1 SGPR for the address if we use 64K alignment. The final asm for recreating the descriptor is: // s2 is (address >> 16) s_mov_b32 s3, 0 s_lshl_b64 s[4:5], s[2:3], 16 s_mov_b32 s6, -1 s_mov_b32 s7, 0x27fac v2: bitcast the descriptor type from v2i64 to v4i32 Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-28 21:47:35 +02:00
Marek Olšák	9fd9a7d0ba	radeonsi: remove VS epilog code, compile VS with PrimID export on demand The use of PrimID in the pixel shader is too rare to deserve such a sizable support code. The initial idea of the VS epilog was to move the clipping code there and remove it based on states, but optimized variants are now used to do that and are easier to support, so the VS epilog has turned out to be not so useful. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-28 21:47:35 +02:00
Marek Olšák	3b2e93e472	radeonsi: get InstanceID from VGPR1 (or VGPR2 for tess) instead of VGPR3 VGPR1 = InstanceID / StepRate0; // StepRate0 can be set to 1 Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-28 21:47:35 +02:00
Marek Olšák	808c33f6f0	radeonsi: explain (non-)monolithic shaders Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-28 21:47:35 +02:00
Marek Olšák	ed9a51cd3b	radeonsi/gfx9: 2nd shader of merged shaders should hold a reference of the 1st Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-28 21:47:35 +02:00
Marek Olšák	ef40937854	radeonsi: add reference counting for shader selectors The 2nd shader of merged shaders should take a reference of the 1st shader. The next commit will do that. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-28 21:47:35 +02:00
Marek Olšák	2857b14bba	radeonsi/gfx9: always compile monolithic ES-GS (asynchronously) In addition to the non-monolithic variant. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-28 21:47:35 +02:00
Marek Olšák	a82398a8f5	radeonsi/gfx9: add support for monolithic ES-GS Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-28 21:47:35 +02:00
Marek Olšák	8b220877ad	radeonsi/gfx9: set registers and shader key for merged ES-GS Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-28 21:47:35 +02:00
Marek Olšák	ab197ad8d1	radeonsi/gfx9: add GS user SGPRs Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-28 21:47:35 +02:00
Marek Olšák	dcea7e5d19	radeonsi: add si_shader::prolog2 For a GS prolog in merged ES-GS. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-28 21:47:35 +02:00
Marek Olšák	eb35238ffe	radeonsi/gfx9: move RW_BUFFERS to s[0:1] for merged shaders Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-28 21:47:35 +02:00
Marek Olšák	0af00f179e	radeonsi/gfx9: add support for monolithic merged LS-HS Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-28 21:47:35 +02:00
Marek Olšák	f11ced475e	radeonsi/gfx9: add VS prolog support for merged LS-HS HS input VGPRs must be reserved. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-28 21:47:35 +02:00
Marek Olšák	852ea69a2d	radeonsi: assign VS/TCS/TES/GS shader input parameter locations dynamically They will vary with merged stages. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-28 21:47:35 +02:00
Marek Olšák	067dacd1b1	radeonsi/gfx9: define and set LS-HS user SGPRs Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-28 21:47:35 +02:00
Marek Olšák	62abdb17bb	radeonsi/gfx9: add initial code generation for non-monolithic merged LS-HS Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-28 21:47:35 +02:00
Marek Olšák	a98c9ba580	radeonsi/gfx9: add si_shader::previous_stage for merged shaders Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-28 21:47:35 +02:00
Marek Olšák	cfb0798bb3	radeonsi/gfx9: enlarge num_input_sgprs in shader keys due to higher hw limit Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-28 21:47:35 +02:00
Marek Olšák	4ab36e0ebc	radeonsi/gfx9: update the summary of shader stage configs Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-28 21:47:35 +02:00
Dave Airlie	e2659176ce	radeonsi/ac: move vertex export remove to common code. This code can be shared by radv, we bump the max to VARYING_SLOT_MAX here, but that shouldn't have too much fallout. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-04-27 05:17:47 +01:00
Marek Olšák	96b0cfc82e	radeonsi: turn si_shader_key::mono into a non-union A merged LS-HS shader needs both fix_fetch and inputs_to_copy for compilation. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-26 13:08:05 +02:00
Marek Olšák	bd2cde0c25	radeonsi: add si_shader_selector::vs_needs_prolog cleanup Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-17 01:22:11 +02:00
Nicolai Hähnle	4f7e3fbb50	radeonsi: fix gl_BaseVertex in non-indexed draws gl_BaseVertex is supposed to be 0 in non-indexed draws. Unfortunately, the way they're implemented, the VGT always generates indices starting at 0, and the VS prolog adds the start index. There's a VGT_INDX_OFFSET register which causes the VGT to start at a driver-defined index. However, this register cannot be written from indirect draws. So fix this unlikely case by setting a bit to tell the VS whether the draw is indexed or not, so that gl_BaseVertex can be adjusted accordingly when used. Fixes a bug in KHR-GL45.shader_draw_parameters_tests.ShaderMultiDrawArraysParameters.* Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-04-13 17:31:11 +02:00
Nicolai Hähnle	472c84d1ad	radeonsi: provide VS_STATE input to all VS variants v2: fix incorrect change in get_tcs_out_patch_stride Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-04-13 17:30:20 +02:00
Nicolai Hähnle	3b9fbcb3b6	radeonsi: change the bit-packing of LS out/TCS in data Avoid conflicts when merging various VS state bits. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-04-13 17:30:19 +02:00
Nicolai Hähnle	ff39f0d59c	radeonsi: emit VS_STATE register explicitly from si_draw_vbo We will merge other derived state information into this register. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-04-13 17:30:18 +02:00
Timothy Arceri	2efddc63ee	gallium/util: replace pipe_mutex with mtx_t pipe_mutex was made unnecessary with `fd33a6bcd7`. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-03-07 08:48:11 +11:00
Timothy Arceri	69a687189e	radeon/ac: switch from radeon_shader_binary to ac_shader_binary Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-02-28 13:20:31 +11:00
Marek Olšák	84e72f2962	radeonsi: skip TESSINNER/OUTER offchip stores if TES doesn't read them We were unconditionally storing these outputs, sometimes even one component at a time, but apps never read them in TES. Move the TESSINNER/OUTER buffer stores into the TCS epilog where we can easily disable them on demand. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-02-21 21:27:23 +01:00
Nicolai Hähnle	066a117be7	radeonsi: fix UINT/SINT clamping for 10-bit formats on <= CIK The same PS epilog workaround as for 8-bit integer formats is required, since the CB doesn't do clamping. Fixes GL45-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels*. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-02-21 10:45:13 +01:00
Marek Olšák	22b8a773e1	radeonsi: use SI_MAX_ATTRIBS where it should be used for consistency; no change in behavior Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-02-18 01:22:08 +01:00
Marek Olšák	054f853035	radeonsi: sort members of si_shader_key::part and improve some comments Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-02-18 01:22:08 +01:00
Marek Olšák	1fabb29717	radeonsi: have separate LS and ES main shader parts in the shader selector This might reduce the on-demand compilation if the initial VS/LS/ES determination is wrong. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-02-18 01:22:08 +01:00
Marek Olšák	dbd38f2a92	radeonsi: add a workaround for clamping unaligned RGB 8 & 16-bit vertex loads Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-02-18 01:22:08 +01:00
Marek Olšák	41a2157a68	radeonsi: make fix_fetch an array of uint8_t so that we can add 3-component fallbacks. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-02-18 01:22:08 +01:00
Marek Olšák	4c36553a46	radeonsi: implement legacy GL_DOUBLE vertex formats so that we can disable u_vbuf for GL core profiles. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-02-14 21:47:51 +01:00
Marek Olšák	28c06b3ceb	radeonsi: write shader asm annotated with wave info into GPU hang reports Note that the disassembly is written twice - first the unmodified compiler output and then the wave-annotated output only if there are waves executing the shader. Sample output from a real GPU hang most likely caused by image_sample: The number of active waves = 28 Pixel Shader - annotated disassembly: s_mov_b64 s[6:7], exec ; BE86017E [PC=0x10f3e3800, off=0, size=4] s_wqm_b64 exec, exec ; BEFE077E [PC=0x10f3e3804, off=4, size=4] ... image_sample v[7:9], v[0:1], s[12:19], s[20:23] dmask:0x7 ; F0800700 00A30700 [PC=0x10f3e3a94, off=660, size=8] s_buffer_load_dword s20, s[0:3], 0x50 ; C0220500 00000050 [PC=0x10f3e3a9c, off=668, size=8] s_load_dwordx4 s[24:27], s[4:5], 0x170 ; C00A0602 00000170 [PC=0x10f3e3aa4, off=676, size=8] s_load_dwordx8 s[12:19], s[4:5], 0x140 ; C00E0302 00000140 [PC=0x10f3e3aac, off=684, size=8] s_buffer_load_dword s11, s[0:3], 0x5c ; C02202C0 0000005C [PC=0x10f3e3ab4, off=692, size=8] s_buffer_load_dword s21, s[0:3], 0x54 ; C0220540 00000054 [PC=0x10f3e3abc, off=700, size=8] s_buffer_load_dword s22, s[0:3], 0x58 ; C0220580 00000058 [PC=0x10f3e3ac4, off=708, size=8] s_waitcnt vmcnt(0) ; BF8C0F70 [PC=0x10f3e3acc, off=716, size=4] ^ SE0 SH0 CU1 SIMD1 WAVE0 EXEC=aaaaaaa555aaaaaa INST32=BF8C0F70 ^ SE0 SH0 CU1 SIMD2 WAVE0 EXEC=aaaa85555555552a INST32=BF8C0F70 ^ SE0 SH0 CU1 SIMD3 WAVE0 EXEC=000000000000000a INST32=BF8C0F70 ^ SE0 SH0 CU6 SIMD1 WAVE0 EXEC=25a5a5aa82aaaaaa INST32=BF8C0F70 ^ SE0 SH0 CU6 SIMD3 WAVE0 EXEC=50aaaa8fffa55555 INST32=BF8C0F70 ^ SE0 SH0 CU7 SIMD0 WAVE0 EXEC=5554aaaaaaa1a555 INST32=BF8C0F70 ^ SE0 SH0 CU7 SIMD0 WAVE1 EXEC=aaaa5555ffffffff INST32=BF8C0F70 ^ SE0 SH0 CU7 SIMD1 WAVE0 EXEC=555557aaaaaaaaa5 INST32=BF8C0F70 ^ SE0 SH0 CU7 SIMD3 WAVE0 EXEC=5555aaaaaaaaaa85 INST32=BF8C0F70 ^ SE1 SH0 CU3 SIMD1 WAVE0 EXEC=aaaaaaaaaaaaaaaa INST32=BF8C0F70 ^ SE1 SH0 CU4 SIMD0 WAVE0 EXEC=aaaaaaaa5a5a5a5a INST32=BF8C0F70 ^ SE1 SH0 CU4 SIMD1 WAVE0 EXEC=aaaaaaa5a5a5a4a5 INST32=BF8C0F70 ^ SE1 SH0 CU4 SIMD2 WAVE0 EXEC=5555555000000000 INST32=BF8C0F70 ^ SE1 SH0 CU4 SIMD3 WAVE0 EXEC=aa555554155aaaaa INST32=BF8C0F70 ^ SE1 SH0 CU5 SIMD0 WAVE0 EXEC=55ffff55555555aa INST32=BF8C0F70 ^ SE1 SH0 CU5 SIMD1 WAVE0 EXEC=555555555aaaaaaa INST32=BF8C0F70 ^ SE1 SH0 CU5 SIMD2 WAVE0 EXEC=a0aaaaaaa8555555 INST32=BF8C0F70 ^ SE1 SH0 CU5 SIMD3 WAVE0 EXEC=8aaaaaaaaaaaa555 INST32=BF8C0F70 ^ SE1 SH0 CU6 SIMD0 WAVE0 EXEC=000000002aaaaaaa INST32=BF8C0F70 ^ SE2 SH0 CU1 SIMD0 WAVE0 EXEC=5aaaa5400aaaa15a INST32=BF8C0F70 ^ SE2 SH0 CU1 SIMD1 WAVE0 EXEC=00aaaaaaaa5555aa INST32=BF8C0F70 ^ SE2 SH0 CU1 SIMD2 WAVE0 EXEC=aa00005555554555 INST32=BF8C0F70 ^ SE2 SH0 CU1 SIMD3 WAVE0 EXEC=aaaaaaa000000000 INST32=BF8C0F70 ^ SE3 SH0 CU4 SIMD0 WAVE0 EXEC=5555aaaaaaaaaaaa INST32=BF8C0F70 ^ SE3 SH0 CU4 SIMD2 WAVE0 EXEC=ffaaaaaaaaaa5555 INST32=BF8C0F70 ^ SE3 SH0 CU4 SIMD3 WAVE0 EXEC=aaaa55555555aa00 INST32=BF8C0F70 ^ SE3 SH0 CU5 SIMD0 WAVE0 EXEC=00aaaaaaaaaaaa5a INST32=BF8C0F70 ^ SE3 SH0 CU5 SIMD1 WAVE0 EXEC=5a555555005555ff INST32=BF8C0F70 v_mul_f32_e32 v7, s6, v7 ; 0A0E0E06 [PC=0x10f3e3ad0, off=720, size=4] ... Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-02-10 11:27:50 +01:00
Marek Olšák	35cd7551a4	radeonsi: use the correct target machine when building shader variants If the shader selector is created with a different context than the shader variant, we should use the calling context's target machine for the shader variant. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99419 Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-18 19:51:31 +01:00
Marek Olšák	3ae3be6dd4	radeonsi: move shader pipe context state into a separate structure Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-18 19:51:31 +01:00
Marek Olšák	d523415609	radeonsi: implement GL_FIXED vertex format Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-16 18:07:08 +01:00
Marek Olšák	018fb2ecb3	radeonsi: implement 32-bit SNORM/UNORM/SSCALED/USCALED vertex formats Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-16 18:07:08 +01:00
Marek Olšák	44e9b67229	radeonsi: make fix_fetch 64-bit v2: add u_bit_consecutive64 Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-16 18:07:08 +01:00
Marek Olšák	6f356d15be	radeonsi: cleanly communicate whether si_shader_dump should check R600_DEBUG Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-09 12:01:30 +01:00
Marek Olšák	72d48fcd8e	radeonsi: apply a multi-wave workgroup SPI bug workaround to affected CIK chips All codepaths are handled except for clover. Cc: 13.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-12-01 02:16:51 +01:00
Marek Olšák	274fb601c2	radeonsi: count and report temp arrays in scratch separately v2: only do this if debug output of shader dumping is enabled Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)	2016-11-29 23:52:31 +01:00
Marek Olšák	ef6c84b301	radeonsi: eliminate VS outputs that aren't used by PS at runtime A past commit added the ability to compile "optimized" shader variants asynchronously (not stalling the app). This commit builds upon that and adds what is basically a runtime shader linker. If a VS output isn't used by the currently-bound PS, a new VS compilation is started without that output. The new shader variant is used when it's ready. All apps using separate shader objects I've seen had unused VS outputs. Eliminating unused/useless VS outputs also eliminates the corresponding vertex attribute loads. Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-21 21:44:35 +01:00
Marek Olšák	7e76f9a7a8	radeonsi: record information about all written and read varyings It's just tgsi_shader_info with DEFAULT_VAL varyings removed. Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-21 21:44:35 +01:00
Marek Olšák	ed3190b3f3	radeonsi: don't export ClipVertex and ClipDistance[] if clipping is disabled This is the first user of optimized monolithic shader variants. Cull distances can't be disabled by states. Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-21 21:44:35 +01:00
Marek Olšák	d984a324bf	radeonsi: add infrastr. for compiling optimized shader variants asynchronously Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-21 21:44:35 +01:00
Marek Olšák	fee71fec25	radeonsi: simplify checking for monolithic compilation Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-21 21:44:35 +01:00
Marek Olšák	6d5c2a8b5c	radeonsi: split the shader key into 3 logical parts key->part.: prolog and epilog flags only key->as_{ls,es}: special flags key->mono.: flags for monolithic compilation only Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-21 21:44:35 +01:00
Nicolai Hähnle	2c875158e2	radeonsi: fix vertex fetches for 2_10_10_10 formats The hardware always treats the alpha channel as unsigned, so add a shader workaround. This is rare enough that we'll just build a monolithic vertex shader. The SINT case cannot actually happen in OpenGL, but I've included it for completeness since it's just a mix of the other cases. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-04 21:30:18 +01:00
Nicolai Hähnle	908f92ad1f	radeonsi: generate GS prolog to (partially) fix triangle strip adjacency rotation Fixes GL45-CTS.geometry_shader.adjacency.adjacency_indiced_triangle_strip and others. This leaves the case of triangle strips with adjacency and primitive restarts open. It seems that the only thing that cares about that is a piglit test. Fixing this efficiently would be really involved, and I don't want to use the hammer of degrading to software handling of indices because there may well be software that uses this draw mode (without caring about the precise rotation of triangles). v2: - skip the GS prolog entirely if workaround is not needed - only check for TES (TES is always non-null when tessellation is used) Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-03 10:11:24 +01:00
Nicolai Hähnle	3b2516721b	radeonsi: make the GS copy shader owned by the GS selector The copy shader only depends on the selector. This change avoids creating separate code paths for monolithic vs. non-monolithic geometry shaders. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-03 10:07:50 +01:00
Marek Olšák	3ec9975555	radeonsi: eliminate trivial constant VS outputs These constant value VS PARAM exports: - 0,0,0,0 - 0,0,0,1 - 1,1,1,0 - 1,1,1,1 can be loaded into PS inputs using the DEFAULT_VAL field, and the VS exports can be removed from the IR to save export & parameter memory. After LLVM optimizations, analyze the IR to see which exports are equal to the ones listed above (or undef) and remove them if they are. Targeted use cases: - All DX9 eON ports always clear 10 VS outputs to 0.0 even if most of them are unused by PS (such as Witcher 2 below). - VS output arrays with unused elements that the GLSL compiler can't eliminate (such as Batman below). The shader-db deltas are quite interesting: (not from upstream si-report.py, it won't be upstreamed) PERCENTAGE DELTAS Shaders PARAM exports (affected only) batman_arkham_origins 589 -67.17 % bioshock-infinite 1769 -0.47 % dirt-showdown 548 -2.68 % dota2 1747 -3.36 % f1-2015 776 -4.94 % left_4_dead_2 1762 -0.07 % metro_2033_redux 2670 -0.43 % portal 474 -0.22 % talos_principle 324 -3.63 % warsow 176 -2.20 % witcher2 1040 -73.78 % ---------------------------------------- All affected 991 -65.37 % ... 9681 -> 3353 ---------------------------------------- Total 26725 -10.82 % ... 58490 -> 52162 v2: treat Undef as both 0 and 1 Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1) Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> (v1)	2016-10-19 22:21:46 +02:00
Marek Olšák	7dddf0b7ab	radeonsi: adjust and clean up Z_ORDER and EXEC_ON_x settings The table was copied from the Vulkan driver. The comment lines are as long as the table for cosmetic reasons. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-13 19:00:51 +02:00
Nicolai Hähnle	77c81164bc	radeonsi: support ARB_compute_variable_group_size Not sure if it's possible to avoid programming the block size twice (once for the userdata and once for the dispatch). Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-10-10 10:36:42 +02:00
Marek Olšák	3388f27d84	radeonsi: clean up lucky #include dependencies Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>	2016-10-04 16:12:06 +02:00
Marek Olšák	275c073c6a	radeonsi: export SampleMask from pixel shaders at full rate Heaven and Valley write gl_SampleMask and not Z. Use 16_ABGR instead of 32_ABGR if Z isn't written. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-09-13 20:38:25 +02:00
Nicolai Hähnle	8dbf2a8570	radeonsi: add DRAWID parameter to vertex shaders Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-08-09 15:56:04 +02:00
Marek Olšák	1e5f00f9d5	radeonsi: pre-generate shader logs for ddebug This cuts down the overhead of si_dump_shader when ddebug is capturing shader logs, which is done for every draw call unconditionally (that's quite a lot of work for a draw call). Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-26 23:06:46 +02:00
Marek Olšák	dd66f9d3e7	radeonsi: move the shader key dumping to si_shader_dump Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-26 23:06:46 +02:00
Marek Olšák	0f7a6ea5e7	radeonsi: report accurate SGPR and VGPR spills Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-13 19:46:16 +02:00
Marek Olšák	5c92c21369	radeonsi: do compilation from si_create_shader_selector asynchronously Main shader parts and geometry shaders are compiled asynchronously by util_queue. si_create_shader_selector doesn't wait and returns. si_draw_vbo(si_shader_select) waits for completion. This has the best effect when shaders are compiled at app-loading time. It doesn't help much for shaders compiled on demand, even though VS+PS compilation should take as much time as the bigger one of the two. If an app creates more shaders, at most 4 threads will be used to compile them. Debug output disables this for shader stats to be printed in the correct order. (We could go even further and build variants asynchronously too, then emit draw calls without waiting and emit incomplete shader states, then force IB chaining to give the compiler more time, then sync the compilation at the IB flush and patch the IB with correct shader states. This is great for compilation before draw calls, but there are some difficulties such as scratch and tess states requiring the compiler output, and an on-disk shader cache will likely be a much better and simpler solution.) Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:13 +02:00
Marek Olšák	850cd953b1	radeonsi: separate the compilation chunk of si_create_shader_selector The function interface is ready to be used by util_queue. Also, si_shader_select_with_key can no longer accept si_context. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:13 +02:00
Marek Olšák	4d1f32376d	radeonsi: don't interpolate colors if flatshading is enabled use v_interp_mov for those Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:12 +02:00
Marek Olšák	4accb02d7a	radeonsi: enable the barycentric optimization in all cases Handle the bc_optimize SGPR bit if both CENTER and CENTROID are enabled. This should increase the PS launch rate for big primitives with MSAA. Based on discussion with SPI guys. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:12 +02:00
Marek Olšák	476e9cee1d	radeonsi: compute only one set of interpolation (i,j) when MSAA is disabled This should increase the PS launch rate for shaders using at least 2 pairs of perspective (i,j) and same for linear. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:12 +02:00
Marek Olšák	a675c6a000	radeonsi: split ps.prolog.force_persample_interp into persp and linear bits This reduces the number of v_mov's in the prolog. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:12 +02:00

381 Commits