KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Marek Olšák	5a47abb63e	radeonsi: don't change viewport for blits, use window-space positions The viewport state was an identity anyway. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-10-07 18:26:35 +02:00
Marek Olšák	13b6c1c031	radeonsi: minor cleanup of si_update_vs_writes_viewport_index Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-10-07 18:26:35 +02:00
Marek Olšák	69ccb9dae7	radeonsi: use new VS blit shaders (VS inputs in SGPRs) Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-10-07 18:26:35 +02:00
Marek Olšák	6a8401a94e	radeonsi: add VS blit shader creation no users yet Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-10-07 18:26:35 +02:00
Nicolai Hähnle	12f3155e28	radeonsi: simplify the signature of si_update_vs_writes_viewport_index Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-10-02 15:07:45 +02:00
Nicolai Hähnle	7bbcb6ac6c	radeonsi: move current_rast_prim into si_context v2: rebase fixes Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-10-02 15:07:45 +02:00
Nicolai Hähnle	6b416ec3d6	radeonsi: move and rename scissor and viewport state and functions v2: change GET_MAX_SCISSOR to SI_MAX_SCISSOR Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-10-02 15:07:45 +02:00
Nicolai Hähnle	f86a112b07	radeonsi: move current_rast_prim to r600_common_context We'll use it in the scissors / clip / guardband state. v2: avoid a performance regression on r600 when applied to (pre-fork) stable branches Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-10-02 15:07:43 +02:00
Nicolai Hähnle	7dfa891f32	radeonsi/gfx9: fix geometry shaders without output vertices Not that those are super common or useful, but hey! Fun corner cases of the API... Fixes dEQP-GLES31.functional.geometry_shading.emit.* Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Marek Olšák <marek.olsak@amd.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2017-09-29 11:43:09 +02:00
Marek Olšák	06bfb2d28f	r600: fork and import gallium/radeon This marks the end of code sharing between r600 and radeonsi. It's getting difficult to work on radeonsi without breaking r600. A lot of functions had to be renamed to prevent linker conflicts. There are also minor cleanups. Acked-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-09-26 04:21:14 +02:00
Nicolai Hähnle	aab134cfa5	radeonsi: enable out-of-order rasterization when possible on VI and GFX9 dGPUs This does not take commutative blending into account yet. R600_DEBUG=nooutoforder disables it. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2017-09-18 11:25:19 +02:00
Nicolai Hähnle	e4af4433fc	radeonsi: hard-code pixel center for interpolateAtSample without multisample buffers The GLSL rules for interpolateAtSample are unfortunate: "Returns the value of the input interpolant variable at the location of sample number sample. If multisample buffers are not available, the input variable will be evaluated at the center of the pixel. If sample sample does not exist, the position used to interpolate the input variable is undefined." This fix will fallback to monolithic shader compilation when interpolateAtSample is used without multisampling. One alternative would be to always upload 16 sample positions, filling the buffer up with repetition when the actual number of samples is less, and then ANDing the sample ID with 0xf. However, that punishes all well-behaving users of interpolateAtSample, when in reality, only conformance tests should be affected by the issue. Fixes dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_sample.non_multisample_buffer.* Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-09-13 18:25:45 +02:00
Nicolai Hähnle	92c4277990	radeonsi: apply a mask to gl_SampleMaskIn in the PS prolog gl_SampleMaskIn is supposed to contain set bits only for the samples that are covered by the current fragment shader invocation, but the VGPR initialization hardware loads the set of all bits that are covered at the current pixel. Fixes various tests in dEQP-GLES31.functional.shaders.sample_variables.sample_mask_in.* Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-09-13 18:25:41 +02:00
Nicolai Hähnle	48b3364b5b	radeonsi: make si_init_shader_selector_async static Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-09-13 18:24:18 +02:00
Marek Olšák	6eade342eb	radeonsi: optimize TCS epilog when invocation 0 writes tess factors This removes the barrier and LDS stores and loads for tess factors when it's possible. The removal of the barrier seems more important to me though. In one shader, it removes 17 * 4 bytes from the shader binary. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-09-11 19:02:02 +02:00
Marek Olšák	89bf8668c2	radeonsi/gfx9: don't read LS out vertex stride from an SGPR in monolithic HS -44 bytes in a monolithic LS-HS binary. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-09-07 13:00:07 +02:00
Nicolai Hähnle	45c5c44451	radeonsi/gfx9: proper workaround for LS/HS VGPR initialization bug When the HS wave is empty, the hardware writes the LS VGPRs starting at v0 instead of v2. Workaround by shifting them back into place when necessary. For simplicity, this is always done in the LS prolog. According to the hardware team, this will be fixed in future chips, so take that into account already. Note that this is not a bug fix, as the bug was already worked around by commit `166823bfd2` ("radeonsi/gfx9: add a temporary workaround for a tessellation driver bug"). This change merely replaces the workaround by one that should be better. v2: add workaround code to shader only when necessary v3: clarify the prefer_mono comment Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-09-06 10:02:49 +02:00
Marek Olšák	c3ebac6890	radeonsi/gfx9: implement primitive binning This increases performance, but it was tuned for Raven, not Vega. We don't know yet how Vega will perform, hopefully not worse. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-09-05 12:09:02 +02:00
Marek Olšák	fb7ba68f6c	radeonsi: eliminate PS color outputs when colormask kills them Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-09-04 15:10:39 +02:00
Timothy Arceri	0168d1f449	radeonsi: stop leaking nir Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-08-29 09:46:29 +10:00
Timothy Arceri	ea2515d780	glsl: pass shader source keys to the disk cache We don't actually write them to disk here. That will happen in the following commit. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-08-25 13:20:29 +10:00
Marek Olšák	8dadb07790	radeonsi: emit VGT_REUSE_OFF in the right place clip_regs aren't marked dirty when writes_viewport_index is changed. Cc: 17.2 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-08-22 13:29:47 +02:00
Marek Olšák	54c2c771bd	radeonsi/gfx9: don't use GS scenario A for VS writing ViewportIndex Vulkan doesn't do it anymore. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-08-22 13:29:47 +02:00
Marek Olšák	a65afda768	radeonsi/gfx9: prevent shader-db crashes - don't precompile LS and ES (they don't exist on GFX9), compile as VS instead - don't precompile HS and GS (we don't have LS and ES parts) Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-08-22 13:29:47 +02:00
Nicolai Hähnle	40697e8678	radeonsi: make si_shader_selector_reference globally visible Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-08-22 09:50:55 +02:00
Marek Olšák	e887c68bd2	radeonsi: add a separate dirty mask for prefetches so that we don't rely on si_pm4_state_enabled_and_changed, allowing us to move prefetches after draw calls. v2: clear the dirty mask after unbinding shaders Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> (v1) Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)	2017-08-07 21:12:24 +02:00
Marek Olšák	a7b0014d1a	radeonsi: add and use si_pm4_state_enabled_and_changed Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-08-07 21:12:24 +02:00
Marek Olšák	58d062b87d	radeonsi: de-atomize L2 prefetch I'd like to be able to move the prefetch call site around. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-08-07 21:12:24 +02:00
Nicolai Hähnle	25ff22e390	radeonsi: tweak next-shader assumptions when streamout is used VS with streamout is always a HW VS. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:43 +02:00
Nicolai Hähnle	b49c2c9fa3	radeonsi/nir: perform lowering of input/output driver locations Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:40 +02:00
Nicolai Hähnle	c5f70a5174	radeonsi: bypass the shader cache for NIR shaders Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:33 +02:00
Nicolai Hähnle	29d7bdd179	radeonsi: scan NIR shaders to obtain required info v2: set num_instruction to 2, i.e. 1 + END (Marek) Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:32 +02:00
Marek Olšák	ffa7ec9e22	radeonsi: simplify computation of tessellation offchip buffers This is overly cautious, but better safe than sorry. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-07-17 10:55:07 -04:00
Marek Olšák	aaee0d1bbf	gallium: use "ull" number suffix to keep the QtCreator parser happy It can't parse "llu". Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>	2017-07-10 22:44:48 +02:00
Marek Olšák	4a10d6154e	radeonsi: move instance divisors into a constant buffer Shader key size: 107 -> 47 Divisors of 0 and 1 are encoded in the shader key. Greater instance divisors are loaded from a constant buffer. The shader code doing the division is huge. Is it something we need to worry about? Does any app use instance divisors >= 2? VS prolog disassembly: s_load_dwordx4 s[12:15], s[0:1], 0x80 ; C00A0300 00000080 s_nop 0 ; BF800000 s_waitcnt lgkmcnt(0) ; BF8C007F s_buffer_load_dword s14, s[12:15], 0x4 ; C0220386 00000004 s_waitcnt lgkmcnt(0) ; BF8C007F v_cvt_f32_u32_e32 v4, s14 ; 7E080C0E v_rcp_iflag_f32_e32 v4, v4 ; 7E084704 v_mul_f32_e32 v4, 0x4f800000, v4 ; 0A0808FF 4F800000 v_cvt_u32_f32_e32 v4, v4 ; 7E080F04 v_mul_hi_u32 v5, v4, s14 ; D2860005 00001D04 v_mul_lo_i32 v6, v4, s14 ; D2850006 00001D04 v_cmp_eq_u32_e64 s[12:13], 0, v5 ; D0CA000C 00020A80 v_sub_i32_e32 v5, vcc, 0, v6 ; 340A0C80 v_cndmask_b32_e64 v5, v6, v5, s[12:13] ; D1000005 00320B06 v_mul_hi_u32 v5, v5, v4 ; D2860005 00020905 v_add_i32_e32 v6, vcc, v5, v4 ; 320C0905 v_subrev_i32_e32 v4, vcc, v5, v4 ; 36080905 v_cndmask_b32_e64 v4, v4, v6, s[12:13] ; D1000004 00320D04 v_mul_hi_u32 v5, v4, v1 ; D2860005 00020304 v_add_i32_e32 v4, vcc, s8, v0 ; 32080008 v_mul_lo_i32 v6, v5, s14 ; D2850006 00001D05 v_add_i32_e32 v7, vcc, 1, v5 ; 320E0A81 v_cmp_ge_u32_e64 s[12:13], v1, v6 ; D0CE000C 00020D01 v_sub_i32_e32 v6, vcc, v1, v6 ; 340C0D01 v_cmp_le_u32_e32 vcc, s14, v6 ; 7D960C0E v_cndmask_b32_e64 v8, 0, -1, s[12:13] ; D1000008 00318280 v_cndmask_b32_e64 v6, 0, -1, vcc ; D1000006 01A98280 v_and_b32_e32 v6, v8, v6 ; 260C0D08 v_cmp_eq_u32_e32 vcc, 0, v6 ; 7D940C80 v_cndmask_b32_e32 v6, v7, v5, vcc ; 000C0B07 v_add_i32_e32 v5, vcc, -1, v5 ; 320A0AC1 v_cmp_eq_u32_e32 vcc, 0, v8 ; 7D941080 v_cndmask_b32_e32 v5, v6, v5, vcc ; 000A0B06 v_add_i32_e32 v5, vcc, s9, v5 ; 320A0A09 v2: set prefer_mono for fetched instance divisors Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-27 19:55:09 +02:00
Marek Olšák	77d2a98353	Revert "radeonsi: use uint32_t to declare si_shader_key.opt.kill_outputs" This reverts commit `7b2240ac9c`. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-27 18:45:07 +02:00
Marek Olšák	dbe45e1180	Revert "radeonsi: remove 8 bytes from si_shader_key with uint32_t ff_tcs_inputs_to_copy" This reverts commit `6b6fed3a3c`. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-27 18:45:07 +02:00
Nicolai Hähnle	da2e52b382	radeonsi: use the correct LLVMTargetMachineRef in si_build_shader_variant si_build_shader_variant can actually be called directly from one of normal-priority compiler threads. In that case, the thread_index is only valid for the normal tm array. v2: - use the correct sel/shader->compiler_ctx_state Fixes: `86cc809726` ("radeonsi: use a compiler queue with a low priority for optimized shaders") Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-06-22 09:45:23 +02:00
Samuel Pitoiset	2c3a7d5840	radeonsi: track use of bindless samplers/images from tgsi_shader_info This adds some new helper functions to know if the current draw call (or dispatch compute) is using bindless samplers/images, based on TGSI analysis. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-06-14 10:04:36 +02:00
Marek Olšák	e80a056ff9	radeonsi: replace si_vertex_elements::elements with separate fields It makes si_vertex_elements a little smaller. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-12 18:24:37 +02:00
Marek Olšák	6b6fed3a3c	radeonsi: remove 8 bytes from si_shader_key with uint32_t ff_tcs_inputs_to_copy The previous patch helps with this. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-12 18:24:37 +02:00
Marek Olšák	7b2240ac9c	radeonsi: use uint32_t to declare si_shader_key.opt.kill_outputs the next patch will benefit from this Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-12 18:24:37 +02:00
Marek Olšák	1621b33d73	radeonsi: remove 8 bytes from si_shader_key by flattening opt.hw_vs Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-12 18:24:37 +02:00
Marek Olšák	4b8d0c2b1d	radeonsi: don't update dependent states if it has no effect (v2) This and the previous clip_regs commit decrease IB sizes and the number of si_update_shaders invocations as follows: IB size si_update_shaders calls Borderlands 2 -10% -27% Deus Ex: MD -5% -11% Talos Principle -8% -30% v2: always dirty cb_render_state in set_framebuffer_state Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-08 23:29:07 +02:00
Marek Olšák	bacaceb78a	radeonsi: update clip_regs on shader state changes only when it's needed Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 20:17:20 +02:00
Marek Olšák	2b7fd9df9a	radeonsi: precompute some fields for PA_CL_VS_OUT_CNTL in si_shader_selector Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 20:17:18 +02:00
Marek Olšák	140b3c5019	radeonsi: add a new helper si_get_vs Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 20:17:16 +02:00
Marek Olšák	e9409c86e7	radeonsi: remove 8 bytes from si_shader_key We can use a union in si_shader_key::mono. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 20:17:06 +02:00
Marek Olšák	2b8b9a56ef	radeonsi: move PSIZE and CLIPDIST unique IO indices after GENERIC Heaven LDS usage for LS+HS is below. The masks are "outputs_written" for LS and HS. Note that 32K is the maximum size. Before: heaven_x64: ls=1f1 tcs=1f1, lds=32K heaven_x64: ls=31 tcs=31, lds=24K heaven_x64: ls=71 tcs=71, lds=28K After: heaven_x64: ls=3f tcs=3f, lds=24K heaven_x64: ls=7 tcs=7, lds=13K heaven_x64: ls=f tcs=f, lds=17K All other apps have a similar decrease in LDS usage, because the "outputs_written" masks are similar. Also, most apps don't write POSITION in these shader stages, so there is room for improvement. (tight per-component input/output packing might help even more) It's unknown whether this improves performance. Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 20:14:15 +02:00
Marek Olšák	3effce4fb0	radeonsi/gfx9: prevent a race when the previous shader's main part is missing Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:43:42 +02:00

348 Commits