KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Nicolai HÃÂ¤hnle	92c4277990	radeonsi: apply a mask to gl_SampleMaskIn in the PS prolog gl_SampleMaskIn is supposed to contain set bits only for the samples that are covered by the current fragment shader invocation, but the VGPR initialization hardware loads the set of all bits that are covered at the current pixel. Fixes various tests in dEQP-GLES31.functional.shaders.sample_variables.sample_mask_in.* Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-09-13 18:25:41 +02:00
Nicolai Hähnle	48b3364b5b	radeonsi: make si_init_shader_selector_async static Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-09-13 18:24:18 +02:00
Marek Olšák	6eade342eb	radeonsi: optimize TCS epilog when invocation 0 writes tess factors This removes the barrier and LDS stores and loads for tess factors when it's possible. The removal of the barrier seems more important to me though. In one shader, it removes 17 * 4 bytes from the shader binary. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-09-11 19:02:02 +02:00
Marek Olšák	89bf8668c2	radeonsi/gfx9: don't read LS out vertex stride from an SGPR in monolithic HS -44 bytes in a monolithic LS-HS binary. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-09-07 13:00:07 +02:00
Nicolai Hähnle	45c5c44451	radeonsi/gfx9: proper workaround for LS/HS VGPR initialization bug When the HS wave is empty, the hardware writes the LS VGPRs starting at v0 instead of v2. Workaround by shifting them back into place when necessary. For simplicity, this is always done in the LS prolog. According to the hardware team, this will be fixed in future chips, so take that into account already. Note that this is not a bug fix, as the bug was already worked around by commit `166823bfd2` ("radeonsi/gfx9: add a temporary workaround for a tessellation driver bug"). This change merely replaces the workaround by one that should be better. v2: add workaround code to shader only when necessary v3: clarify the prefer_mono comment Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-09-06 10:02:49 +02:00
Marek Olšák	c3ebac6890	radeonsi/gfx9: implement primitive binning This increases performance, but it was tuned for Raven, not Vega. We don't know yet how Vega will perform, hopefully not worse. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-09-05 12:09:02 +02:00
Marek Olšák	fb7ba68f6c	radeonsi: eliminate PS color outputs when colormask kills them Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-09-04 15:10:39 +02:00
Timothy Arceri	0168d1f449	radeonsi: stop leaking nir Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-08-29 09:46:29 +10:00
Timothy Arceri	ea2515d780	glsl: pass shader source keys to the disk cache We don't actually write them to disk here. That will happen in the following commit. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-08-25 13:20:29 +10:00
Marek Olšák	8dadb07790	radeonsi: emit VGT_REUSE_OFF in the right place clip_regs aren't marked dirty when writes_viewport_index is changed. Cc: 17.2 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-08-22 13:29:47 +02:00
Marek Olšák	54c2c771bd	radeonsi/gfx9: don't use GS scenario A for VS writing ViewportIndex Vulkan doesn't do it anymore. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-08-22 13:29:47 +02:00
Marek Olšák	a65afda768	radeonsi/gfx9: prevent shader-db crashes - don't precompile LS and ES (they don't exist on GFX9), compile as VS instead - don't precompile HS and GS (we don't have LS and ES parts) Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-08-22 13:29:47 +02:00
Nicolai Hähnle	40697e8678	radeonsi: make si_shader_selector_reference globally visible Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-08-22 09:50:55 +02:00
Marek Olšák	e887c68bd2	radeonsi: add a separate dirty mask for prefetches so that we don't rely on si_pm4_state_enabled_and_changed, allowing us to move prefetches after draw calls. v2: ckear the dirty mask after unbinding shaders Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> (v1) Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)	2017-08-07 21:12:24 +02:00
Marek Olšák	a7b0014d1a	radeonsi: add and use si_pm4_state_enabled_and_changed Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-08-07 21:12:24 +02:00
Marek Olšák	58d062b87d	radeonsi: de-atomize L2 prefetch I'd like to be able to move the prefetch call site around. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-08-07 21:12:24 +02:00
Nicolai Hähnle	25ff22e390	radeonsi: tweak next-shader assumptions when streamout is used VS with streamout is always a HW VS. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:43 +02:00
Nicolai Hähnle	b49c2c9fa3	radeonsi/nir: perform lowering of input/output driver locations Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:40 +02:00
Nicolai Hähnle	c5f70a5174	radeonsi: bypass the shader cache for NIR shaders Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:33 +02:00
Nicolai Hähnle	29d7bdd179	radeonsi: scan NIR shaders to obtain required info v2: set num_instruction to 2, i.e. 1 + END (Marek) Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:32 +02:00
Marek Olšák	ffa7ec9e22	radeonsi: simplify computation of tessellation offchip buffers This is overly cautious, but better safe than sorry. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-07-17 10:55:07 -04:00
Marek Olšák	aaee0d1bbf	gallium: use "ull" number suffix to keep the QtCreator parser happy It can't parse "llu". Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>	2017-07-10 22:44:48 +02:00
Marek Olšák	4a10d6154e	radeonsi: move instance divisors into a constant buffer Shader key size: 107 -> 47 Divisors of 0 and 1 are encoded in the shader key. Greater instance divisors are loaded from a constant buffer. The shader code doing the division is huge. Is it something we need to worry about? Does any app use instance divisors >= 2? VS prolog disassembly: s_load_dwordx4 s[12:15], s[0:1], 0x80 ; C00A0300 00000080 s_nop 0 ; BF800000 s_waitcnt lgkmcnt(0) ; BF8C007F s_buffer_load_dword s14, s[12:15], 0x4 ; C0220386 00000004 s_waitcnt lgkmcnt(0) ; BF8C007F v_cvt_f32_u32_e32 v4, s14 ; 7E080C0E v_rcp_iflag_f32_e32 v4, v4 ; 7E084704 v_mul_f32_e32 v4, 0x4f800000, v4 ; 0A0808FF 4F800000 v_cvt_u32_f32_e32 v4, v4 ; 7E080F04 v_mul_hi_u32 v5, v4, s14 ; D2860005 00001D04 v_mul_lo_i32 v6, v4, s14 ; D2850006 00001D04 v_cmp_eq_u32_e64 s[12:13], 0, v5 ; D0CA000C 00020A80 v_sub_i32_e32 v5, vcc, 0, v6 ; 340A0C80 v_cndmask_b32_e64 v5, v6, v5, s[12:13] ; D1000005 00320B06 v_mul_hi_u32 v5, v5, v4 ; D2860005 00020905 v_add_i32_e32 v6, vcc, v5, v4 ; 320C0905 v_subrev_i32_e32 v4, vcc, v5, v4 ; 36080905 v_cndmask_b32_e64 v4, v4, v6, s[12:13] ; D1000004 00320D04 v_mul_hi_u32 v5, v4, v1 ; D2860005 00020304 v_add_i32_e32 v4, vcc, s8, v0 ; 32080008 v_mul_lo_i32 v6, v5, s14 ; D2850006 00001D05 v_add_i32_e32 v7, vcc, 1, v5 ; 320E0A81 v_cmp_ge_u32_e64 s[12:13], v1, v6 ; D0CE000C 00020D01 v_sub_i32_e32 v6, vcc, v1, v6 ; 340C0D01 v_cmp_le_u32_e32 vcc, s14, v6 ; 7D960C0E v_cndmask_b32_e64 v8, 0, -1, s[12:13] ; D1000008 00318280 v_cndmask_b32_e64 v6, 0, -1, vcc ; D1000006 01A98280 v_and_b32_e32 v6, v8, v6 ; 260C0D08 v_cmp_eq_u32_e32 vcc, 0, v6 ; 7D940C80 v_cndmask_b32_e32 v6, v7, v5, vcc ; 000C0B07 v_add_i32_e32 v5, vcc, -1, v5 ; 320A0AC1 v_cmp_eq_u32_e32 vcc, 0, v8 ; 7D941080 v_cndmask_b32_e32 v5, v6, v5, vcc ; 000A0B06 v_add_i32_e32 v5, vcc, s9, v5 ; 320A0A09 v2: set prefer_mono for fetched instance divisors Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-27 19:55:09 +02:00
Marek Olšák	77d2a98353	Revert "radeonsi: use uint32_t to declare si_shader_key.opt.kill_outputs" This reverts commit `7b2240ac9c`. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-27 18:45:07 +02:00
Marek Olšák	dbe45e1180	Revert "radeonsi: remove 8 bytes from si_shader_key with uint32_t ff_tcs_inputs_to_copy" This reverts commit `6b6fed3a3c`. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-27 18:45:07 +02:00
Nicolai Hähnle	da2e52b382	radeonsi: use the correct LLVMTargetMachineRef in si_build_shader_variant si_build_shader_variant can actually be called directly from one of normal-priority compiler threads. In that case, the thread_index is only valid for the normal tm array. v2: - use the correct sel/shader->compiler_ctx_state Fixes: `86cc809726` ("radeonsi: use a compiler queue with a low priority for optimized shaders") Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-06-22 09:45:23 +02:00
Samuel Pitoiset	2c3a7d5840	radeonsi: track use of bindless samplers/images from tgsi_shader_info This adds some new helper functions to know if the current draw call (or dispatch compute) is using bindless samplers/images, based on TGSI analysis. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-06-14 10:04:36 +02:00
Marek Olšák	e80a056ff9	radeonsi: replace si_vertex_elements::elements with separate fields It makes si_vertex_elements a little smaller. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-12 18:24:37 +02:00
Marek Olšák	6b6fed3a3c	radeonsi: remove 8 bytes from si_shader_key with uint32_t ff_tcs_inputs_to_copy The previous patch helps with this. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-12 18:24:37 +02:00
Marek Olšák	7b2240ac9c	radeonsi: use uint32_t to declare si_shader_key.opt.kill_outputs the next patch will benefit from this Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-12 18:24:37 +02:00
Marek Olšák	1621b33d73	radeonsi: remove 8 bytes from si_shader_key by flattening opt.hw_vs Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-12 18:24:37 +02:00
Marek Olšák	4b8d0c2b1d	radeonsi: don't update dependent states if it has no effect (v2) This and the previous clip_regs commit decrease IB sizes and the number of si_update_shaders invocations as follows: IB size si_update_shaders calls Borderlands 2 -10% -27% Deus Ex: MD -5% -11% Talos Principle -8% -30% v2: always dirty cb_render_state in set_framebuffer_state Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-08 23:29:07 +02:00
Marek Olšák	bacaceb78a	radeonsi: update clip_regs on shader state changes only when it's needed Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 20:17:20 +02:00
Marek Olšák	2b7fd9df9a	radeonsi: precompute some fields for PA_CL_VS_OUT_CNTL in si_shader_selector Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 20:17:18 +02:00
Marek Olšák	140b3c5019	radeonsi: add a new helper si_get_vs Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 20:17:16 +02:00
Marek Olšák	e9409c86e7	radeonsi: remove 8 bytes from si_shader_key We can use a union in si_shader_key::mono. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 20:17:06 +02:00
Marek Olšák	2b8b9a56ef	radeonsi: move PSIZE and CLIPDIST unique IO indices after GENERIC Heaven LDS usage for LS+HS is below. The masks are "outputs_written" for LS and HS. Note that 32K is the maximum size. Before: heaven_x64: ls=1f1 tcs=1f1, lds=32K heaven_x64: ls=31 tcs=31, lds=24K heaven_x64: ls=71 tcs=71, lds=28K After: heaven_x64: ls=3f tcs=3f, lds=24K heaven_x64: ls=7 tcs=7, lds=13K heaven_x64: ls=f tcs=f, lds=17K All other apps have a similar decrease in LDS usage, because the "outputs_written" masks are similar. Also, most apps don't write POSITION in these shader stages, so there is room for improvement. (tight per-component input/output packing might help even more) It's unknown whether this improves performance. Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 20:14:15 +02:00
Marek Olšák	3effce4fb0	radeonsi/gfx9: prevent a race when the previous shader's main part is missing Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:43:42 +02:00
Marek Olšák	b5bc826ead	radeonsi/gfx9: wait for main part compilation of 1st shaders of merged shaders Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:43:42 +02:00
Marek Olšák	ffbaba6072	radeonsi/gfx9: fix LS scratch buffer support without TCS for GFX9 LS is merged into TCS. If there is no TCS, LS is merged into fixed-func TCS. The problem is the fixed-func TCS was ignored by scratch update functions, so LS didn't have the scratch buffer set up. Note that Mesa 17.1 doesn't have merged shaders. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:43:42 +02:00
Marek Olšák	6e2c07749b	radeonsi: move streamout state update out of si_update_shaders Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:43:42 +02:00
Marek Olšák	8147c4a4a5	radeonsi: move handling of DBG_NO_OPT_VARIANT into si_shader_selector_key Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:43:42 +02:00
Marek Olšák	86cc809726	radeonsi: use a compiler queue with a low priority for optimized shaders Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:43:42 +02:00
Marek Olšák	38bd468a78	radeonsi: drop unfinished shader compilations when destroying shaders If we enqueue too many jobs and destroy the GL context, it may take several seconds before the jobs finish. Just drop them instead. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:43:42 +02:00
Marek Olšák	a7f098fb76	radeonsi: only upload (dump to L2) those descriptors that are used by shaders This decreases the size of CE RAM dumps to L2, or the size of descriptor uploads without CE. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-05-18 22:15:02 +02:00
Marek Olšák	53c2ef36da	radeonsi: record which descriptor slots are used by shaders Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-05-18 22:15:02 +02:00
Nicolai Hähnle	4ea67c1751	radeonsi: rename tcs_tes_uses_prim_id for clarity What we care about is whether PrimID is used while tessellation is enabled; whether it's used in TCS/TES or further down the pipeline is irrelevant. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-05-16 16:11:54 +02:00
Nicolai Hähnle	f4dbe2efb7	radeonsi: fix gl_PrimitiveIDIn in geometry shader when using tessellation This builds on commit `0549ea15ec` ("radeonsi: fix primitive ID in fragment shader when using tessellation"). Fixes piglit arb_tessellation_shader/execution/gs-primitiveid-instanced.shader_test Cc: 17.1 <mesa-stable@lists.freedesktop.org> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-05-16 16:11:53 +02:00
Nicolai Hähnle	a16ae77185	radeonsi: get rid of secondary input/output word By keeping track of fewer generics, everything can fit into 64 bits. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-05-12 10:46:06 +02:00
Nicolai Hähnle	cfe6e30f1b	radeonsi: skip generic out/in indices without a shader IO index OpenGL uses at most 32 generic outputs/inputs in any stage, and they always have a shader IO index and therefore fit into the outputs_written/ inputs_read/kill_outputs fields. However, Nine uses semantic indices more liberally. We support that in VS-PS pipelines, except that the optimization of killing outputs must be skipped. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-05-12 10:46:05 +02:00

1 2 3 4 5 ...

336 Commits