KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Marek Olšák	96b0cfc82e	radeonsi: turn si_shader_key::mono into a non-union A merged LS-HS shader needs both fix_fetch and inputs_to_copy for compilation. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-26 13:08:05 +02:00
Marek Olšák	bd2cde0c25	radeonsi: add si_shader_selector::vs_needs_prolog cleanup Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-17 01:22:11 +02:00
Nicolai Hähnle	4f7e3fbb50	radeonsi: fix gl_BaseVertex in non-indexed draws gl_BaseVertex is supposed to be 0 in non-indexed draws. Unfortunately, the way they're implemented, the VGT always generates indices starting at 0, and the VS prolog adds the start index. There's a VGT_INDX_OFFSET register which causes the VGT to start at a driver-defined index. However, this register cannot be written from indirect draws. So fix this unlikely case by setting a bit to tell the VS whether the draw is indexed or not, so that gl_BaseVertex can be adjusted accordingly when used. Fixes a bug in KHR-GL45.shader_draw_parameters_tests.ShaderMultiDrawArraysParameters.* Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-04-13 17:31:11 +02:00
Nicolai Hähnle	472c84d1ad	radeonsi: provide VS_STATE input to all VS variants v2: fix incorrect change in get_tcs_out_patch_stride Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-04-13 17:30:20 +02:00
Nicolai Hähnle	3b9fbcb3b6	radeonsi: change the bit-packing of LS out/TCS in data Avoid conflicts when merging various VS state bits. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-04-13 17:30:19 +02:00
Nicolai Hähnle	ff39f0d59c	radeonsi: emit VS_STATE register explicitly from si_draw_vbo We will merge other derived state information into this register. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-04-13 17:30:18 +02:00
Timothy Arceri	2efddc63ee	gallium/util: replace pipe_mutex with mtx_t pipe_mutex was made unnecessary with `fd33a6bcd7`. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-03-07 08:48:11 +11:00
Timothy Arceri	69a687189e	radeon/ac: switch from radeon_shader_binary to ac_shader_binary Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-02-28 13:20:31 +11:00
Marek Olšák	84e72f2962	radeonsi: skip TESSINNER/OUTER offchip stores if TES doesn't read them We were unconditionally storing these outputs, sometimes even one component at a time, but apps never read them in TES. Move the TESSINNER/OUTER buffer stores into the TCS epilog where we can easily disable them on demand. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-02-21 21:27:23 +01:00
Nicolai Hähnle	066a117be7	radeonsi: fix UINT/SINT clamping for 10-bit formats on <= CIK The same PS epilog workaround as for 8-bit integer formats is required, since the CB doesn't do clamping. Fixes GL45-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels*. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-02-21 10:45:13 +01:00
Marek Olšák	22b8a773e1	radeonsi: use SI_MAX_ATTRIBS where it should be used for consistency; no change in behavior Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-02-18 01:22:08 +01:00
Marek Olšák	054f853035	radeonsi: sort members of si_shader_key::part and improve some comments Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-02-18 01:22:08 +01:00
Marek Olšák	1fabb29717	radeonsi: have separate LS and ES main shader parts in the shader selector This might reduce the on-demand compilation if the initial VS/LS/ES determination is wrong. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-02-18 01:22:08 +01:00
Marek Olšák	dbd38f2a92	radeonsi: add a workaround for clamping unaligned RGB 8 & 16-bit vertex loads Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-02-18 01:22:08 +01:00
Marek Olšák	41a2157a68	radeonsi: make fix_fetch an array of uint8_t so that we can add 3-component fallbacks. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-02-18 01:22:08 +01:00
Marek Olšák	4c36553a46	radeonsi: implement legacy GL_DOUBLE vertex formats so that we can disable u_vbuf for GL core profiles. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-02-14 21:47:51 +01:00
Marek Olšák	28c06b3ceb	radeonsi: write shader asm annotated with wave info into GPU hang reports Note that the disassembly is written twice - first the unmodified compiler output and then the wave-annotated output only if there are waves executing the shader. Sample output from a real GPU hang most likely caused by image_sample: The number of active waves = 28 Pixel Shader - annotated disassembly: s_mov_b64 s[6:7], exec ; BE86017E [PC=0x10f3e3800, off=0, size=4] s_wqm_b64 exec, exec ; BEFE077E [PC=0x10f3e3804, off=4, size=4] ... image_sample v[7:9], v[0:1], s[12:19], s[20:23] dmask:0x7 ; F0800700 00A30700 [PC=0x10f3e3a94, off=660, size=8] s_buffer_load_dword s20, s[0:3], 0x50 ; C0220500 00000050 [PC=0x10f3e3a9c, off=668, size=8] s_load_dwordx4 s[24:27], s[4:5], 0x170 ; C00A0602 00000170 [PC=0x10f3e3aa4, off=676, size=8] s_load_dwordx8 s[12:19], s[4:5], 0x140 ; C00E0302 00000140 [PC=0x10f3e3aac, off=684, size=8] s_buffer_load_dword s11, s[0:3], 0x5c ; C02202C0 0000005C [PC=0x10f3e3ab4, off=692, size=8] s_buffer_load_dword s21, s[0:3], 0x54 ; C0220540 00000054 [PC=0x10f3e3abc, off=700, size=8] s_buffer_load_dword s22, s[0:3], 0x58 ; C0220580 00000058 [PC=0x10f3e3ac4, off=708, size=8] s_waitcnt vmcnt(0) ; BF8C0F70 [PC=0x10f3e3acc, off=716, size=4] ^ SE0 SH0 CU1 SIMD1 WAVE0 EXEC=aaaaaaa555aaaaaa INST32=BF8C0F70 ^ SE0 SH0 CU1 SIMD2 WAVE0 EXEC=aaaa85555555552a INST32=BF8C0F70 ^ SE0 SH0 CU1 SIMD3 WAVE0 EXEC=000000000000000a INST32=BF8C0F70 ^ SE0 SH0 CU6 SIMD1 WAVE0 EXEC=25a5a5aa82aaaaaa INST32=BF8C0F70 ^ SE0 SH0 CU6 SIMD3 WAVE0 EXEC=50aaaa8fffa55555 INST32=BF8C0F70 ^ SE0 SH0 CU7 SIMD0 WAVE0 EXEC=5554aaaaaaa1a555 INST32=BF8C0F70 ^ SE0 SH0 CU7 SIMD0 WAVE1 EXEC=aaaa5555ffffffff INST32=BF8C0F70 ^ SE0 SH0 CU7 SIMD1 WAVE0 EXEC=555557aaaaaaaaa5 INST32=BF8C0F70 ^ SE0 SH0 CU7 SIMD3 WAVE0 EXEC=5555aaaaaaaaaa85 INST32=BF8C0F70 ^ SE1 SH0 CU3 SIMD1 WAVE0 EXEC=aaaaaaaaaaaaaaaa INST32=BF8C0F70 ^ SE1 SH0 CU4 SIMD0 WAVE0 EXEC=aaaaaaaa5a5a5a5a INST32=BF8C0F70 ^ SE1 SH0 CU4 SIMD1 WAVE0 EXEC=aaaaaaa5a5a5a4a5 INST32=BF8C0F70 ^ SE1 SH0 CU4 SIMD2 WAVE0 EXEC=5555555000000000 INST32=BF8C0F70 ^ SE1 SH0 CU4 SIMD3 WAVE0 EXEC=aa555554155aaaaa INST32=BF8C0F70 ^ SE1 SH0 CU5 SIMD0 WAVE0 EXEC=55ffff55555555aa INST32=BF8C0F70 ^ SE1 SH0 CU5 SIMD1 WAVE0 EXEC=555555555aaaaaaa INST32=BF8C0F70 ^ SE1 SH0 CU5 SIMD2 WAVE0 EXEC=a0aaaaaaa8555555 INST32=BF8C0F70 ^ SE1 SH0 CU5 SIMD3 WAVE0 EXEC=8aaaaaaaaaaaa555 INST32=BF8C0F70 ^ SE1 SH0 CU6 SIMD0 WAVE0 EXEC=000000002aaaaaaa INST32=BF8C0F70 ^ SE2 SH0 CU1 SIMD0 WAVE0 EXEC=5aaaa5400aaaa15a INST32=BF8C0F70 ^ SE2 SH0 CU1 SIMD1 WAVE0 EXEC=00aaaaaaaa5555aa INST32=BF8C0F70 ^ SE2 SH0 CU1 SIMD2 WAVE0 EXEC=aa00005555554555 INST32=BF8C0F70 ^ SE2 SH0 CU1 SIMD3 WAVE0 EXEC=aaaaaaa000000000 INST32=BF8C0F70 ^ SE3 SH0 CU4 SIMD0 WAVE0 EXEC=5555aaaaaaaaaaaa INST32=BF8C0F70 ^ SE3 SH0 CU4 SIMD2 WAVE0 EXEC=ffaaaaaaaaaa5555 INST32=BF8C0F70 ^ SE3 SH0 CU4 SIMD3 WAVE0 EXEC=aaaa55555555aa00 INST32=BF8C0F70 ^ SE3 SH0 CU5 SIMD0 WAVE0 EXEC=00aaaaaaaaaaaa5a INST32=BF8C0F70 ^ SE3 SH0 CU5 SIMD1 WAVE0 EXEC=5a555555005555ff INST32=BF8C0F70 v_mul_f32_e32 v7, s6, v7 ; 0A0E0E06 [PC=0x10f3e3ad0, off=720, size=4] ... Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-02-10 11:27:50 +01:00
Marek Olšák	35cd7551a4	radeonsi: use the correct target machine when building shader variants If the shader selector is created with a different context than the shader variant, we should use the calling context's target machine for the shader variant. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99419 Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-18 19:51:31 +01:00
Marek Olšák	3ae3be6dd4	radeonsi: move shader pipe context state into a separate structure Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-18 19:51:31 +01:00
Marek Olšák	d523415609	radeonsi: implement GL_FIXED vertex format Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-16 18:07:08 +01:00
Marek Olšák	018fb2ecb3	radeonsi: implement 32-bit SNORM/UNORM/SSCALED/USCALED vertex formats Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-16 18:07:08 +01:00
Marek Olšák	44e9b67229	radeonsi: make fix_fetch 64-bit v2: add u_bit_consecutive64 Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-16 18:07:08 +01:00
Marek Olšák	6f356d15be	radeonsi: cleanly communicate whether si_shader_dump should check R600_DEBUG Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-09 12:01:30 +01:00
Marek Olšák	72d48fcd8e	radeonsi: apply a multi-wave workgroup SPI bug workaround to affected CIK chips All codepaths are handled except for clover. Cc: 13.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-12-01 02:16:51 +01:00
Marek Olšák	274fb601c2	radeonsi: count and report temp arrays in scratch separately v2: only do this if debug output of shader dumping is enabled Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)	2016-11-29 23:52:31 +01:00
Marek Olšák	ef6c84b301	radeonsi: eliminate VS outputs that aren't used by PS at runtime A past commit added the ability to compile "optimized" shader variants asynchronously (not stalling the app). This commit builds upon that and adds what is basically a runtime shader linker. If a VS output isn't used by the currently-bound PS, a new VS compilation is started without that output. The new shader variant is used when it's ready. All apps using separate shader objects I've seen had unused VS outputs. Eliminating unused/useless VS outputs also eliminates the corresponding vertex attribute loads. Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-21 21:44:35 +01:00
Marek Olšák	7e76f9a7a8	radeonsi: record information about all written and read varyings It's just tgsi_shader_info with DEFAULT_VAL varyings removed. Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-21 21:44:35 +01:00
Marek Olšák	ed3190b3f3	radeonsi: don't export ClipVertex and ClipDistance[] if clipping is disabled This is the first user of optimized monolithic shader variants. Cull distances can't be disabled by states. Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-21 21:44:35 +01:00
Marek Olšák	d984a324bf	radeonsi: add infrastr. for compiling optimized shader variants asynchronously Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-21 21:44:35 +01:00
Marek Olšák	fee71fec25	radeonsi: simplify checking for monolithic compilation Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-21 21:44:35 +01:00
Marek Olšák	6d5c2a8b5c	radeonsi: split the shader key into 3 logical parts key->part.: prolog and epilog flags only key->as_{ls,es}: special flags key->mono.: flags for monolithic compilation only Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-21 21:44:35 +01:00
Nicolai Hähnle	2c875158e2	radeonsi: fix vertex fetches for 2_10_10_10 formats The hardware always treats the alpha channel as unsigned, so add a shader workaround. This is rare enough that we'll just build a monolithic vertex shader. The SINT case cannot actually happen in OpenGL, but I've included it for completeness since it's just a mix of the other cases. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-04 21:30:18 +01:00
Nicolai Hähnle	908f92ad1f	radeonsi: generate GS prolog to (partially) fix triangle strip adjacency rotation Fixes GL45-CTS.geometry_shader.adjacency.adjacency_indiced_triangle_strip and others. This leaves the case of triangle strips with adjacency and primitive restarts open. It seems that the only thing that cares about that is a piglit test. Fixing this efficiently would be really involved, and I don't want to use the hammer of degrading to software handling of indices because there may well be software that uses this draw mode (without caring about the precise rotation of triangles). v2: - skip the GS prolog entirely if workaround is not needed - only check for TES (TES is always non-null when tessellation is used) Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-03 10:11:24 +01:00
Nicolai Hähnle	3b2516721b	radeonsi: make the GS copy shader owned by the GS selector The copy shader only depends on the selector. This change avoids creating separate code paths for monolithic vs. non-monolithic geometry shaders. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-03 10:07:50 +01:00
Marek Olšák	3ec9975555	radeonsi: eliminate trivial constant VS outputs These constant value VS PARAM exports: - 0,0,0,0 - 0,0,0,1 - 1,1,1,0 - 1,1,1,1 can be loaded into PS inputs using the DEFAULT_VAL field, and the VS exports can be removed from the IR to save export & parameter memory. After LLVM optimizations, analyze the IR to see which exports are equal to the ones listed above (or undef) and remove them if they are. Targeted use cases: - All DX9 eON ports always clear 10 VS outputs to 0.0 even if most of them are unused by PS (such as Witcher 2 below). - VS output arrays with unused elements that the GLSL compiler can't eliminate (such as Batman below). The shader-db deltas are quite interesting: (not from upstream si-report.py, it won't be upstreamed) PERCENTAGE DELTAS Shaders PARAM exports (affected only) batman_arkham_origins 589 -67.17 % bioshock-infinite 1769 -0.47 % dirt-showdown 548 -2.68 % dota2 1747 -3.36 % f1-2015 776 -4.94 % left_4_dead_2 1762 -0.07 % metro_2033_redux 2670 -0.43 % portal 474 -0.22 % talos_principle 324 -3.63 % warsow 176 -2.20 % witcher2 1040 -73.78 % ---------------------------------------- All affected 991 -65.37 % ... 9681 -> 3353 ---------------------------------------- Total 26725 -10.82 % ... 58490 -> 52162 v2: treat Undef as both 0 and 1 Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1) Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> (v1)	2016-10-19 22:21:46 +02:00
Marek Olšák	7dddf0b7ab	radeonsi: adjust and clean up Z_ORDER and EXEC_ON_x settings The table was copied from the Vulkan driver. The comment lines are as long as the table for cosmetic reasons. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-13 19:00:51 +02:00
Nicolai Hähnle	77c81164bc	radeonsi: support ARB_compute_variable_group_size Not sure if it's possible to avoid programming the block size twice (once for the userdata and once for the dispatch). Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-10-10 10:36:42 +02:00
Marek Olšák	3388f27d84	radeonsi: clean up lucky #include dependencies Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>	2016-10-04 16:12:06 +02:00
Marek Olšák	275c073c6a	radeonsi: export SampleMask from pixel shaders at full rate Heaven and Valley write gl_SampleMask and not Z. Use 16_ABGR instead of 32_ABGR if Z isn't written. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-09-13 20:38:25 +02:00
Nicolai Hähnle	8dbf2a8570	radeonsi: add DRAWID parameter to vertex shaders Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-08-09 15:56:04 +02:00
Marek Olšák	1e5f00f9d5	radeonsi: pre-generate shader logs for ddebug This cuts down the overhead of si_dump_shader when ddebug is capturing shader logs, which is done for every draw call unconditionally (that's quite a lot of work for a draw call). Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-26 23:06:46 +02:00
Marek Olšák	dd66f9d3e7	radeonsi: move the shader key dumping to si_shader_dump Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-26 23:06:46 +02:00
Marek Olšák	0f7a6ea5e7	radeonsi: report accurate SGPR and VGPR spills Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-13 19:46:16 +02:00
Marek Olšák	5c92c21369	radeonsi: do compilation from si_create_shader_selector asynchronously Main shader parts and geometry shaders are compiled asynchronously by util_queue. si_create_shader_selector doesn't wait and returns. si_draw_vbo(si_shader_select) waits for completion. This has the best effect when shaders are compiled at app-loading time. It doesn't help much for shaders compiled on demand, even though VS+PS compilation should take as much as time as the bigger one of the two. If an app creates more shaders, at most 4 threads will be used to compile them. Debug output disables this for shader stats to be printed in the correct order. (We could go even further and build variants asynchronously too, then emit draw calls without waiting and emit incomplete shader states, then force IB chaining to give the compiler more time, then sync the compilation at the IB flush and patch the IB with correct shader states. This is great for compilation before draw calls, but there are some difficulties such as scratch and tess states requiring the compiler output, and an on-disk shader cache will likely be a much better and simpler solution.) Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:13 +02:00
Marek Olšák	850cd953b1	radeonsi: separate the compilation chunk of si_create_shader_selector The function interface is ready to be used by util_queue. Also, si_shader_select_with_key can no longer accept si_context. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:13 +02:00
Marek Olšák	4d1f32376d	radeonsi: don't interpolate colors if flatshading is enabled use v_interp_mov for those Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:12 +02:00
Marek Olšák	4accb02d7a	radeonsi: enable the barycentric optimization in all cases Handle the bc_optimize SGPR bit if both CENTER and CENTROID are enabled. This should increase the PS launch rate for big primitives with MSAA. Based on discussion with SPI guys. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:12 +02:00
Marek Olšák	476e9cee1d	radeonsi: compute only one set of interpolation (i,j) when MSAA is disabled This should increase the PS launch rate for shaders using at least 2 pairs of perspective (i,j) and same for linear. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:12 +02:00
Marek Olšák	a675c6a000	radeonsi: split ps.prolog.force_persample_interp into persp and linear bits This reduces the number of v_mov's in the prolog. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:12 +02:00
Nicolai Hähnle	b42bc90b6a	radeonsi: enable WQM in PS prolog when needed WQM is needed when the PS prolog computes a VGPR that is consumed by a shader with (implicit or explicit) derivatives. Depends on http://reviews.llvm.org/D20839 / LLVM r272063 for this to be effective (otherwise it's just a no-op). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95130 Cc: 12.0 <mesa-dev@lists.freedesktop.org> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-06-07 23:46:02 +02:00
Bas Nieuwenhuizen	26f436132b	radeonsi: Remove LDS layout user SGPR's from TES. They are unused. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen	6217716e8f	radeonsi: Store inputs to memory when not using a TCS. We need to copy the VS outputs to memory. I decided to do this using a shader key, as the value depends on other shaders. I also switch the fixed function TCS over to monolithic, as otherwisze many of the user SGPR's need to be passed to the epilog, which increases register pressure, or complexity to avoid that. The main body of the fixed function TCS is not that interesting to precompile anyway, since we do it on demand and it is very small. v2: Use u_bit_scan64. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen	c49e68dc4b	radeonsi: Add user SGPR for the layout of the offchip buffer. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen	d9a0c54f6f	radeonsi: Use correct parameter index for LS_OUT_LAYOUT. This happens to be in the right position, but that changes when TCS/TES get new parameters. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen	5c34562d7c	radeonsi: Add offchip tessellation parameters. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-05-26 22:07:04 +02:00
Marek Olšák	3cbd8cfc7a	radeonsi: decrease GS copy shader user SGPRs to 2 const buffers are no longer used since the clip plane const buffer was moved to RW buffers Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-04-22 01:14:14 +02:00
Marek Olšák	3138a28ff2	radeonsi: move default tess level constant buffer to RW buffers Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-04-22 01:14:14 +02:00
Bas Nieuwenhuizen	38f4cee3ff	radeonsi: Add config parameter to si_shader_apply_scratch_relocs. shader->config is not updated for compute kernels. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2016-04-21 19:36:19 +02:00
Bas Nieuwenhuizen	84a6761ae3	radeonsi: add shared memory Declares the shared memory as a global variable so that LLVM is aware of it and it does not conflict with passes like AMDGPUPromoteAlloca. v2: - Use ctx->i8. - Dropped null-check for declare_memory_region. - Changed memory region array to single region. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>	2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen	753a3e472b	radeonsi: lower compute shader arguments Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-04-19 18:10:30 +02:00
Marek Olšák	ed66c75784	radeonsi: use enums in si_shader.h Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-04-18 19:51:25 +02:00
Nicolai Hähnle	c495c0ad37	radeonsi: implement set_shader_buffers Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>	2016-04-12 16:30:26 -05:00
Nicolai Hähnle	e85cf35a65	radeonsi: implement set_shader_images (v2) Whether DCC is disabled depends on the access flags with which the image is bound: image_load supports DCC, but store and atomic don't. v2: remove an unnecessary masking of images->desc.enabled_mask Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-03-21 15:34:23 -05:00
Marek Olšák	74b4ce81fb	radeonsi: allow dumping shader disassemblies to a file Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2016-03-01 00:18:54 +01:00
Marek Olšák	d0f3b524cd	radeonsi: use re-Z This can increase perf for shaders that kill pixels (kill, alpha-test, alpha-to-coverage). v2: add comments Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2016-03-01 00:18:19 +01:00
Marek Olšák	ff360a52e6	radeonsi: implement binary shaders & shader cache in memory (v2) v2: handle _mesa_hash_table_insert failure other cosmetic changes Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-21 21:08:58 +01:00
Marek Olšák	1fe73d55e3	radeonsi: move some struct si_shader members to new struct si_shader_info This will be part of shader binaries. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-21 21:08:58 +01:00
Marek Olšák	10fa269f4f	radeonsi: use smaller types for some si_shader members in order to decrease the shader size for a shader cache. v2: add & use SI_MAX_VS_OUTPUTS Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-21 21:08:58 +01:00
Marek Olšák	3c98e0b369	radeonsi: compile non-GS middle parts of shaders immediately if enabled Still disabled. Only prologs & epilogs are compiled in draw calls, but each variant of those is compiled only once per process. VS is always compiled as hw VS. TES is always compiled as hw VS. LS and ES stages are always compiled on demand. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-21 21:08:58 +01:00
Marek Olšák	4636d9be4a	radeonsi: add PS prolog Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-21 21:08:58 +01:00
Marek Olšák	e79bb746ab	radeonsi: add PS epilog Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-21 21:08:57 +01:00
Marek Olšák	eb10919b83	radeonsi: add TCS epilog Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-21 21:08:57 +01:00
Marek Olšák	e1b21696a3	radeonsi: add VS epilog It only exports the primitive ID. Also used by TES when it's compiled as VS. The VS input location of the primitive ID input is v2. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-21 21:08:57 +01:00
Marek Olšák	70de433dea	radeonsi: add VS prolog This is disabled with use_monolithic_shaders = true. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-21 21:08:57 +01:00
Marek Olšák	19a92886a8	radeonsi: first bits for non-monolithic shaders Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-21 21:08:57 +01:00
Marek Olšák	17eb99d8b9	radeonsi: add code for combining and uploading shaders from 3 shader parts Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-21 21:08:57 +01:00
Marek Olšák	dc27456194	radeonsi: separate out shader key bits for prologs & epilogs Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-21 21:08:57 +01:00
Marek Olšák	d995d4830e	radeonsi: compute how many input VGPRs fragment shaders have Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-21 21:08:57 +01:00
Marek Olšák	fe1b6ede01	radeonsi: compute how many input SGPRs and VGPRs shaders have Prologs (shader binaries inserted before the API shader binary) need to know this, so that they won't change the input registers unintentionally. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-21 21:08:57 +01:00
Marek Olšák	7aedbbacae	radeonsi: put image, fmask, and sampler descriptors into one array The texture slot is expanded to 16 dwords containing 2 descriptors. Those can be: - Image and fmask, or - Image and sampler state By carefully choosing the locations, we can put all three into one slot, with the fmask and sampler state being mutually exclusive. This improves shaders in 2 ways: - 2 user SGPRs are unused, shaders can use them as temporary registers now - each pair of descriptors is always on the same cache line v2: cosmetic changes: add back v8i32, don't load a sampler state & fmask at the same time Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-10 19:41:49 +01:00
Marek Olšák	dc5fc3c2f6	radeonsi: make LLVM IR dumping less messy Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-09 21:19:51 +01:00
Marek Olšák	b6d5666fbf	radeonsi: remove useless code that handles dx10_clamp_mode "enable-no-nans-fp-math" is a wrong string and there was a disagreement about fixing it. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-09 21:19:51 +01:00
Marek Olšák	5a53628f45	radeonsi: read SPI_PS_INPUT_ADDR from LLVM if it returns it Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-09 21:19:51 +01:00
Marek Olšák	b9126dcda8	radeonsi: implement forcing per-sample_interpolation using the shader key only It was partly a state and partly emulated by shader code, but since we want to do this in a fragment shader prolog, we need to put it into the shader key, which will be used to generate the prolog. This also removes the spi_ps_input states and moves the registers to the PS state. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-09 21:19:51 +01:00
Marek Olšák	4596f3c1b8	radeonsi: remove si_shader::ps_input_interpolate tgsi_shader_info has this too. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-09 21:19:51 +01:00
Marek Olšák	6dda2455c8	radeonsi: move BCOLOR PS input locations after all other inputs BCOLOR inputs were immediately after COLOR inputs. Thus, all following inputs were offset by 1 if color_two_side was enabled, and not offset if it was not enabled, which is a variation that's problematic if we want to have 1 variant per shader and the variant doesn't care about color_two_side (that should be handled by other bytecode attached at the beginning). Instead, move BCOLOR inputs after all other inputs, so BCOLOR0 is at location "num_inputs" if it's present. BCOLOR1 is next. This also allows removing si_shader::nparam and si_shader::ps_input_param_offset, which are useless now. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-09 21:19:51 +01:00
Jan Vesely	efc4142acd	r600,compute: Plug few memory leaks v2: drop inline keyword drop radeon_llvm_dispose_kernel_module wrapper v3: move definitions to .c file use in radeonsi Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2016-01-26 19:04:38 +01:00
Nicolai Hähnle	c55b9499d5	radeonsi: move is_gs_copy_shader to si_shader_context It is only used during shader creation now, so no need to keep it around afterwards. Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-01-25 10:16:00 -05:00
Marek Olšák	99dfeb01bd	radeonsi: disable SPI color outputs the shader doesn't write Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-01-22 15:02:40 +01:00
Marek Olšák	f1f0158837	radeonsi: add shader conversion code for all SPI color formats Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-01-22 15:02:40 +01:00
Marek Olšák	8667a1aea2	radeonsi: use SPI_SHADER_COL_FORMAT fields instead of export_16bpc This does change the behavior slightly: If a shader writes COLOR[i] and that color buffer isn't bound, the shader will export MRT_NULL instead and discard the IR tree that calculates the output. The only exception is alpha-to-coverage, which requires an alpha export. v2: - update a comment about 16BPC - account for MRTZ when when fixing alpha-test/kill Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-01-22 15:02:40 +01:00
Marek Olšák	bca18057a3	radeonsi: adjust the parameters of si_shader_dump The function will be extended to dump all binaries shaders will consist of, so si_shader* makes sense here. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-01-07 18:26:06 +01:00
Marek Olšák	b0df5f4c19	radeonsi: inline si_shader_binary_read Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-01-07 18:26:06 +01:00
Marek Olšák	c9c031f3d0	radeonsi: move si_shader_dump call out of si_shader_binary_read Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-01-07 18:26:06 +01:00
Marek Olšák	ccd7d7e13d	radeonsi: add si_shader_destroy_binary Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-01-07 18:26:06 +01:00
Marek Olšák	5c9f104567	radeonsi: don't pass si_shader to si_compile_llvm Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-01-07 18:26:06 +01:00
Marek Olšák	63345cfc3a	radeonsi: don't pass si_shader to si_shader_binary_read Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-01-07 18:26:06 +01:00
Marek Olšák	2d3a96448a	radeonsi: don't pass si_shader to si_shader_binary_read_config Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-01-07 18:26:06 +01:00
Marek Olšák	20b9b5d7f5	radeonsi: add struct si_shader_config There will be 1 config per variant, which will be a union of configs from {prolog, main, epilog}. For now, just add the structure. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-01-07 18:26:06 +01:00
Marek Olšák	e00f3f23b1	radeonsi: set SPI color formats and CB_SHADER_MASK outside of compilation Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-01-07 18:26:06 +01:00
Marek Olšák	746a7a7498	radeonsi: determine SPI_SHADER_Z_FORMAT outside of shader compilation Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-01-07 18:26:05 +01:00
Marek Olšák	2cb8bf90cd	radeonsi: determine DB_SHADER_CONTROL outside of shader compilation because the API pixel shader binary will not emulate alpha test one day, so the KILL_ENABLE bit must be determined elsewhere. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-01-07 18:26:05 +01:00
Marek Olšák	86fa48426c	radeonsi: remove unused parameter from si_shader_binary_read_config Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-01-03 22:41:16 +01:00
Marek Olšák	b6d95248f0	radeonsi: move si_shader_binary_upload out of si_shader_binary_read Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-01-03 22:41:16 +01:00
Marek Olšák	fd7000bd78	radeonsi: pass TGSI processor type to si_shader_binary_read for dumping the parameter will be used later Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-01-03 22:41:16 +01:00
Marek Olšák	3ce0a2fd7f	radeonsi: pass TGSI processor type to si_compile_llvm for dumping the parameter will be used later Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-01-03 22:41:16 +01:00
Marek Olšák	dd79034ca6	radeonsi: rename shader parameter definitions and variables for more clarity Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-01-03 22:41:16 +01:00
Nicolai Hähnle	4bb1c8dfec	radeonsi: pass pipe_debug_callback down into si_shader_binary_read (v2) This will allow us to send shader debug info. Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> (v1) Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-01-02 16:47:23 -05:00
Marek Olšák	51603af390	radeonsi: use tgsi_shader_info::colors_written Reviewed-by: Alex Deucher <alexander.deucher@amd.com>	2015-12-11 15:25:11 +01:00
Tom Stellard	95e0510916	radeonsi: Rename si_shader::ls_rsrc{1,2} to si_shader::rsrc{1,2} In the future, these will be used by other shaders types. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2015-11-25 11:03:05 -05:00
Marek Olšák	3694d58e6c	radeonsi: remove dead code after ES-GS linkage change Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-11-13 19:54:42 +01:00
Marek Olšák	d79a3449a7	radeonsi: link ES-GS just like LS-HS This reduces the shader key for ES. Use a fixed attrib location based on (semantic name, index). The ESGS item size is determined by the physical index of the highest ES output, so it's almost always larger than before, but I think that shouldn't matter as long as the ESGS ring buffer is large enough. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-11-13 19:54:42 +01:00
Marek Olšák	b1c5f3faa9	radeonsi: calculate optimal GS ring sizes to fix GS hangs on Tonga I discovered that increasing the ESGS ring size fixes GS hangs on Tonga, so let's do it properly. There is now a separate init_config_gs_rings state that is not immutable, because GS rings are resized when needed. This also saves some memory. Most apps won't need more than 1MB per ring per shader engine. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2015-11-13 19:54:42 +01:00
Marek Olšák	4acd856088	radeonsi: calculate ESGS_RING_ITEMSIZE in create_shader Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2015-11-13 19:54:42 +01:00
Marek Olšák	a0cf589961	radeonsi: move maximum gs stream calculation into create_shader Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2015-11-13 19:54:42 +01:00
Marek Olšák	3ab0c49f04	radeonsi: clean up small duplication in si_shader_gs Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2015-11-13 19:54:42 +01:00
Marek Olšák	38391835b5	radeonsi: fix the export_prim_id field size in the shader key Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-10-20 12:56:40 +02:00
Marek Olšák	9b54ce3362	radeonsi: support thread-safe shaders shared by multiple contexts The "current" shader pointer is moved from the CSO to the context, so that the CSO is mostly immutable. The only drawback is that the "current" pointer isn't saved when unbinding a shader and it must be looked up when the shader is bound again. This is also a prerequisite for multithreaded shader compilation. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-10-20 12:51:51 +02:00
Marek Olšák	5bc871a4ca	radeonsi: implement vertex color clamping This is only supported in the compatibility profile (without GS and tess). Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-10-17 21:40:03 +02:00
Marek Olšák	208d1ed38d	radeonsi: implement fragment color clamping using the shader key for now. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-10-17 21:40:03 +02:00
Marek Olšák	c4f086f399	radeonsi: remove an unused ctx parameter in si_shader_destroy Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-10-17 21:40:03 +02:00
Marek Olšák	b3c55fc669	radeonsi: do force_persample_interp in shaders for non-trivial cases Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-10-03 22:06:09 +02:00
Marek Olšák	c2a42d1f9f	radeonsi: don't rebind GSVS ring buffers every draw call using GS Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Christian König <christian.koenig@amd.com>	2015-09-01 21:51:14 +02:00
Marek Olšák	f6a10f60b7	radeonsi: optimize scissor states - convert 16 states to 1 atom - only emit 1 scissor if VIEWPORT_INDEX isn't written - use only one packet when emitting consecutive scissors Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Christian König <christian.koenig@amd.com>	2015-09-01 21:51:13 +02:00
Marek Olšák	9b510a9652	radeonsi: fix a Unigine Heaven hang when drirc is missing Cc: 10.6 11.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Christian König <christian.koenig@amd.com>	2015-09-01 21:51:13 +02:00
Marek Olšák	93d97db349	radeonsi: allow si_dump_key to write to a file Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com>	2015-08-26 19:25:18 +02:00
Marek Olšák	e7a52a5cb8	radeonsi: add support for gl_PrimitiveID in the fragment shader It must be obtained from the VS. The GS scenario A must be enabled for PrimID to be generated for the VS. + 4 piglits Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-08-13 01:25:26 +02:00
Dave Airlie	4b6c1efb22	radeonsi: split out interpolation input selection This is prep work for using it in the interpolation code later. Also add storage for the input interpolation mode so we can pick it up later. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2015-07-25 01:06:41 +01:00
Marek Olšák	ebfd9e0071	radeonsi: add tessellation shader states ls_rsrc# will be emitted as part of the derived tessellation state Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-07-23 00:59:32 +02:00
Marek Olšák	aa2fa6723a	radeonsi: update si_get_vs_info and si_get_vs_state for tessellation Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-07-23 00:59:32 +02:00
Marek Olšák	fff16e4ad2	radeonsi: add shader code generation for tessellation Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-07-23 00:59:32 +02:00
Marek Olšák	2ecb06b946	radeonsi: make ES2GS offset sgpr location dynamic It will have a different location in the tessellation evaluation shader. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-07-23 00:59:31 +02:00
Marek Olšák	50a957c5de	radeonsi: upload shader rodata after updating scratch relocations Cc: 10.5 10.6 <mesa-stable@lists.freedesktop.org> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-07-23 00:59:24 +02:00
Marek Olšák	e4d738f6c6	radeonsi: remove redundant parameter in si_shader_binary_read Reviewed-by: Tom Stellard <thomas.stellard@amd.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-07-23 00:59:23 +02:00
Michel Dänzer	248b26429f	radeonsi: Use param export count from si_llvm_export_vs in si_shader_vs This eliminates the error prone logic in si_shader_vs recalculating this value. It also fixes TGSI_SEMANTIC_CLIPDIST outputs incorrectly not being counted for VS exports. They need to be counted because they are passed to the pixel shader as parameters as well. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91193 Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2015-07-07 12:35:35 +09:00
Dave Airlie	556dd4af76	radeonsi: add support for geometry shader invocations. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2015-06-27 00:24:30 +01:00
Michel Dänzer	d64adc3a79	radeonsi: Cache LLVMTargetMachineRef in context instead of in screen Fixes a crash in genymotion with several threads compiling shaders concurrently. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89746 Cc: 10.5 <mesa-stable@lists.freedesktop.org> Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2015-03-30 15:15:10 +09:00
Marek Olšák	303d23e10d	radeonsi: add shader code for smoothing The fragment shader multiplies the alpha channel with gl_SampleMaskIn. If blending is enabled, it looks like MSAA. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-03-16 12:54:19 +01:00
Tom Stellard	bbfa1c3239	radeonsi/compute: Use value from compiler for COMPUTE_PGM_RSRC1.FLOAT_MODE Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2015-03-09 13:53:33 +00:00
Marek Olšák	6c5af1dc4e	radeonsi: implement polygon stippling Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-02-04 14:34:13 +01:00
Tom Stellard	2397a72129	radeonsi: Enable VGPR spilling for all shader types v5 v2: - Only emit write SPI_TMPRING_SIZE once per packet. - Use context global scratch buffer. v3: - Patch shaders using WRITE_DATA packet instead of map/unmap. - Emit ICACHE_FLUSH, CS_PARTIAL_FLUSH, PS_PARTIAL_FLUSH, and VS_PARTIAL_FLUSH when patching shaders. v4: - Code cleanups. - Remove unnecessary multiplies. v5: - Patch shaders in system memory and re-upload to vram. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-01-28 21:03:47 +00:00
Tom Stellard	32206c5e56	radeonsi: Add radeon_shader_binary member to struct si_shader Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-01-28 21:03:46 +00:00
Marek Olšák	1829f9c928	radeonsi: enable LLVM optimizations that assume no NaNs for non-compute shaders v2: complete rewrite Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2015-01-07 18:27:54 +01:00
Marek Olšák	15a7fff69a	radeonsi: remove flatshade from the shader key Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-01-07 12:06:43 +01:00
Marek Olšák	161534737c	radeonsi: get info about VS outputs from tgsi_shader_info Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2014-12-10 21:59:37 +01:00
Tom Stellard	1f4e48d5b5	radeonsi/compute: Enable PIPE_SHADER_IR_NATIVE for compute shaders v2 v2: - Drop dependency on LLVM >= 3.5.1 - Rename si_create_shader() to si_shader_binary_read()	2014-10-31 15:24:00 -04:00
Marek Olšák	8067732740	radeonsi: remove shader->input[] and output[] arrays and dependencies They were reinventing tgsi_shader_info. They are unused now. radeon_llvm_context::load_input can be NULL if input fetching is implemented in some other way. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2014-10-12 23:53:57 +02:00
Marek Olšák	8b057ddaea	radeonsi: move param_offset out of shader->input[] and output[] Those are going away. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2014-10-12 23:53:57 +02:00
Marek Olšák	fa933438a2	radeonsi: use tgsi_shader_info in si_shader_ps Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2014-10-12 23:53:54 +02:00
Marek Olšák	34e8200599	radeonsi: don't recompile shaders when changing nr_cbufs from 0 to 1 Both cases are equivalent. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2014-10-12 23:52:07 +02:00
Marek Olšák	5e0fbe1b63	radeonsi: remove vs.ucps_enabled from the shader key Written CLIPDIST outputs are simply disabled in PA_CL_VS_OUT_CNTL. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2014-10-12 23:52:02 +02:00
Marek Olšák	a9592cd3ac	radeonsi: assume ClipDistance usage mask is always 0xf No code in Mesa sets the usage mask to any other value. The final mask is AND'ed with enable bits from the rasterizer state anyway. If somebody implements setting usage masks in st/mesa, we can use tgsi_shader_info to get it more easily. This is a prerequisite for the following commit. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2014-10-12 23:51:44 +02:00
Marek Olšák	1f6c0b55df	radeonsi: set number of userdata SGPRs of GS copy shader to 4 It only needs the constant buffer with clip planes and read-write resources for the GS->VS ring and streamout. That's 2 pointers. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2014-10-04 15:16:15 +02:00
Marek Olšák	91f1a79f78	radeonsi: make the vertex shader key smaller We only support 16 vertex attribs, not 32. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2014-10-04 15:16:14 +02:00
Marek Olšák	10e386f4aa	radeonsi: remove interp_at_sample from the key, use TGSI_INTERPOLATE_LOC_SAMPLE st/mesa has the same flag in its shader key, we don't need to do it in the driver anymore. Instead, use TGSI_INTERPOLATE_LOC_SAMPLE, which is what st/mesa sets. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2014-10-04 15:16:14 +02:00
Marek Olšák	0a2d6f0c4e	radeonsi: move geometry shader properties from si_shader to si_shader_selector Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2014-10-04 15:16:14 +02:00
Marek Olšák	6c9f61c97e	radeonsi: remove unused variable si_shader::gs_input_prim Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2014-10-04 15:16:14 +02:00
Marek Olšák	8860584045	radeonsi: get fs_write_all from tgsi_shader_info directly Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2014-10-04 15:16:14 +02:00
Marek Olšák	5233568861	radeonsi: get tgsi_shader_info only once before compilation Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2014-10-04 15:16:14 +02:00
Marek Olšák	1abb1a97b0	radeonsi: don't pass the context to the shader translator This should prevent accessing context state there. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2014-09-24 14:48:02 +02:00
Marek Olšák	e29353ff20	radeonsi: don't snoop currently-bound GS shader when compiling ES Instead, pass the layout of GS inputs in memory to the ES using the shader key. Only 64 bits are needed to represent the layout in the key. Mixing and matching different VS and GS shaders should now always work. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2014-09-24 14:48:02 +02:00
Marek Olšák	2774abd4ce	radeonsi: shorten si_pipe_* prefixes to si_* This was the original naming convention in r600g and it somehow crept into radeonsi. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2014-09-24 14:48:02 +02:00
Marek Olšák	8c37c16cbc	radeonsi: merge si_pipe_shader into si_shader One is part of the other anyway. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2014-09-24 14:48:02 +02:00
Marek Olšák	dba4c5baf4	radeonsi: move DB_SHADER_CONTROL into db_render_state I will need this for fixing sample shading with 1 sample. The good news is that all shader pm4 states no longer use the current context state, so we can generate the pm4 states outside of draw_vbo if needed. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2014-09-24 14:48:02 +02:00
Marek Olšák	adc5797f54	radeonsi: set KILL_ENABLE during shader compilation, remove uses_kill flag Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2014-09-24 14:48:02 +02:00
Marek Olšák	a34c9f70b1	radeonsi: remove shader.ps_conservative_z, set db_shader_control instead Also set the field on SI too. It's not just specific to CIK. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2014-09-24 14:48:02 +02:00
Marek Olšák	a768b43bc3	radeonsi: remove unused variable si_pipe_shader::sprite_coord_enable Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2014-09-24 14:48:02 +02:00
Tom Stellard	b0f780345e	radeonsi/compute: Add support scratch buffer support v2 The scratch buffer will be used for private memory and also register spilling. v2: - Code cleanups	2014-07-21 10:00:09 -04:00
Marek Olšák	09056b352d	radeonsi: use an SGPR instead of VGT_INDX_OFFSET The draw indirect packets cannot set VGT_INDX_OFFSET, they can only set user data SGPRs. This is the only way to support start/index_bias with indirect drawing. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2014-07-18 01:58:58 +02:00
Marek Olšák	6a2b38381e	radeonsi: pass ARB_conservative_depth parameters to the hardware Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2014-06-19 00:17:36 +02:00
Marek Olšák	99df120e00	radeonsi: interpolate varyings at sample when full sample shading is enabled	2014-06-02 12:58:22 +02:00
Marek Olšák	d9e102b220	radeonsi: prepare depth export registers at compile time Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2014-05-10 13:58:46 +02:00
Michel Dänzer	f8e16010e5	radeonsi: Put GS ring buffer descriptors with streamout buffer descriptors And mark the constant buffers as read only for the GPU again. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2014-01-29 11:09:26 +09:00
Michel Dänzer	8afde9fa23	radeonsi: Take GS into account for VS state in more places Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2014-01-29 11:07:35 +09:00
Michel Dänzer	d8b3d806fc	radeonsi: Handle TGSI_SEMANTIC_PRIMID Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2014-01-29 11:07:11 +09:00
Michel Dänzer	7c7d7380f1	radeonsi: Generalize counting of shader parameters Now it covers ES->GS as well as VS->PS. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2014-01-29 11:06:58 +09:00
Michel Dänzer	404b29d765	radeonsi: Initial geometry shader support Partly based on the corresponding r600g work by Vadim Girlin and Dave Airlie. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2014-01-29 11:06:28 +09:00
Andreas Hartmetz	8662e66bf2	radeonsi: Rename the commonly occurring rctx/r600 variables. The "r" stands for R600. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2014-01-14 00:07:14 +01:00
Andreas Hartmetz	238aeabce0	radeonsi: Rename r600->si for structs in si_pipe.h. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2014-01-14 00:07:13 +01:00
Andreas Hartmetz	786af2f963	radeonsi: Apply si_* file naming scheme. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2014-01-14 00:07:13 +01:00

... 3 4 5 6 7 ...

380 Commits