KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Marek Olšák	275c073c6a	radeonsi: export SampleMask from pixel shaders at full rate Heaven and Valley write gl_SampleMask and not Z. Use 16_ABGR instead of 32_ABGR if Z isn't written. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-09-13 20:38:25 +02:00
Nicolai Hähnle	8dbf2a8570	radeonsi: add DRAWID parameter to vertex shaders Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-08-09 15:56:04 +02:00
Marek Olšák	1e5f00f9d5	radeonsi: pre-generate shader logs for ddebug This cuts down the overhead of si_dump_shader when ddebug is capturing shader logs, which is done for every draw call unconditionally (that's quite a lot of work for a draw call). Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-26 23:06:46 +02:00
Marek Olšák	dd66f9d3e7	radeonsi: move the shader key dumping to si_shader_dump Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-26 23:06:46 +02:00
Marek Olšák	0f7a6ea5e7	radeonsi: report accurate SGPR and VGPR spills Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-13 19:46:16 +02:00
Marek Olšák	5c92c21369	radeonsi: do compilation from si_create_shader_selector asynchronously Main shader parts and geometry shaders are compiled asynchronously by util_queue. si_create_shader_selector doesn't wait and returns. si_draw_vbo(si_shader_select) waits for completion. This has the best effect when shaders are compiled at app-loading time. It doesn't help much for shaders compiled on demand, even though VS+PS compilation should take as much time as the bigger one of the two. If an app creates more shaders, at most 4 threads will be used to compile them. Debug output disables this for shader stats to be printed in the correct order. (We could go even further and build variants asynchronously too, then emit draw calls without waiting and emit incomplete shader states, then force IB chaining to give the compiler more time, then sync the compilation at the IB flush and patch the IB with correct shader states. This is great for compilation before draw calls, but there are some difficulties such as scratch and tess states requiring the compiler output, and an on-disk shader cache will likely be a much better and simpler solution.) Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:13 +02:00
Marek Olšák	850cd953b1	radeonsi: separate the compilation chunk of si_create_shader_selector The function interface is ready to be used by util_queue. Also, si_shader_select_with_key can no longer accept si_context. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:13 +02:00
Marek Olšák	4d1f32376d	radeonsi: don't interpolate colors if flatshading is enabled use v_interp_mov for those Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:12 +02:00
Marek Olšák	4accb02d7a	radeonsi: enable the barycentric optimization in all cases Handle the bc_optimize SGPR bit if both CENTER and CENTROID are enabled. This should increase the PS launch rate for big primitives with MSAA. Based on discussion with SPI guys. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:12 +02:00
Marek Olšák	476e9cee1d	radeonsi: compute only one set of interpolation (i,j) when MSAA is disabled This should increase the PS launch rate for shaders using at least 2 pairs of perspective (i,j) and same for linear. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:12 +02:00
Marek Olšák	a675c6a000	radeonsi: split ps.prolog.force_persample_interp into persp and linear bits This reduces the number of v_mov's in the prolog. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:12 +02:00
Nicolai Hähnle	b42bc90b6a	radeonsi: enable WQM in PS prolog when needed WQM is needed when the PS prolog computes a VGPR that is consumed by a shader with (implicit or explicit) derivatives. Depends on http://reviews.llvm.org/D20839 / LLVM r272063 for this to be effective (otherwise it's just a no-op). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95130 Cc: 12.0 <mesa-dev@lists.freedesktop.org> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-06-07 23:46:02 +02:00
Bas Nieuwenhuizen	26f436132b	radeonsi: Remove LDS layout user SGPR's from TES. They are unused. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen	6217716e8f	radeonsi: Store inputs to memory when not using a TCS. We need to copy the VS outputs to memory. I decided to do this using a shader key, as the value depends on other shaders. I also switch the fixed function TCS over to monolithic, as otherwise many of the user SGPR's need to be passed to the epilog, which increases register pressure, or complexity to avoid that. The main body of the fixed function TCS is not that interesting to precompile anyway, since we do it on demand and it is very small. v2: Use u_bit_scan64. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen	c49e68dc4b	radeonsi: Add user SGPR for the layout of the offchip buffer. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen	d9a0c54f6f	radeonsi: Use correct parameter index for LS_OUT_LAYOUT. This happens to be in the right position, but that changes when TCS/TES get new parameters. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen	5c34562d7c	radeonsi: Add offchip tessellation parameters. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-05-26 22:07:04 +02:00
Marek Olšák	3cbd8cfc7a	radeonsi: decrease GS copy shader user SGPRs to 2 const buffers are no longer used since the clip plane const buffer was moved to RW buffers Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-04-22 01:14:14 +02:00
Marek Olšák	3138a28ff2	radeonsi: move default tess level constant buffer to RW buffers Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-04-22 01:14:14 +02:00
Bas Nieuwenhuizen	38f4cee3ff	radeonsi: Add config parameter to si_shader_apply_scratch_relocs. shader->config is not updated for compute kernels. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2016-04-21 19:36:19 +02:00
Bas Nieuwenhuizen	84a6761ae3	radeonsi: add shared memory Declares the shared memory as a global variable so that LLVM is aware of it and it does not conflict with passes like AMDGPUPromoteAlloca. v2: - Use ctx->i8. - Dropped null-check for declare_memory_region. - Changed memory region array to single region. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>	2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen	753a3e472b	radeonsi: lower compute shader arguments Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-04-19 18:10:30 +02:00
Marek Olšák	ed66c75784	radeonsi: use enums in si_shader.h Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-04-18 19:51:25 +02:00
Nicolai Hähnle	c495c0ad37	radeonsi: implement set_shader_buffers Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>	2016-04-12 16:30:26 -05:00
Nicolai Hähnle	e85cf35a65	radeonsi: implement set_shader_images (v2) Whether DCC is disabled depends on the access flags with which the image is bound: image_load supports DCC, but store and atomic don't. v2: remove an unnecessary masking of images->desc.enabled_mask Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-03-21 15:34:23 -05:00
Marek Olšák	74b4ce81fb	radeonsi: allow dumping shader disassemblies to a file Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2016-03-01 00:18:54 +01:00
Marek Olšák	d0f3b524cd	radeonsi: use re-Z This can increase perf for shaders that kill pixels (kill, alpha-test, alpha-to-coverage). v2: add comments Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2016-03-01 00:18:19 +01:00
Marek Olšák	ff360a52e6	radeonsi: implement binary shaders & shader cache in memory (v2) v2: handle _mesa_hash_table_insert failure other cosmetic changes Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-21 21:08:58 +01:00
Marek Olšák	1fe73d55e3	radeonsi: move some struct si_shader members to new struct si_shader_info This will be part of shader binaries. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-21 21:08:58 +01:00
Marek Olšák	10fa269f4f	radeonsi: use smaller types for some si_shader members in order to decrease the shader size for a shader cache. v2: add & use SI_MAX_VS_OUTPUTS Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-21 21:08:58 +01:00
Marek Olšák	3c98e0b369	radeonsi: compile non-GS middle parts of shaders immediately if enabled Still disabled. Only prologs & epilogs are compiled in draw calls, but each variant of those is compiled only once per process. VS is always compiled as hw VS. TES is always compiled as hw VS. LS and ES stages are always compiled on demand. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-21 21:08:58 +01:00
Marek Olšák	4636d9be4a	radeonsi: add PS prolog Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-21 21:08:58 +01:00
Marek Olšák	e79bb746ab	radeonsi: add PS epilog Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-21 21:08:57 +01:00
Marek Olšák	eb10919b83	radeonsi: add TCS epilog Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-21 21:08:57 +01:00
Marek Olšák	e1b21696a3	radeonsi: add VS epilog It only exports the primitive ID. Also used by TES when it's compiled as VS. The VS input location of the primitive ID input is v2. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-21 21:08:57 +01:00
Marek Olšák	70de433dea	radeonsi: add VS prolog This is disabled with use_monolithic_shaders = true. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-21 21:08:57 +01:00
Marek Olšák	19a92886a8	radeonsi: first bits for non-monolithic shaders Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-21 21:08:57 +01:00
Marek Olšák	17eb99d8b9	radeonsi: add code for combining and uploading shaders from 3 shader parts Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-21 21:08:57 +01:00
Marek Olšák	dc27456194	radeonsi: separate out shader key bits for prologs & epilogs Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-21 21:08:57 +01:00
Marek Olšák	d995d4830e	radeonsi: compute how many input VGPRs fragment shaders have Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-21 21:08:57 +01:00
Marek Olšák	fe1b6ede01	radeonsi: compute how many input SGPRs and VGPRs shaders have Prologs (shader binaries inserted before the API shader binary) need to know this, so that they won't change the input registers unintentionally. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-21 21:08:57 +01:00
Marek Olšák	7aedbbacae	radeonsi: put image, fmask, and sampler descriptors into one array The texture slot is expanded to 16 dwords containing 2 descriptors. Those can be: - Image and fmask, or - Image and sampler state By carefully choosing the locations, we can put all three into one slot, with the fmask and sampler state being mutually exclusive. This improves shaders in 2 ways: - 2 user SGPRs are unused, shaders can use them as temporary registers now - each pair of descriptors is always on the same cache line v2: cosmetic changes: add back v8i32, don't load a sampler state & fmask at the same time Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-10 19:41:49 +01:00
Marek Olšák	dc5fc3c2f6	radeonsi: make LLVM IR dumping less messy Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-09 21:19:51 +01:00
Marek Olšák	b6d5666fbf	radeonsi: remove useless code that handles dx10_clamp_mode "enable-no-nans-fp-math" is a wrong string and there was a disagreement about fixing it. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-09 21:19:51 +01:00
Marek Olšák	5a53628f45	radeonsi: read SPI_PS_INPUT_ADDR from LLVM if it returns it Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-09 21:19:51 +01:00
Marek Olšák	b9126dcda8	radeonsi: implement forcing per-sample_interpolation using the shader key only It was partly a state and partly emulated by shader code, but since we want to do this in a fragment shader prolog, we need to put it into the shader key, which will be used to generate the prolog. This also removes the spi_ps_input states and moves the registers to the PS state. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-09 21:19:51 +01:00
Marek Olšák	4596f3c1b8	radeonsi: remove si_shader::ps_input_interpolate tgsi_shader_info has this too. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-09 21:19:51 +01:00
Marek Olšák	6dda2455c8	radeonsi: move BCOLOR PS input locations after all other inputs BCOLOR inputs were immediately after COLOR inputs. Thus, all following inputs were offset by 1 if color_two_side was enabled, and not offset if it was not enabled, which is a variation that's problematic if we want to have 1 variant per shader and the variant doesn't care about color_two_side (that should be handled by other bytecode attached at the beginning). Instead, move BCOLOR inputs after all other inputs, so BCOLOR0 is at location "num_inputs" if it's present. BCOLOR1 is next. This also allows removing si_shader::nparam and si_shader::ps_input_param_offset, which are useless now. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-02-09 21:19:51 +01:00
Jan Vesely	efc4142acd	r600,compute: Plug few memory leaks v2: drop inline keyword drop radeon_llvm_dispose_kernel_module wrapper v3: move definitions to .c file use in radeonsi Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2016-01-26 19:04:38 +01:00
Nicolai Hähnle	c55b9499d5	radeonsi: move is_gs_copy_shader to si_shader_context It is only used during shader creation now, so no need to keep it around afterwards. Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-01-25 10:16:00 -05:00

142 Commits