KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Samuel Pitoiset	39a35eb0c1	radeonsi: try to re-use previously deleted bindless descriptor slots Currently, when the array is full it is resized but it can grow over and over because we don't try to re-use descriptor slots. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-08-22 11:34:37 +02:00
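The slot re-use described above follows the usual free-list pattern: hand out a previously deleted slot before growing the array. The sketch below is a minimal single-threaded model of that idea. It is not the radeonsi code. The struct, the function names and the initial capacity of 16 are all hypothetical. The 16-dword slot size and the idea that the handle is a slot index both come from the neighbouring commit c2dfa9b111.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a growable bindless descriptor array.
 * Each slot holds 16 dwords; the slot index doubles as the handle. */
#define SLOT_DWORDS 16u

struct bindless_slots {
    uint32_t *dwords;      /* num_slots * SLOT_DWORDS descriptor dwords */
    unsigned num_slots;    /* allocated capacity, in slots */
    unsigned next_unused;  /* first slot that was never handed out */
    unsigned *free_list;   /* stack of slots released by slot_free() */
    unsigned num_free;
};

/* Double the capacity (start at 16 slots). Returns 0 on allocation failure. */
static int slots_grow(struct bindless_slots *s)
{
    unsigned new_num = s->num_slots ? s->num_slots * 2 : 16;
    uint32_t *d = realloc(s->dwords, (size_t)new_num * SLOT_DWORDS * sizeof(uint32_t));
    if (!d)
        return 0;
    s->dwords = d;
    unsigned *f = realloc(s->free_list, (size_t)new_num * sizeof(unsigned));
    if (!f)
        return 0;
    s->free_list = f;
    s->num_slots = new_num;
    return 1;
}

/* Returns a slot index, or -1 if growing failed. */
static long slot_alloc(struct bindless_slots *s)
{
    if (s->num_free > 0)
        return s->free_list[--s->num_free];  /* re-use a deleted slot first */
    if (s->next_unused == s->num_slots && !slots_grow(s))
        return -1;                            /* only grow when nothing is free */
    return s->next_unused++;
}

/* Clear the slot's descriptor and push it onto the free list. */
static void slot_free(struct bindless_slots *s, unsigned slot)
{
    assert(slot < s->next_unused);
    memset(&s->dwords[slot * SLOT_DWORDS], 0, SLOT_DWORDS * sizeof(uint32_t));
    s->free_list[s->num_free++] = slot;
}

/* Dword offset of a slot inside the descriptor buffer. */
static unsigned slot_dword_offset(unsigned slot)
{
    return slot * SLOT_DWORDS;
}
```

The free list can never overflow: `num_free <= next_unused <= num_slots`, and `free_list` always has room for `num_slots` entries.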
Samuel Pitoiset	c2dfa9b111	radeonsi: use slot indexes for bindless handles Using VRAM addresses as bindless handles is not a good idea because we have to use LLVMIntToPtr, and the LLVM CSE pass can't optimize because it has no information about the pointer. Instead, use slot indexes like the existing descriptors. Note that we use fixed 16-dword slots for both samplers and images. This doesn't really matter because no real apps use image handles. This improves performance with DOW3 by 7%. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-08-22 11:34:29 +02:00
Samuel Pitoiset	50349f404d	radeonsi: add si_emit_global_shader_pointers() helper To share common code between rw buffers and bindless descriptors. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-08-22 11:34:24 +02:00
Samuel Pitoiset	a5ff4a8e2e	radeonsi: only initialize dirty_mask when CE is used It looks like it's useless to initialize that field when CE is unused. This will also allow declaring more than 64 elements for the array of bindless descriptors. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-08-22 11:34:23 +02:00
Samuel Pitoiset	a29ef75565	radeonsi: make some si_descriptors fields 32-bit The number of bindless descriptors is dynamic and we definitely have to support more than 256 slots. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-08-22 11:34:21 +02:00
Samuel Pitoiset	781a13c475	radeonsi: declare new user SGPR indices for bindless samplers/images A new pair of user SGPRs is needed for loading the bindless descriptors from shaders. Because the descriptors are global for all stages, there is no need to add separate indices for GFX9. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-08-22 11:34:15 +02:00
Nicolai Hähnle	472c906d9f	radeonsi/gfx9: add performance counters Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-08-22 09:55:16 +02:00
Nicolai Hähnle	e271607668	radeonsi: extract common code of si_upload_{graphics,compute}_shader_descriptors Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-08-22 09:55:05 +02:00
Nicolai Hähnle	a6e7693882	gallium: remove unused PIPE_DUMP_* defines Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-08-22 09:53:35 +02:00
Nicolai Hähnle	f4c1d5a76d	radeonsi: emit string markers to log context Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-08-22 09:53:34 +02:00
Nicolai Hähnle	0c3f8aca7f	radeonsi: log decompress blits Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-08-22 09:53:34 +02:00
Nicolai Hähnle	420c438589	radeonsi: log draw and compute state into log context Also add missing trace emits and CS logging for compute launches. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-08-22 09:53:34 +02:00
Nicolai Hähnle	4c3f36ec6b	radeonsi: print saved CS to the log context Use the auto logger facility, so that CS chunks will be interleaved with other log info. v2: - fix some crashes when not using CE - fix skipping "previous" chunks of current (unflushed) IB - fix error handling in si_begin_cs_debug Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-08-22 09:53:14 +02:00
Nicolai Hähnle	bc93339799	radeonsi: start using u_log_context for debugging Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-08-22 09:51:00 +02:00
Nicolai Hähnle	ad33f2ddd8	radeonsi: re-order debug state dumping Keep together the parts that won't use the deferred logging mechanism. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-08-22 09:50:57 +02:00
Nicolai Hähnle	40697e8678	radeonsi: make si_shader_selector_reference globally visible Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-08-22 09:50:55 +02:00
Nicolai Hähnle	4bbf6ded20	radeonsi: add reference count to si_compute To allow keep-alive for deferred logging. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-08-22 09:50:53 +02:00
Nicolai Hähnle	bbaad18c04	radeonsi: implement pipe_context::set_log_context We'll add radeonsi-specific code to set_log_context in later patches, but we may want to log from common code. Hence keep the log pointer in r600_common_context. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-08-22 09:50:48 +02:00
Marek Olšák	db039d67aa	radeonsi: don't prefetch VBO descriptors if vertex elements == NULL Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-08-21 23:06:42 +02:00
Rob Herring	4734bfc02a	Android: Fix LLVM duplicated symbols linking for N and M Both statically linking libLLVMCore and dynamically linking libLLVM cause duplicated symbols in gallium_dri.so, so it fails to dlopen. We don't really need to link libLLVMCore; we just need the generated headers to be built first. Dynamically linking to libLLVM instead is enough to do that. Thanks to Qiang Yu for finding the root cause. With this change, we can align all versions and just have libLLVM as a shared lib dependency. This also requires changes in the M and N versions of LLVM to export the include paths for libLLVM. AOSP master is okay. Fixes: `26aee6f4d5` ("Android: rework LLVM build support") Reported-by: Mauro Rossi <issor.oruam@gmail.com> Cc: 17.2 <mesa-stable@lists.freedesktop.org> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Qiang Yu <Qiang.Yu@amd.com> Signed-off-by: Rob Herring <robh@kernel.org>	2017-08-21 10:46:21 -05:00
Leo Liu	7319ff8787	radeon/uvd: add YUYV format support for target buffer Make chroma plane optional for YUYV support Signed-off-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>	2017-08-21 10:09:09 -04:00
Samuel Pitoiset	2843c5d15c	radeonsi: update non-resident bindless descriptors if needed Only resident bindless descriptors are currently updated and re-uploaded, this makes sure that the non-resident ones are also updated. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Cc: "17.2" <mesa-stable@lists.freedesktop.org> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-08-21 15:23:56 +02:00
Marek Olšák	57fb1bb585	gallium/radeon: remove old_fence parameter from r600_gfx_write_event_eop just use the new scratch buffer. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-08-18 16:06:21 +02:00
Marek Olšák	41e053954d	radeonsi/gfx9: prevent a GPU hang after a timestamp event Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-08-18 16:06:18 +02:00
Marek Olšák	13aa8d3da9	radeonsi: don't use CLEAR_STATE on SI This fixes random hangs with Unigine Valley. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102201 Fixes: `064550238e` ("radeonsi: use CLEAR_STATE to initialize some registers") Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-08-18 15:59:22 +02:00
Marek Olšák	1ab7fed707	radeonsi: disable CE by default It makes performance worse by a very small (hard to measure) amount. We've done extensive profiling of this feature internally. Cc: 17.1 17.2 <mesa-stable@lists.freedesktop.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Acked-by: Christian König <christian.koenig@amd.com>	2017-08-15 15:03:43 +02:00
Marek Olšák	d1285a7103	radeonsi/gfx9: fix the scissor bug workaround otherwise there is corruption in most apps. Fixes: `0fe0320` radeonsi: use optimal packet order when doing a pipeline sync Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-08-11 20:38:29 +02:00
Marek Olšák	27fef5d52d	radeonsi/gfx9: use the VI codepath for clamping Z This fixes corrupted shadows in Unigine Valley. The corruption disappeared when I stopped setting IMG_DATA_FORMAT_24_8 for depth. Cc: 17.2 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-08-11 20:38:29 +02:00
Marek Olšák	4630ede102	ac: fail shader compilation if libelf is replaced by an incompatible version UE4Editor has this issue. This commit prevents hangs (release build) or assertion failures (debug build). It doesn't fix the editor, but catastrophic scenarios are prevented. Cc: 17.1 17.2 <mesa-stable@lists.freedesktop.org> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2017-08-10 13:24:23 +02:00
Samuel Pitoiset	bbfad34606	radeonsi: drop two unused variables in create_function() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-08-09 12:56:00 +02:00
Marek Olšák	a2703fc119	radeonsi: fix a compile failure due to disabled asserts	2017-08-07 22:51:45 +02:00
Marek Olšák	0fe0320dc0	radeonsi: use optimal packet order when doing a pipeline sync Process most new SET packets in parallel with previous draw calls, then flush caches and wait, start the draw, and do L2 prefetches last. This decreases the [CP busy / SPI busy] ratio (verified with GRBM perf counters). In other words, the time window when shaders are idle (between the wait and the draw) is much shorter now. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-08-07 21:12:24 +02:00
Marek Olšák	895de1d03d	radeonsi: expose the number of decompress calls to the HUD Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-08-07 21:12:24 +02:00
Marek Olšák	c093821cee	radeonsi: rename shader_userdata -> shader_pointers where appropriate Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-08-07 21:12:24 +02:00
Marek Olšák	c441999b7a	radeonsi: prefetch VBO descriptors after the first VGT shader Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-08-07 21:12:24 +02:00
Marek Olšák	e887c68bd2	radeonsi: add a separate dirty mask for prefetches so that we don't rely on si_pm4_state_enabled_and_changed, allowing us to move prefetches after draw calls. v2: clear the dirty mask after unbinding shaders Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> (v1) Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)	2017-08-07 21:12:24 +02:00
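A separate prefetch dirty mask decouples "what needs prefetching" from "what state changed", so prefetches can be emitted after the draw. The sketch below shows one way such a mask could behave. It is illustrative only: the bit names, `struct ctx` and both functions are hypothetical, and the actual L2 prefetch packets are left as a comment. Following the v2 note above, unbinding a shader clears its bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-stage prefetch bits. */
enum {
    PREFETCH_VS = 1u << 0,
    PREFETCH_PS = 1u << 1,
};

struct ctx {
    uint32_t prefetch_mask;   /* stages whose L2 prefetch is still pending */
    const void *shader[2];    /* bound VS (0) and PS (1), illustrative */
};

static void bind_shader(struct ctx *c, int stage, const void *sh, uint32_t bit)
{
    assert(stage >= 0 && stage < 2);
    c->shader[stage] = sh;
    if (sh)
        c->prefetch_mask |= bit;   /* new shader: prefetch it on the next draw */
    else
        c->prefetch_mask &= ~bit;  /* unbound: nothing left to prefetch */
}

/* Called after the draw packet; returns the bits it consumed. */
static uint32_t emit_prefetches(struct ctx *c)
{
    uint32_t done = c->prefetch_mask;
    /* ... emit the L2 prefetch packets for each set bit here ... */
    c->prefetch_mask = 0;
    return done;
}
```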
Marek Olšák	a7b0014d1a	radeonsi: add and use si_pm4_state_enabled_and_changed Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-08-07 21:12:24 +02:00
Marek Olšák	58d062b87d	radeonsi: de-atomize L2 prefetch I'd like to be able to move the prefetch call site around. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-08-07 21:12:24 +02:00
Marek Olšák	4e629ca7c7	radeonsi: align all CE dumps to L2 cache line size Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-08-07 21:12:24 +02:00
Marek Olšák	01fed67608	radeonsi: remove a tautology sctx->framebuffer.nr_samples >= 1 Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-08-07 21:12:24 +02:00
Andres Rodriguez	7fe5fa0013	radeonsi: enable support for EXT_memory_object v2: fix an indentation error v3: don't enable for r600 Signed-off-by: Andres Rodriguez <andresx7@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2017-08-06 12:42:07 +10:00
Andres Rodriguez	68623933a0	radeonsi: hook up device/driver UUID queries v2: move from r600_common to radeonsi Signed-off-by: Andres Rodriguez <andresx7@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-08-06 12:42:07 +10:00
Mauro Rossi	f99a733e38	android: radeonsi: add nir include paths Android build changes to avoid the following building error: target C: libmesa_pipe_radeonsi <= external/mesa/src/gallium/drivers/radeonsi/si_pipe.c ... In file included from external/mesa/src/gallium/drivers/radeonsi/si_pipe.c:38: external/mesa/src/compiler/nir/nir.h:48:10: fatal error: 'nir_opcodes.h' file not found ^ 1 error generated. Fixes: `da62a31c5b` "radeonsi: add nir include paths" Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2017-08-04 14:58:50 +01:00
Nicolai Hähnle	12ce39d3de	radeonsi: set drirc compiler options before calling common screen init Also, access the options directly, allowing us to get rid of the PIPE_SCREEN_xxx flags. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-08-04 10:46:01 +02:00
Juan A. Suarez Romero	5ff4c5aef4	radeonsi: Makefile.sources: include driinfo_radeonsi.h Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2017-08-04 09:54:46 +02:00
Marek Olšák	da942a4b81	radeonsi: program tile swizzle for color and FMASK surfaces for GFX & SDMA Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-08-04 02:10:04 +02:00
Marek Olšák	ae5d86e94d	radeonsi: if FMASK is disabled, set CB_COLORi_FMASK = CB_COLORi_BASE properly Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-08-04 02:10:04 +02:00
Mauro Rossi	5baed8f0e6	android: radeonsi: prepare for driver-specific driconf options Android build changes to avoid the following building error: In file included from external/mesa/src/gallium/targets/dri/target.c:1: external/mesa/src/gallium/auxiliary/target-helpers/drm_helper.h:185:10: fatal error: 'radeonsi/si_driinfo.h' file not found ^ 1 error generated. Fixes: `0f8c5de869` "radeonsi: prepare for driver-specific driconf options" Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2017-08-03 10:55:29 +01:00
Timothy Arceri	4e4042df6b	gallium: introduce PIPE_CAP_MEMOBJ This can be used to guard support for EXT_memory_object and related extensions. v2: update gallium docs v3 (Timothy Arceri): - add cap to nv50 Signed-off-by: Andres Rodriguez <andresx7@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2017-08-03 13:57:16 +10:00
Nicolai Hähnle	53485c2d0e	radeonsi: add enable_sisched driconf option Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-08-02 09:50:59 +02:00
Nicolai Hähnle	0f8c5de869	radeonsi: prepare for driver-specific driconf options Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-08-02 09:50:58 +02:00
Nicolai Hähnle	bc7f41e11d	gallium: add pipe_screen_config to screen_create functions This allows a more generic mechanism for passing user configurations into drivers by accessing the dri options directly. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-08-02 09:50:57 +02:00
Nicolai Hähnle	78476cfe07	radeonsi: enable ARB_transform_feedback_overflow_query v2: update for new cap name Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-08-02 09:49:09 +02:00
Nicolai Hähnle	a677799e51	gallium: add PIPE_QUERY_SO_OVERFLOW_ANY_PREDICATE and corresponding cap v2: rename cap to PIPE_CAP_QUERY_SO_OVERFLOW and be a bit more explicit in the documentation Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-08-02 09:37:10 +02:00
Dave Airlie	cb6f16dce9	radeon/ac: use ds_swizzle for derivs on si/cik. This looks like it's supported since llvm 3.9 at least, so switch over radeonsi and radv to using it, -pro also uses this. We can now drop creating lds for these operations as the ds_swizzle operation doesn't actually write to lds at all. Acked-by: Marek Olšák <marek.olsak@amd.com> (stable requested due to fixing radv CIK conformance tests) Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-08-02 00:12:01 +01:00
Marek Olšák	1aeafb59e6	radeonsi: print CE IBs into ddebug reports Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-08-01 17:06:38 +02:00
Marek Olšák	1482861abe	radeonsi: fix printing vertex buffer descriptors into ddebug reports Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-08-01 17:06:38 +02:00
Marek Olšák	404f524fe2	radeonsi: don't flush sL1 conditionally in WAIT_ON_CE_COUNTER I don't know the condition for the flush, but we'd better turn this off. The sL1 flush is used when CE dumps stuff into a ring buffer and the ring buffer wraps. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-08-01 17:06:38 +02:00
Marek Olšák	94965b8219	radeonsi: set up HTILE in descriptors only when level 0 is accessible Compression isn't enabled with non-zero levels. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-08-01 17:06:38 +02:00
Marek Olšák	b9fc9d3f24	radeonsi: fix various CLEAR_STATE issues Fixes: `064550238e` ("radeonsi: use CLEAR_STATE to initialize some registers") Bugzilla: https://bugs.freedesktop.org/101969 Tested-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-08-01 17:06:38 +02:00
Nicolai Hähnle	2879a602dd	radeonsi: ensure that temp array allocas are in the entry block Otherwise, code generation fails. This has become necessary since some shaders are wrapped in control flow. Fixes: `081ac6e5c6` ("radeonsi/gfx9: always wrap GS and TCS in an if-block (v2)") Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 15:00:22 +02:00
Nicolai Hähnle	dfe237aef9	radeonsi: enable R600_DEBUG=nir for vertex and fragment shaders Also, disable geometry and tessellation shaders. Mixing and matching NIR and TGSI shaders should work (and I've tested it for the VS/PS interface), but geometry and tessellation require VS-as-ES/LS, which isn't implemented yet for NIR. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:45 +02:00
Nicolai Hähnle	3b4f481c60	radeonsi: VS as ES/LS are not yet supported with R600_DEBUG=nir Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:44 +02:00
Nicolai Hähnle	3997b10f74	radeonsi/nir: lower uniforms to UBO loads Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:44 +02:00
Nicolai Hähnle	d5741489d3	radeonsi/nir: lower txp instructions Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:44 +02:00
Nicolai Hähnle	1c64637c26	ac/nir,radeonsi: add and use ac_shader_abi::frag_pos v2: update for LLVMValueRefs in ac_shader_abi Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:43 +02:00
Nicolai Hähnle	f03c54e05a	ac/nir,radeonsi: add and use ac_shader_abi::{ancillary,sample_coverage} v2: update for LLVMValueRefs in ac_shader_abi Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:43 +02:00
Nicolai Hähnle	25ff22e390	radeonsi: tweak next-shader assumptions when streamout is used VS with streamout is always a HW VS. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:43 +02:00
Nicolai Hähnle	a69afb68c9	radeonsi: use new function ac_build_umin for edgeflag clamping Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:42 +02:00
Nicolai Hähnle	e247357240	ac/nir,radeonsi: add ac_shader_abi::front_face v2: update for LLVMValueRefs in ac_shader_abi Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:42 +02:00
Nicolai Hähnle	a0af3daf9c	radeonsi: implement and use ac_shader_abi::load_ssbo Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:41 +02:00
Nicolai Hähnle	d46018a4d7	radeonsi: make get_indirect_index globally visible Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:41 +02:00
Nicolai Hähnle	41d4016e06	radeonsi/nir: perform radeonsi-specific lowering and optimization passes Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:40 +02:00
Nicolai Hähnle	b49c2c9fa3	radeonsi/nir: perform lowering of input/output driver locations Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:40 +02:00
Nicolai Hähnle	8d23575c96	radeonsi/nir: add image descriptor loading Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:39 +02:00
Nicolai Hähnle	f37f9aed84	ac/nir: add image and write parameter to ac_shader_abi::load_sampler_desc Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:38 +02:00
Nicolai Hähnle	677bd47cb9	radeonsi/nir: set si_shader_context::num_{sampler,images} Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:38 +02:00
Nicolai Hähnle	7c27ef182c	radeonsi/nir: implement ac_shader_abi::load_sampler_desc v2: remove enum desc_type from radeonsi (Marek) Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:38 +02:00
Nicolai Hähnle	7763c7b2ba	ac/nir,radeonsi: add ac_shader_abi::chip_class Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:37 +02:00
Nicolai Hähnle	a6f597536d	radeonsi/nir: emit FS outputs Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:37 +02:00
Nicolai Hähnle	c41a8e2ad9	radeonsi/nir: load FS inputs Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:36 +02:00
Nicolai Hähnle	8643d41622	radeonsi/nir: load VS inputs Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:36 +02:00
Nicolai Hähnle	d007919d99	ac/nir,radeonsi: add ac_shader_abi::load_ubo Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:36 +02:00
Nicolai Hähnle	0c3b6a4bd9	ac,radeonsi: add ac_shader_abi::emit_outputs for hardware VS shaders Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:34 +02:00
Nicolai Hähnle	1ea972e08a	radeonsi: pass si_shader_context to get_primitive_id Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:33 +02:00
Nicolai Hähnle	9df23db13d	radeonsi: translate NIR to LLVM Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:33 +02:00
Nicolai Hähnle	d77526ee30	radeonsi: dump NIR instead of TGSI when appropriate Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:33 +02:00
Nicolai Hähnle	c5f70a5174	radeonsi: bypass the shader cache for NIR shaders Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:33 +02:00
Nicolai Hähnle	29d7bdd179	radeonsi: scan NIR shaders to obtain required info v2: set num_instruction to 2, i.e. 1 + END (Marek) Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:32 +02:00
Nicolai Hähnle	90b3ba8970	radeonsi: add si_shader_selector::nir Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:32 +02:00
Nicolai Hähnle	acd09389cb	radeonsi: implement pipe_screen::get_compiler_options for NIR Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:31 +02:00
Nicolai Hähnle	da62a31c5b	radeonsi: add nir include paths Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:31 +02:00
Nicolai Hähnle	61ad2f13c3	ac,radeonsi: move some VS input descriptions to ac_shader_abi v2: use LLVM values instead of function parameter indices Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:31 +02:00
Nicolai Hähnle	c7e9ebb3ab	radeonsi: store shader function arguments in a structure Aligns the code a bit more with ac/nir, and simplifies the setup of ac_shader_abi. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:31 +02:00
Nicolai Hähnle	00476907fc	gallium/targets: link against NIR when building radeonsi Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:30 +02:00
Nicolai Hähnle	01f1598a40	gallium: add PIPE_CAP_NIR_SAMPLERS_AS_DEREF Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-31 14:55:29 +02:00
Marek Olšák	5d8359ff4d	radeonsi: expose MRT-draw-calls to HUD Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-07-31 12:46:43 +02:00
Marek Olšák	f4d095cc65	radeonsi: update dirty_level_mask only when flushing or unbinding framebuffer This fixes corruption with bindless textures in Dawn Of War 3. The do_update_surf_dirtiness mechanism was complicated and dirty_level_mask was only updated after the first draw call. The problem is bindless textures are checked for decompression every draw call and we would only decompress after the first draw call. The solution is to set dirtiness after the last draw call to the framebuffer, so the (unconditional) decompression of bindless textures happens at the right time. Cc: 17.2 <mesa-stable@lists.freedesktop.org> Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2017-07-28 16:34:24 +02:00
Marek Olšák	28c7fbbe0f	radeonsi: rely on CLEAR_STATE for clearing UCP and blend color registers Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-07-28 08:03:24 +02:00
Marek Olšák	7c721b28f6	radeonsi: rely on CLEAR_STATE for resetting the framebuffer and sample mask Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-07-28 08:03:24 +02:00
Marek Olšák	064550238e	radeonsi: use CLEAR_STATE to initialize some registers Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-07-28 08:03:24 +02:00
Nicolai Hähnle	06e20c4b8c	radeonsi: bail out instead of crashing if the main shader part failed to compile Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-27 21:16:45 +02:00
Nicolai Hähnle	4dd86631f4	radeonsi: update a comment for merged shaders Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-27 21:16:45 +02:00
Nicolai Hähnle	4738dd9546	radeonsi/gfx9: dump previous stage LLVM IR for merged shaders Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-27 21:16:45 +02:00
Nicolai Hähnle	760876a7b1	radeonsi: make sure TCS main output VGPRs don't alias inputs Avoids an unnecessary move introduced by "radeonsi/gfx9: always wrap GS and TCS in an if-block (v2)" Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-27 21:16:42 +02:00
Nicolai Hähnle	081ac6e5c6	radeonsi/gfx9: always wrap GS and TCS in an if-block (v2) With merged ESGS shaders, the GS part of a wave may be empty, and the hardware gets confused if any GS messages are sent from that wave. Since S_SENDMSG is executed even when EXEC = 0, we have to wrap even non-monolithic GS shaders in an if-block, so that the entire shader and hence the S_SENDMSG instructions are skipped in empty waves. This change is not required for TCS/HS, but applying it there as well simplifies the logic a bit. Fixes GL45-CTS.geometry_shader.rendering.rendering.* v2: ensure that the TCS epilog doesn't run for non-existing patches Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-27 21:16:32 +02:00
Nicolai Hähnle	873789002f	radeonsi/gfx9: fix vertex idx in ES with multiple waves per threadgroup Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-27 21:16:32 +02:00
Marek Olšák	ed2b3f5c81	radeonsi: decrease the number of compiler threads Cc: 17.2 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-07-26 19:53:26 +02:00
Marek Olšák	433f6f7ac9	gallium/radeon: make S_FIXED function signed and move it to shared code This fixes a bug uncovered by: `2412c4c81e` util: Make CLAMP turn NaN into MIN. Cc: 17.2 <mesa-stable@lists.freedesktop.org> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-07-26 19:53:26 +02:00
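The change above matters because converting a negative float directly to an unsigned integer type is undefined behavior in C. A fixed-point helper used for values that can be negative therefore needs a signed result. Below is a minimal sketch of a signed conversion in the spirit of S_FIXED. The name `s_fixed` and the truncation-toward-zero behavior are assumptions, not the shared Mesa definition. It also does not claim to reproduce the exact bug exposed by the CLAMP change.

```c
#include <assert.h>

/* Hypothetical signed fixed-point conversion: scale by 2^frac_bits and
 * truncate toward zero into a signed int, so negative inputs stay defined. */
static inline int s_fixed(float value, unsigned frac_bits)
{
    assert(frac_bits < 31);
    return (int)(value * (float)(1u << frac_bits));
}
```

For example, `s_fixed(-1.5f, 4)` yields -24, whereas an unsigned return type would make that conversion undefined.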
Nicolai Hähnle	65fbaab0b7	radeonsi: fix detection of DRAW_INDIRECT_MULTI on SI The firmware version numbers for SI were wrong. The new numbers are probably too conservative (we don't have a definitive answer from the firmware team), but DRAW_INDIRECT_MULTI has been confirmed to work with these versions on Tahiti (by Gustaw) and on Verde (by myself). While this is technically adding a feature, it's a feature we thought we had for a long time. The change is small enough and we're early enough in the 17.2 release cycle that it should still go in. Reported-by: Gustaw Smolarczyk <wielkiegie@gmail.com> Cc: 17.2 <mesa-stable@lists.freedesktop.org> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-26 11:48:32 +02:00
Marek Olšák	ecec21add2	radeonsi: add back the USE_MINIMUM_PRIORITY flag to the low-prio compiler queue Accidentally removed in `9f320e0a38`. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-07-18 13:13:34 -04:00
Marek Olšák	c62809171c	radeonsi/gfx9: add VM fault dmesg parser support Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-07-17 10:57:34 -04:00
Marek Olšák	9f320e0a38	radeonsi: automatically resize shader compiler thread queues when they are full Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-07-17 10:57:29 -04:00
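One way to "resize when full" is a ring buffer that grows instead of blocking the producer. It copies pending jobs out in FIFO order, so ordering is preserved across the resize. The sketch below is single-threaded and uses hypothetical names. It is not Mesa's util_queue API, which also handles locking and worker threads.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable FIFO of job ids (no locking). */
struct job_queue {
    int *jobs;
    unsigned cap, head, count;
};

/* Push a job; doubles capacity instead of waiting when full. */
static int queue_push(struct job_queue *q, int job)
{
    if (q->count == q->cap) {
        unsigned new_cap = q->cap ? q->cap * 2 : 4;
        int *n = malloc(new_cap * sizeof(int));
        if (!n)
            return 0;
        /* copy pending jobs in FIFO order so head restarts at 0 */
        for (unsigned i = 0; i < q->count; i++)
            n[i] = q->jobs[(q->head + i) % q->cap];
        free(q->jobs);
        q->jobs = n;
        q->cap = new_cap;
        q->head = 0;
    }
    assert(q->count < q->cap);
    q->jobs[(q->head + q->count) % q->cap] = job;
    q->count++;
    return 1;
}

/* Pop the oldest job; returns 0 if the queue is empty. */
static int queue_pop(struct job_queue *q, int *job)
{
    if (q->count == 0)
        return 0;
    *job = q->jobs[q->head];
    q->head = (q->head + 1) % q->cap;
    q->count--;
    return 1;
}
```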
Marek Olšák	465bb47d6f	radeonsi: expose ARB_timer_query unconditionally clock_crystal_freq is always non-zero now. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-07-17 10:57:17 -04:00
Marek Olšák	5fb80a1e84	radeonsi: prevent a crash with DBG_CHECK_VM and u_threaded_context by setting PIPE_CONTEXT_DEBUG in the caller Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-07-17 10:56:51 -04:00
Marek Olšák	ffa7ec9e22	radeonsi: simplify computation of tessellation offchip buffers This is overly cautious, but better safe than sorry. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-07-17 10:55:07 -04:00
Marek Olšák	facfab28fe	radeonsi/gfx9: add workarounds to avoid VGPR indexing completely For inputs and outputs, indirect indexing is lowered by the GLSL compiler. For temporaries, use alloca and disable the "promote-alloca" pass. In the future, we could switch all codepaths to alloca permanently and just rely on the "promote-alloca" pass. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-07-17 10:50:39 -04:00
Marek Olšák	93391ac478	radeonsi: emit param exports after position exports Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-07-17 10:50:39 -04:00
Marek Olšák	9d9ffc8475	radeonsi: move building parameter exports into a separate function Both loops now look simple. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-07-17 10:50:39 -04:00
Marek Olšák	4e30fb4ecc	radeonsi: don't use info.num_inputs when it's unused For clarity. It's only used by color interpolation. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-07-17 10:50:39 -04:00
Marek Olšák	f8d6dd9b3d	radeonsi: add si_build_fs_interp helper This is much simpler. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-07-17 10:50:39 -04:00
Marek Olšák	4560f2b90a	radeonsi: merge si_llvm_get_amdgpu_target into ac_get_llvm_target Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-07-17 10:50:39 -04:00
Marek Olšák	ece0c0439f	radeonsi: don't call gallivm_init_llvm_targets It's for initializing the native (x86) target. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-07-17 10:50:39 -04:00
Nicolai Hähnle	c22e3c5373	radeonsi/gfx9: fix crash building monolithic merged ES-GS shader Forwarding from the ES prolog to the ES just barely exceeds the current maximum array size when 16 vertex attributes are used. Give it a decent bump to account for merged shaders having up to 32 user SGPRs. Fixes a crash in GL45-CTS.multi_bind.draw_bind_vertex_buffers. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-13 13:01:15 +02:00
Marek Olšák	aaee0d1bbf	gallium: use "ull" number suffix to keep the QtCreator parser happy It can't parse "llu". Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>	2017-07-10 22:44:48 +02:00
Samuel Pitoiset	a584a12308	radeonsi: fix invalidating bindless buffer descriptors The VA is stored at [4:5], not [0:1]. This invalidated all texture buffer descriptors when they were made resident in the current context. This removes a few partial flushes and cache invalidations, which are needed when updating a bindless descriptor on the fly with a WRITE_DATA packet. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-07-07 09:09:39 +02:00
Marek Olšák	ccfac28835	radeonsi: set COMPUTE_DISPATCH_INITIATOR.ORDER_MODE = 1 Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-29 16:19:35 +02:00
Marek Olšák	af52e61935	radeonsi: use the DISPATCH packets to force COMPUTE_START_X/Y/Z = 0 Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-29 16:19:35 +02:00
Rob Herring	a3d98ca62f	Android: use symlinks for driver loading Instead of having special driver loading logic for Android, create symlinks to gallium_dri.so so we can use the standard loading logic. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Rob Herring <robh@kernel.org>	2017-06-29 09:09:49 -05:00
Marek Olšák	4a10d6154e	radeonsi: move instance divisors into a constant buffer Shader key size: 107 -> 47 Divisors of 0 and 1 are encoded in the shader key. Greater instance divisors are loaded from a constant buffer. The shader code doing the division is huge. Is it something we need to worry about? Does any app use instance divisors >= 2? VS prolog disassembly: s_load_dwordx4 s[12:15], s[0:1], 0x80 ; C00A0300 00000080 s_nop 0 ; BF800000 s_waitcnt lgkmcnt(0) ; BF8C007F s_buffer_load_dword s14, s[12:15], 0x4 ; C0220386 00000004 s_waitcnt lgkmcnt(0) ; BF8C007F v_cvt_f32_u32_e32 v4, s14 ; 7E080C0E v_rcp_iflag_f32_e32 v4, v4 ; 7E084704 v_mul_f32_e32 v4, 0x4f800000, v4 ; 0A0808FF 4F800000 v_cvt_u32_f32_e32 v4, v4 ; 7E080F04 v_mul_hi_u32 v5, v4, s14 ; D2860005 00001D04 v_mul_lo_i32 v6, v4, s14 ; D2850006 00001D04 v_cmp_eq_u32_e64 s[12:13], 0, v5 ; D0CA000C 00020A80 v_sub_i32_e32 v5, vcc, 0, v6 ; 340A0C80 v_cndmask_b32_e64 v5, v6, v5, s[12:13] ; D1000005 00320B06 v_mul_hi_u32 v5, v5, v4 ; D2860005 00020905 v_add_i32_e32 v6, vcc, v5, v4 ; 320C0905 v_subrev_i32_e32 v4, vcc, v5, v4 ; 36080905 v_cndmask_b32_e64 v4, v4, v6, s[12:13] ; D1000004 00320D04 v_mul_hi_u32 v5, v4, v1 ; D2860005 00020304 v_add_i32_e32 v4, vcc, s8, v0 ; 32080008 v_mul_lo_i32 v6, v5, s14 ; D2850006 00001D05 v_add_i32_e32 v7, vcc, 1, v5 ; 320E0A81 v_cmp_ge_u32_e64 s[12:13], v1, v6 ; D0CE000C 00020D01 v_sub_i32_e32 v6, vcc, v1, v6 ; 340C0D01 v_cmp_le_u32_e32 vcc, s14, v6 ; 7D960C0E v_cndmask_b32_e64 v8, 0, -1, s[12:13] ; D1000008 00318280 v_cndmask_b32_e64 v6, 0, -1, vcc ; D1000006 01A98280 v_and_b32_e32 v6, v8, v6 ; 260C0D08 v_cmp_eq_u32_e32 vcc, 0, v6 ; 7D940C80 v_cndmask_b32_e32 v6, v7, v5, vcc ; 000C0B07 v_add_i32_e32 v5, vcc, -1, v5 ; 320A0AC1 v_cmp_eq_u32_e32 vcc, 0, v8 ; 7D941080 v_cndmask_b32_e32 v5, v6, v5, vcc ; 000A0B06 v_add_i32_e32 v5, vcc, s9, v5 ; 320A0A09 v2: set prefer_mono for fetched instance divisors Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-27 19:55:09 +02:00
Marek Olšák	aef998fe4b	radeonsi: check nr_cbufs in other places before flushing CB Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-27 18:46:12 +02:00
Marek Olšák	f9a7e7fe14	radeonsi: use #pragma pack to pack si_shader_key sizeof(struct si_shader_key): Before reverting the 2 commits: 120 bytes After reverting the 2 commits: 128 bytes With #pragma pack: 107 bytes Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-27 18:45:07 +02:00
Marek Olšák	77d2a98353	Revert "radeonsi: use uint32_t to declare si_shader_key.opt.kill_outputs" This reverts commit `7b2240ac9c`. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-27 18:45:07 +02:00
Marek Olšák	dbe45e1180	Revert "radeonsi: remove 8 bytes from si_shader_key with uint32_t ff_tcs_inputs_to_copy" This reverts commit `6b6fed3a3c`. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-27 18:45:07 +02:00
Marek Olšák	ccf963ed29	radeonsi: don't flush and wait for CB after depth-only rendering Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-26 23:35:19 +02:00
Nicolai Hähnle	f17d78becc	radeonsi: support indirect indexing in INTERP_* opcodes The hardware doesn't support it, so we just interpolate all array elements and then use indirect indexing on the resulting vector. Clearly, this is not very efficient. There is an argument to be had for adding if/else, or perhaps even pulling the data out of LDS directly. Neither really seems worth the effort, considering that it seems nobody actually uses this feature. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-06-26 14:02:06 +02:00
Marek Olšák	e25950808f	radeonsi/gfx9: don't overallocate shader binaries It's not needed. The hw doesn't fetch ahead over page boundaries. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-24 23:04:37 +02:00
Marek Olšák	f6e98e99e3	radeonsi: unreference vertex buffers when destroying the context Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2017-06-23 19:53:54 +02:00
Marek Olšák	ee16796d54	radeonsi: implement the workaround for Rocket League - postponed TGSI kill Do KILL at the end of shaders so as not to break WQM. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100070 Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-23 19:50:20 +02:00
Marek Olšák	a98a04ec80	gallium/radeon: pass create_screen flags to r600_common_screen_init Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-23 19:50:20 +02:00
Marek Olšák	c2f82fc1d3	Revert "radeonsi: don't emit partial flushes at the end of IBs (v2)" This reverts commit `c9040dc9e7`. People have reported it causes corruption on VI, and I see GPU hangs on GFX9.	2017-06-23 19:13:55 +02:00
Marek Olšák	db37c0be13	radeonsi/gfx9: don't ever flush the TC metadata cache The closed Vulkan driver doesn't do it either. Also remove some old comments that aren't useful. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-22 13:15:27 +02:00
Marek Olšák	920f20f039	radeonsi/gfx9: use TC L2 for fast color clear with CP DMA Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-22 13:15:27 +02:00
Marek Olšák	c9040dc9e7	radeonsi: don't emit partial flushes at the end of IBs (v2) The kernel sort of does the same thing with fences. v2: do emit partial flushes on SI Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-22 13:15:27 +02:00
Nicolai Hähnle	da2e52b382	radeonsi: use the correct LLVMTargetMachineRef in si_build_shader_variant si_build_shader_variant can actually be called directly from one of the normal-priority compiler threads. In that case, the thread_index is only valid for the normal tm array. v2: - use the correct sel/shader->compiler_ctx_state Fixes: `86cc809726` ("radeonsi: use a compiler queue with a low priority for optimized shaders") Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-06-22 09:45:23 +02:00
Marek Olšák	79bd1d4f8b	radeonsi/gfx9: keep reusing the same buffer/address for the gfx9 flush fence instead of using a monotonic suballocator v2: initialize the memory at context creation Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-22 01:51:02 +02:00
Marek Olšák	c66fc618cc	radeonsi/gfx9: enable the constant engine I think this kernel commit fixes it: drm/amdgpu:use FRAME_CNTL for new GFX ucode Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-22 01:51:02 +02:00
Marek Olšák	d7141d8bc0	radeonsi/gfx9: indirect buffers and all CP packets use TC L2 Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-22 01:51:02 +02:00
Marek Olšák	2638250fec	radeonsi: flush CB after MSAA only when transitioning from CB to textures The main flush before texturing is done after the FMASK decompress pass. CB after MSAA rendering is not flushed in set_framebuffer_state and also not in memory_barrier if the current color buffer is MSAA. We fully rely on the FMASK decompress pass for the flushing. Some CB decompress and resolve passes need an explicit flush before and after. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-22 01:51:02 +02:00
Marek Olšák	51c219739c	radeonsi: unify CB_RESOLVE blitter invocation code Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-22 01:51:02 +02:00
Marek Olšák	2263610827	radeonsi: flush DB caches only when transitioning from DB to texturing Use the mechanism of si_decompress_textures, but instead of doing the actual decompression, just flag the DB cache flush there. This removes a lot of unnecessary DB cache flushes. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-22 01:51:02 +02:00
Marek Olšák	fdca690e91	radeonsi: add separate HUD counters for CB and DB cache flushes Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-22 01:51:02 +02:00
Samuel Pitoiset	ea2492b62f	radeonsi: set correct usage flag according to image access type Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-06-20 13:01:18 +02:00
Samuel Pitoiset	afeaa2e98a	radeonsi: update all resident texture descriptors when needed To avoid useless DCC fetches when DCC is disabled, descriptors have to be updated in order to reflect this change. This is quite similar to how we update descriptors of bound textures. As a side effect, this should also prevent VM faults when bindless textures are invalidated, because the VA in the descriptor has to be updated accordingly as well. I don't see any performance improvements with DOW3. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-06-20 10:14:55 +02:00
Samuel Pitoiset	f00e80e3f7	radeonsi: keep track of the sampler state for texture handles Needed for updating all resident texture descriptors when dirty_tex_counter changes. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-06-20 10:14:52 +02:00
Marek Olšák	3fc99f1299	radeonsi: fix dumping shader descriptors into ddebug logs Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-19 20:16:20 +02:00
Marek Olšák	f9dc29a9a5	radeonsi: add a workaround for inexact SNORM8 blitting again GFX9 is affected. We only have tests for GL_x_SNORM where x is R8, RG8, RGB8, and RGBA8. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-19 20:15:36 +02:00
Marek Olšák	0f827b51c0	radeonsi/gfx9: fix TC-compatible stencil compression Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-19 20:15:36 +02:00
Marek Olšák	8a264dd829	radeonsi/gfx9: fix TXF_LZ with 1D textures Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-19 20:15:36 +02:00
Marek Olšák	353b60cab5	radeonsi/gfx9: disable sparse buffers Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-19 20:15:36 +02:00
Samuel Pitoiset	6ff6863c32	radeonsi: reduce overhead for resident textures which need color decompression This is done by introducing a separate list. si_decompress_textures() is now 5x faster. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-06-18 14:10:38 +02:00
Samuel Pitoiset	06ed251c32	radeonsi: reduce overhead for resident textures which need depth decompression This is done by introducing a separate list. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-06-18 14:10:36 +02:00
Samuel Pitoiset	705a6a560e	radeonsi: use util_dynarray_foreach for bindless resources Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-06-18 14:10:34 +02:00
Samuel Pitoiset	8d9e76ce1f	gallium/radeon: add a new HUD query for the number of resident handles Useful for debugging performance issues when ARB_bindless_texture is enabled. This query doesn't make a distinction between texture and image handles. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-06-18 14:08:08 +02:00
Emil Velikov	1f958c1337	radeonsi: include ac_binary.h for struct ac_shader_binary The header embeds the struct so it needs the header inclusion instead of the dummy forward declaration. Cc: Nicolai Hähnle <nicolai.haehnle@amd.com> Cc: Marek Olšák <marek.olsak@amd.com> Cc: Tom Stellard <tstellar@redhat.com> Fixes: `32206c5e56` ("radeonsi: Add radeon_shader_binary member to struct si_shader") Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2017-06-17 11:38:02 +01:00
Samuel Pitoiset	65d1e4d1eb	radeonsi: enable ARB_bindless_texture This has only been tested on RX480. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-06-14 10:04:36 +02:00
Samuel Pitoiset	285ec4463b	radeonsi: add support for loading bindless images Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-06-14 10:04:36 +02:00
Samuel Pitoiset	950b5ffa31	radeonsi: add support for loading bindless samplers Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-06-14 10:04:36 +02:00
Samuel Pitoiset	0c2834c5b2	radeonsi: invalidate buffers which are made resident if needed When a buffer becomes resident, check if it has been invalidated, if so update the descriptor and the dirty flag. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-06-14 10:04:36 +02:00
Samuel Pitoiset	811756dfd0	radeonsi: upload new descriptors when resident buffers are invalidated When texture buffers are invalidated, the addr in the resident descriptor has to be updated, but we can't create a new descriptor because the resident handle has to stay the same. Instead, use the WRITE_DATA packet, which allows updating memory directly, but graphics/compute have to be idle in case the GPU is reading the descriptor. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-06-14 10:04:36 +02:00
Samuel Pitoiset	48fe8a6210	radeonsi: only decompress resident textures/images when used When the current bound shaders don't use any bindless textures or images, it's useless to decompress the resident resources. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-06-14 10:04:36 +02:00
Samuel Pitoiset	2c3a7d5840	radeonsi: track use of bindless samplers/images from tgsi_shader_info This adds some new helper functions to know if the current draw call (or dispatch compute) is using bindless samplers/images, based on TGSI analysis. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-06-14 10:04:36 +02:00
Samuel Pitoiset	e1813a8635	radeonsi: decompress resident textures/images before graphics/compute Similar to the existing decompression code path except that it loops over the list of resident textures/images. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-06-14 10:04:36 +02:00
Samuel Pitoiset	d7e1a66bb5	radeonsi: decompress DCC for resident textures/images Analogous to bound textures/images. We should also update the resident descriptors and disable COMPRESSION_EN to avoid useless DCC fetches, but I postpone this optimization to a separate series. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-06-14 10:04:36 +02:00
Samuel Pitoiset	a45e198e2d	radeonsi: only add descriptors in presence of resident handles This won't help much except for applications that use a ton of resident handles. Though, this will reduce the winsys overhead a little bit. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-06-14 10:04:36 +02:00
Samuel Pitoiset	333c8f65cf	radeonsi: add all resident buffers to the current CS Resident buffers have to be added to every new command stream. Though, this could be slightly improved when current shaders don't use any bindless textures/images but usually applications tend to use bindless for almost every draw call, and the winsys thread might help when buffers are added early. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-06-14 10:04:36 +02:00
Samuel Pitoiset	9cc328eef6	radeonsi: implement ARB_bindless_texture This implements the Gallium interface. Decompression of resident textures/images will follow in the next patches. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-06-14 10:04:36 +02:00
Samuel Pitoiset	77bbdcdfcd	radeonsi: add a slab allocator for bindless descriptors For each texture/image handle, we need to allocate a new buffer for the bindless descriptor. But when the number of buffers added to the current CS becomes high, the overhead in the winsys (and in the kernel) is significant. To reduce this bottleneck, the idea is to suballocate the bindless descriptors using a slab similar to the one used in the winsys. Currently, a buffer can hold 1024 bindless descriptors, but this limit is arbitrary and could be changed in the future. Once a slab is allocated, the "base" buffer is added to a per-context list. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-06-14 10:04:36 +02:00
Samuel Pitoiset	86d7b7f01a	radeonsi: add si_set_shader_image_desc() helper To share some common code between bound and bindless images. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-06-14 10:04:36 +02:00
Samuel Pitoiset	410b4ec06d	radeonsi: add si_set_sampler_view_desc() helper To share some common code between bound and bindless textures. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-06-14 10:04:36 +02:00
Samuel Pitoiset	2ce04d7c1a	radeonsi: add si_init_descriptor_list() helper This will be used in order to initialize resident descriptors for bindless textures/images. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-06-14 10:04:36 +02:00
Samuel Pitoiset	973822bcee	gallium: add PIPE_CAP_BINDLESS_TEXTURE Whether bindless texture operations are supported by the underlying driver. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-14 10:04:36 +02:00
Marek Olšák	4951b0adbd	radeonsi: pack si_context better there isn't much to gain here Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-12 18:24:37 +02:00
Marek Olšák	6d43d352cc	radeonsi: pack si_framebuffer better Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-12 18:24:37 +02:00
Marek Olšák	ca815f1ead	radeonsi: pack si_sampler_view better Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-12 18:24:37 +02:00
Marek Olšák	29bf2530d8	radeonsi: pack si_buffer_resources better Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-12 18:24:37 +02:00
Marek Olšák	cf5ce61148	radeonsi: pack struct si_descriptors better Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-12 18:24:37 +02:00
Marek Olšák	217114dd73	radeonsi: pack struct si_vertex_elements better Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-12 18:24:37 +02:00
Marek Olšák	e80a056ff9	radeonsi: replace si_vertex_elements::elements with separate fields It makes si_vertex_elements a little smaller. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-12 18:24:37 +02:00
Marek Olšák	c8b6f42e25	radeonsi: rename si_vertex_element -> si_vertex_elements Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-12 18:24:37 +02:00
Marek Olšák	7be6186e0c	radeonsi: allocate si_state_rasterizer::pm4_poly_offset only when needed Each element has over 700 bytes. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-12 18:24:37 +02:00
Marek Olšák	a828f5d783	radeonsi: pack si_state_rasterizer fields Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-12 18:24:37 +02:00
Marek Olšák	6b6fed3a3c	radeonsi: remove 8 bytes from si_shader_key with uint32_t ff_tcs_inputs_to_copy The previous patch helps with this. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-12 18:24:37 +02:00
Marek Olšák	7b2240ac9c	radeonsi: use uint32_t to declare si_shader_key.opt.kill_outputs the next patch will benefit from this Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-12 18:24:37 +02:00
Marek Olšák	1621b33d73	radeonsi: remove 8 bytes from si_shader_key by flattening opt.hw_vs Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-12 18:24:37 +02:00
Marek Olšák	30882ba0dd	radeonsi: don't emit DB_STENCIL_CONTROL if it has no effect Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-12 18:24:37 +02:00
Marek Olšák	6743dc01fd	radeonsi: fix missing num_L2_invalidates increment Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-12 18:24:37 +02:00
Marek Olšák	c503381864	radeonsi: get rid of more compressed_colortex_mask names Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-12 18:24:37 +02:00
Juan A. Suarez Romero	a625d58ee1	radeonsi: call LLVMAddEarlyCSEMemSSAPass only for LLVM >= 4.0 LLVMAddEarlyCSEMemSSAPass() is defined in LLVM 4.0. Fixes: `257b538` ("radeonsi: do EarlyCSEMemSSA LLVM pass") Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2017-06-08 23:32:32 +02:00
Marek Olšák	6940361796	gallium/radeon: don't allocate HTILE in a separate buffer Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-08 23:29:07 +02:00
Marek Olšák	c6451b1209	radeonsi: rename depth decompress functions Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-08 23:29:07 +02:00
Marek Olšák	d8a577d96e	radeonsi: rename shader resource decompress masks to their true meaning Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-08 23:29:07 +02:00
Marek Olšák	da26de5ff7	radeonsi: rename is_compressed_colortex -> color_needs_decompression Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-08 23:29:07 +02:00
Marek Olšák	391673af7a	radeonsi: disable the patch ID workaround on SI when the patch ID isn't used (v2) The workaround causes a massive performance decrease on 1-SE parts. (Cape Verde, Hainan, Oland) The performance regression is already part of 17.0 and 17.1. v2: check tess_uses_prim_id Cc: 17.0 17.1 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-08 23:29:07 +02:00
Marek Olšák	4b8d0c2b1d	radeonsi: don't update dependent states if it has no effect (v2) This and the previous clip_regs commit decrease IB sizes and the number of si_update_shaders invocations as follows (IB size, si_update_shaders calls): Borderlands 2 (-10%, -27%), Deus Ex: MD (-5%, -11%), Talos Principle (-8%, -30%). v2: always dirty cb_render_state in set_framebuffer_state Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-08 23:29:07 +02:00
Marek Olšák	bacaceb78a	radeonsi: update clip_regs on shader state changes only when it's needed Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 20:17:20 +02:00
Marek Olšák	2b7fd9df9a	radeonsi: precompute some fields for PA_CL_VS_OUT_CNTL in si_shader_selector Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 20:17:18 +02:00
Marek Olšák	140b3c5019	radeonsi: add a new helper si_get_vs Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 20:17:16 +02:00
Samuel Pitoiset	878bd981bf	radeonsi: isolate real framebuffer changes from the decompression passes (v3) When a stencil buffer is part of the framebuffer state, it is decompressed but because it's bindless, all draw calls set stencil_dirty_level_mask to 1. v2: Marek - set the flags outside the loop - also clear and set framebuffer.do_update_surf_dirtiness there - do it in the DB->CB copy path too v3: Marek - save and restore the do_update_surf_dirtiness flag Signed-off-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 20:17:14 +02:00
Marek Olšák	257b538fd2	radeonsi: do EarlyCSEMemSSA LLVM pass so that LLVM IR looks like CSE has been run on it. It's also recommended by the instruction combining pass. This also fixes: - GL45-CTS.arrays_of_arrays_gl.InteractionFunctionCalls2 (crash) - piglit/spec/arb_shader_ballot/execution/fs-readFirstInvocation-uint-loop (fail) The code size decrease is positive, the register usage isn't. There is a decrease in VGPR spilling for Tomb Raider, but an increase in DiRT Showdown and GRID Autosport. EarlyCSEMemSSA has a -0.01% change in code size compared to EarlyCSE. SGPRS: 1935420 -> 1938076 (0.14 %) VGPRS: 1645504 -> 1645988 (0.03 %) Spilled SGPRs: 2493 -> 2651 (6.34 %) Spilled VGPRs: 107 -> 115 (7.48 %) Private memory VGPRs: 1332 -> 1332 (0.00 %) Scratch size: 1512 -> 1516 (0.26 %) dwords per thread Code Size: 61981592 -> 61890012 (-0.15 %) bytes Max Waves: 371847 -> 371798 (-0.01 %) Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 20:17:09 +02:00
Marek Olšák	e9409c86e7	radeonsi: remove 8 bytes from si_shader_key We can use a union in si_shader_key::mono. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 20:17:06 +02:00
Marek Olšák	2b8b9a56ef	radeonsi: move PSIZE and CLIPDIST unique IO indices after GENERIC Heaven LDS usage for LS+HS is below. The masks are "outputs_written" for LS and HS. Note that 32K is the maximum size. Before: heaven_x64: ls=1f1 tcs=1f1, lds=32K heaven_x64: ls=31 tcs=31, lds=24K heaven_x64: ls=71 tcs=71, lds=28K After: heaven_x64: ls=3f tcs=3f, lds=24K heaven_x64: ls=7 tcs=7, lds=13K heaven_x64: ls=f tcs=f, lds=17K All other apps have a similar decrease in LDS usage, because the "outputs_written" masks are similar. Also, most apps don't write POSITION in these shader stages, so there is room for improvement. (tight per-component input/output packing might help even more) It's unknown whether this improves performance. Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 20:14:15 +02:00
Marek Olšák	7d67cbefe0	radeonsi: clean up decompress blend state names Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 19:38:45 +02:00
Marek Olšák	d2ee423b69	radeonsi: enable TC-compatible stencil compression on VI Most things are in place. Ideally we won't see decompress blits for stencil anymore. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 19:38:39 +02:00
Marek Olšák	3effce4fb0	radeonsi/gfx9: prevent a race when the previous shader's main part is missing Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:43:42 +02:00
Marek Olšák	b5bc826ead	radeonsi/gfx9: wait for main part compilation of 1st shaders of merged shaders Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:43:42 +02:00
Marek Olšák	ffbaba6072	radeonsi/gfx9: fix LS scratch buffer support without TCS for GFX9 LS is merged into TCS. If there is no TCS, LS is merged into fixed-func TCS. The problem is the fixed-func TCS was ignored by scratch update functions, so LS didn't have the scratch buffer set up. Note that Mesa 17.1 doesn't have merged shaders. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:43:42 +02:00
Marek Olšák	6e2c07749b	radeonsi: move streamout state update out of si_update_shaders Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:43:42 +02:00
Marek Olšák	294be5279d	radeonsi: remove dead code in declare_input_fs Colors are interpolated in the PS prolog. This was never used. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:43:42 +02:00
Marek Olšák	8147c4a4a5	radeonsi: move handling of DBG_NO_OPT_VARIANT into si_shader_selector_key Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:43:42 +02:00
Marek Olšák	86cc809726	radeonsi: use a compiler queue with a low priority for optimized shaders Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:43:42 +02:00
Marek Olšák	89b6c93ae3	util/u_queue: add an option to set the minimum thread priority Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:43:42 +02:00
Marek Olšák	6f2947fa79	radeonsi: decrease the number of compiler threads to num CPUs - 1 Reserve one core for other things (like draw calls). Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:43:42 +02:00
Marek Olšák	38bd468a78	radeonsi: drop unfinished shader compilations when destroying shaders If we enqueue too many jobs and destroy the GL context, it may take several seconds before the jobs finish. Just drop them instead. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:43:42 +02:00
Marek Olšák	a893c91697	gallium/u_blitter: use 2D_ARRAY for cubemap blits if possible so that we can use TXF. The cubemap blit pixel shader code size: 148 -> 92 bytes Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:10:50 +02:00
Marek Olšák	6c655cfeb4	radeonsi: fix a GPU hang with tessellation on 2-CU configs Only harvested Stoney has 2 CUs. Tested on 2-CU Stoney and Fiji forced to 2 CUs. Cc: 17.0 17.1 <mesa-stable@lists.freedesktop.org> Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2017-06-06 13:01:52 +02:00
Lyude	467af445a3	gallium: Add a cap to check if the driver supports ARB_post_depth_coverage Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2017-06-02 23:19:22 -04:00
Samuel Pitoiset	30a4e375f5	radeonsi: remove unused si_pm4_state::compute_pkt Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-05-31 09:20:57 +02:00
Samuel Pitoiset	e4b05a50df	radeonsi: remove chip_class define from si_pm4.h Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-05-31 09:20:55 +02:00
Samuel Pitoiset	d90a6c2f23	radeonsi: merge si_pm4_free_state_simple() into si_pm4_free_state() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-05-31 09:20:53 +02:00
Marek Olšák	48b91103ce	radeonsi: use ac_build_buffer_load for shader buffer loads and document why we can't use SMEM yet. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-05-29 01:52:16 +02:00
Marek Olšák	e019ea8f4b	radeonsi: move building llvm.SI.load.const into ac_build_buffer_load Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-05-29 01:52:16 +02:00
Marek Olšák	e1942c970f	radeonsi: rename readonly_memory -> can_speculate This is more accurate. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-05-29 01:52:16 +02:00
Marek Olšák	24306c0b27	radeonsi: fix a crash in si_destroy_context if we fail early Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-05-29 01:52:16 +02:00
Samuel Pitoiset	ab8fb5a082	radeonsi: drop useless memcmp() check in si_set_blend_color() cso_set_blend_color() already checks if the old state is different. Only Nine uses pipe::set_blend_color() directly but I guess it should use the cache too. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-05-27 18:00:45 +02:00
Leo Liu	f94cfdc5f2	radeonsi: enable vcn decode Signed-off-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>	2017-05-25 11:40:20 -04:00
Leo Liu	c23ffafc50	radeon: rename has_uvd info to has_hw_decode Signed-off-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>	2017-05-25 11:40:20 -04:00
Marek Olšák	2beb31bd7c	radeonsi/gfx9: compile shaders with +xnack so that LLVM doesn't allocate SGPRs where XNACK is. Cc: 17.1 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-05-22 19:23:39 +02:00
Marek Olšák	807e1d2577	radeonsi/gfx9: use CE RAM optimally On GFX9 with only 4K CE RAM, define the range of slots that will be allocated in CE RAM. All other slots will be uploaded directly. This will switch dynamically according to which slots are used by current shaders. GFX9 CE usage should now be similar to VI instead of being often disabled. Tested on VI by taking the GFX9 CE allocation codepath and setting num_ce_slots = 2 everywhere to get frequent switches between both modes. CE is still disabled on GFX9. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-05-18 22:15:02 +02:00
Marek Olšák	1cde473ec0	radeonsi: remove CE offset alignment restriction This was only needed by LOAD_CONST_RAM, which is now only used to load whole CE. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-05-18 22:15:02 +02:00
Marek Olšák	a7f098fb76	radeonsi: only upload (dump to L2) those descriptors that are used by shaders This decreases the size of CE RAM dumps to L2, or the size of descriptor uploads without CE. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-05-18 22:15:02 +02:00
Marek Olšák	53c2ef36da	radeonsi: record which descriptor slots are used by shaders Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-05-18 22:15:02 +02:00
Marek Olšák	38828094e9	radeonsi: update si_ce_needed_cs_space Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-05-18 22:15:02 +02:00
Marek Olšák	edb59ef2dc	radeonsi: do only 1 big CE dump at end of IBs and one reload in the preamble A later commit will only upload descriptors used by shaders, so we won't do full dumps anymore, so the only way to have a complete mirror of CE RAM in memory is to do a separate dump after the last draw call. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-05-18 22:15:02 +02:00
Marek Olšák	06690e63f7	radeonsi: remove early return in si_upload_descriptors All updates of descriptors_dirty also set dirty_mask, so the return is unnecessary. The next commit will want this function to be executed even if dirty_mask == 0. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-05-18 22:15:02 +02:00
Marek Olšák	b8f8d9e46c	radeonsi: clamp indirect index to the number of declared shader resources We'll do partial uploads of descriptor arrays, so we need to clamp against what shaders declare. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-05-18 22:15:02 +02:00
Marek Olšák	f07c15ef80	radeonsi: merge sampler and image descriptor lists into one Sampler slots: slot[8], .. slot[39] (ascending) Image slots: slot[7], .. slot[0] (descending) Each image occupies 1/2 of each slot, so there are 16 images in total, therefore the layout is: slot[15], .. slot[0]. (in 1/2 slot increments) Updating image slot 2n+i (i <= 1) also dirties and re-uploads slot 2n+!i. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-05-18 22:15:02 +02:00
Marek Olšák	5df24c3fa6	radeonsi: merge constant and shader buffers descriptor lists into one Constant buffers: slot[16], .. slot[31] (ascending) Shader buffers: slot[15], .. slot[0] (descending) The idea is that if we have 4 constant buffers and 2 shader buffers, we only have to upload 6 slots. That optimization is left for a later commit. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-05-18 22:15:02 +02:00
Samuel Pitoiset	1468e29e02	radeonsi: get the sampler view type from inst->Texture for TG4 This will also magically fix this special lowering for bindless samplers. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-05-18 21:48:16 +02:00
Marek Olšák	50189379fa	gallium: add PIPE_CAP_ALLOW_MAPPED_BUFFERS_DURING_EXECUTION for skipping mapped-buffer checking in every GL draw call Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-05-17 20:28:44 +02:00

2960 Commits