KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Samuel Pitoiset	460d3ce726	ac: move tg_size to the ABI Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-02-12 11:54:04 +01:00
Samuel Pitoiset	054c92190c	ac/nir: remove unused nir_to_llvm_context:{defs,phis} Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-02-12 11:54:02 +01:00
Eric Anholt	0b97eb02b0	egl/gbm: Fix compiler warning about visual matching. The compiler doesn't know that num_visuals > 0. Fixes: `37a8d907cc` ("egl/gbm: Ensure EGLConfigs match GBM surface format") Reviewed-by: Daniel Stone <daniels@collabora.com>	2018-02-12 09:16:44 +00:00
Rob Clark	831fb29252	freedreno: small fix for flushing dependent batches Flush a resource's previous write_batch synchronously. Because a resource's associated batches are not updated until after the flush thread submits rendering to the kernel, this was causing a bit of confusion in the following loop. This fixes a bug that appeared with recent stk. Perhaps we need to re-work things a bit to clear out dependent patches in the ctx's thread and use a fence to deal with the period between when a flush is queued and when it is submitted to the kernel. But this will do until time permits a larger refactor. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-02-10 14:54:58 -05:00
Rob Clark	c57ed8e01c	freedreno/ir3: intra-block scheduling Because of loops, we can't schedule all of a block's predecessors first. Instead just assume that the result consumed in a block was written far enough away in all paths into a block. And do an intra-block scheduling pass to figure out if there are any cases where we need to insert extra nop's. This works out better than always assuming the worst case (ie. that a value live into a block was written in the last instruction in the predecessor block). Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-02-10 14:54:58 -05:00
Rob Clark	2a2099a875	freedreno/ir3: "boost" the depth of if/else condition Account for the move to predicate register, to try to avoid needing to insert extra NOPs later. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-02-10 14:54:58 -05:00
Rob Clark	ffb00f6841	freedreno/ir3: account for arrays in delayslot calc Normally false-deps are not something to consider, since they mostly exist for delay-slot related reasons: * barriers * ordering writes after read * SSBO/image access ordering The exception is a false-dependency on an array store. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-02-10 14:54:58 -05:00
Rob Clark	f54d2b4f10	freedreno/ir3: more clever legalize algorithm Previously we didn't handle flow control in legalize, and instead just set (ss)(sy) on the first instruction in every block. Which isn't very clever. Instead, consider output state of all predecessor blocks, so we only set a sync bit if needed for any possible path leading into a block. Because of loops, we can't require that all successor blocks are legalized before a given block, so instead run in a loop until results converge. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-02-10 14:54:58 -05:00
Rob Clark	015afb6a38	freedreno/ir3: track block predecessors Useful in the following patches. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-02-10 14:54:58 -05:00
Rob Clark	76440fcca9	freedreno/ir3: clean up dangling false-dep's Maybe there is a better way for this.. where it comes useful is "array" loads, which end up as a false-dep for a later array store. If all the uses of an array load are CP'd into their consumer, it still leaves the dangling array load, leading to funny things like: mov.u32u32 r5.y, r0.y mov.u32u32 r5.y, r0.z Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-02-10 14:54:58 -05:00
Rob Clark	aea223741f	freedreno/ir3: handle IMMED for mad 2nd src special case Consider also immediates for swapping the first two srcs, because they can be lowered to constant. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-02-10 14:54:58 -05:00
Rob Clark	242a8a1957	freedreno/ir3: remove ir3 phi instruction Now that we convert phi webs to ssa, we can drop all this. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-02-10 14:54:58 -05:00
Rob Clark	a7b569d60c	freedreno/ir3: remove lower_if_else pass Now that it is unused. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-02-10 14:54:58 -05:00
Rob Clark	268ab05484	freedreno/ir3: add experimental GCM pass Generally seems to do worse on instruction count and register usage, according to shader-db. But shader-db also doesn't do a very good job of weighting loop bodies, so that might not be totally valid. So add an env variable to enable GCM pass for easier experimentation. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-02-10 14:54:58 -05:00
Rob Clark	4c15c53d91	freedreno/ir3: change opt passes There are more useful nir passes added since initial conversion to nir. But ir3 was never updated to use them. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-02-10 14:54:58 -05:00
Rob Clark	ec8bc54ad2	freedreno/ir3: use peephole select pass Aggressively lowering all if/else to selects in some extreme cases results in much higher register pressure. Using peephole select instead with a modest threshold speeds up alu2 4x! 16 seems like a good limit, low enough to help alu2 but not too low that it penalizes everything else. With a bit better scheduling of the instruction that moves a value into a predicate register, we might be able to lower this limit a bit more in the future, but since we need 6 cycles from the move to predicate register to predicated branch, that puts some sort of lower bound on how far we can lower this threshold. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-02-10 14:54:58 -05:00
Rob Clark	a7ea2b4eba	freedreno/ir3: lower phi webs to regs nir's from_ssa pass is much better at avoiding inserting extra moves than our logic is. And lowering phi webs to regs just treats anything involved in a phi web as an array of length=1. Which with previous array related fixes in RA/etc ends up working out quite well. This cuts down on extra instructions and also helps with register pressure. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-02-10 14:54:58 -05:00
Rob Clark	0a6ddf964f	freedreno/ir3: separate arrays from groups Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-02-10 14:54:58 -05:00
Rob Clark	55f14a1ac4	freedreno/ir3: make block/instruction serialno per-shader Makes it easier to compare values seen in-game (where there are many shaders) to cmdline standalone compiler. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-02-10 14:54:58 -05:00
Rob Clark	5a7de94392	freedreno/ir3: add spirv support to cmdline compiler Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-02-10 14:54:58 -05:00
Rob Clark	942341bcd0	freedreno/ir3: don't lower fsat Instead, if possible fold (sat) flag into src, otherwise use: (sat)max.f rD, rS, rS Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-02-10 14:54:58 -05:00
Rob Clark	b2fc94f074	freedreno/ir3: add encoding/decoding for (sat) bit Seems to be there since a3xx, but we always lowered fsat. But we can shave some instructions, especially in shaders that use lots of clamp(foo, 0.0, 1.0) by not lowering fsat. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-02-10 14:54:58 -05:00
Rob Clark	1b658533e1	freedreno/ir3: extend liverange of arrays Use livein state of other blocks to extend liverange of arrays when they are still needed by successor blocks. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-02-10 14:54:58 -05:00
Rob Clark	ac459a6f7f	freedreno/ir3: avoid extra mov's for "arrays" Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-02-10 14:54:58 -05:00
Rob Clark	2bc3fb6992	freedreno/ir3: a couple more array fixes (Plus a couple TODOs) Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-02-10 14:54:58 -05:00
Rob Clark	8ea1ef4191	freedreno/ir3: keep array stores Since these are not in SSA form, add to block's keeps so it doesn't appear unused. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-02-10 14:54:58 -05:00
Rob Clark	c60f150d56	freedreno/ir3: propagate barrier information When eliminating movs, the instruction that is now directly using the src of the mov has the same scheduling order constraints as the original mov instruction. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-02-10 14:54:58 -05:00
Rob Clark	98702c1010	freedreno/ir3: remove pointless statement Function ends after this if/else ladder, so it was pointless. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-02-10 14:54:58 -05:00
Rob Clark	930ca0e038	freedreno/ir3: some more debug prints Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-02-10 14:54:58 -05:00
Rob Clark	a84e324847	freedreno/ir3: fix printing of relative branch offsets The number of bits depends on generation. But printing negative values with a5xx encoding (largest size) but compiling for a3xx or a4xx, would result in negative values printed as large positive values. I guess in practice huge negative branch offsets aren't likely (and if that is the case, the shader is probably too big to grok by reading the assembly). So just print using smallest bitfield size. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-02-10 14:54:58 -05:00
Rob Clark	a5c28fe07b	freedreno/ir3: be more clever with if/else jumps Try to clean up things like: br !p0.x #2 br p0.x #something to eliminate the first branch. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-02-10 14:54:58 -05:00
Rob Clark	44dd7dcd2f	freedreno/ir3: avoid some spurious sync bits Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-02-10 14:54:58 -05:00
Rob Clark	069c0ac625	freedreno/ir3: print # of sync bits for shaderdb When trying to optimize to reduce stalls, it is nice to see this info. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-02-10 14:54:58 -05:00
Rob Clark	7d45e2e39f	freedreno: add debug trace for flush Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-02-10 14:54:58 -05:00
Grazvydas Ignotas	9b9a89cd79	intel/compiler: fix 64bit value prints on 32bit Fix the following: warning: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 3 has type ‘uint64_t {aka long long unsigned int}’. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-02-10 17:59:02 +02:00
Timothy Arceri	ff0e3fa1fe	st/glsl_to_nir: remove unused options variable	2018-02-10 11:06:55 +11:00
Timothy Arceri	8f378c116e	st/radeonsi: enable disk cache for nir Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-02-10 10:59:10 +11:00
Timothy Arceri	bc9d9f9b86	st: add nir shader disk cache support v2: include compute shader support Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-02-10 10:59:10 +11:00
Timothy Arceri	97efdc0d57	st/glsl_to_tgsi: move nir detection earlier We move the nir check before the shader cache call so that we can call a nir based caching function in a following patch. Also with this change we simply check if vertex shaders support NIR rather than looping over the stages as mixing of shader types is not supported anyway. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-02-10 10:59:10 +11:00
Timothy Arceri	b5e23887fe	radeonsi: stop returning PIPE_SHADER_IR_NATIVE for PIPE_SHADER_CAP_PREFERRED_IR Clover now checks PIPE_SHADER_CAP_SUPPORTED_IRS for native support instead. This change indirectly enables NIR support for compute shaders on radeonsi. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-02-10 10:59:10 +11:00
Timothy Arceri	73f1d6f0c1	r600: always return PIPE_SHADER_IR_TGSI for PIPE_SHADER_CAP_PREFERRED_IR We now use PIPE_SHADER_CAP_SUPPORTED_IRS to check for native support in clover. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-02-10 10:59:10 +11:00
Timothy Arceri	51f484bb44	clover: use PIPE_SHADER_CAP_SUPPORTED_IRS to discover IR PIPE_SHADER_CAP_PREFERRED_IR was conflicting with PIPE_SHADER_IR_NIR for compute shaders, so we let clover pick the one it wants to use. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-02-10 10:59:10 +11:00
Timothy Arceri	3af4f34e61	r600: add PIPE_SHADER_IR_NATIVE to supported shaders for cs Acked-by: Pierre Moreau <pierre.morrow@free.fr> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-02-10 10:59:10 +11:00
Timothy Arceri	ce836487b8	radeonsi/nir: add depth layout to scan pass Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-02-10 10:46:28 +11:00
Timothy Arceri	6a8efbe652	radeonsi/nir: add FRAG_RESULT_COLOR to scan pass Fixes a number of draw buffers piglit tests. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-02-10 10:46:28 +11:00
Timothy Arceri	ef8082baf8	ac: convert nir_op_f2f32 src to a float Fixes the following piglit test: ./bin/arb_vertex_attrib_64bit-check-explicit-location -auto -fbo Where we would end up with the nir such as: vec1 64 ssa_11 = pack_64_2x32_split ssa_9, ssa_10 vec1 32 ssa_12 = f2f32 ssa_2 And our pack_64_2x32_split nir to llvm code always produces a 64bit integer as output. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-02-10 10:46:28 +11:00
Timothy Arceri	1b1e5f8edf	ac: fix some 64bit unpack asserts Previously the asserts did not take swizzles into account. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-02-10 10:46:28 +11:00
Mark Janes	9a05c66feb	Revert "i965: prevent potentially null pointer access" This reverts commit `712332ed54`, which caused over 90k failures in Mesa i965 CI. Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-02-09 09:46:07 -08:00
Daniel Stone	37a8d907cc	egl/gbm: Ensure EGLConfigs match GBM surface format When we create an EGL window surface on a GBM surface, ensure that the EGLConfig is compatible with the GBM format, notwithstanding XRGB/ARGB interchange. For example, rendering with an XRGB8888 EGLConfig on to an ARGB8888 gbm_surface (and vice-versa) are acceptable, but rendering with an XRGB2101010 EGLConfig on to an XRGB8888 gbm_surface will now be rejected. This was previously allowed through; when 10bpc formats were enabled, clients which picked a completely random EGL config and hoped/assumed they were XRGB8888 would break. If you have bisected a failure to start a GBM/KMS client to this commit, please look at its EGLConfig selection (e.g. through eglChooseConfig), and add an EGL_NATIVE_VISUAL_ID == gbm_surface format match to the attribs for config selection. Signed-off-by: Daniel Stone <daniels@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-02-09 16:17:16 +00:00
Daniel Stone	8174e5b49e	egl/gbm: Remove duplicate format table Now that we have mask/channel information in gbm_dri's format conversion table, we can remove the copy in EGL. As this table contains more formats (notably including R8 and RG8, which can be used for BO but not surface allocation), we now compare the masks of all channels when trying to find a suitable config. Without doing this, an XRGB8888 EGLConfig would match on an R8 format. Signed-off-by: Daniel Stone <daniels@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-02-09 16:17:16 +00:00
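
The warning quoted in `9b9a89cd79` (intel/compiler: fix 64bit value prints on 32bit) comes from printing a `uint64_t` with `%lx`, which is only correct where `long` is 64 bits wide. One portable way to avoid it is the `PRIx64` macro from `<inttypes.h>`. The sketch below illustrates that general technique only; it is not the actual change from that commit, and `format_u64_hex` is a hypothetical helper name.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not taken from the Mesa tree: formats a 64-bit
 * value as hex without assuming that `long` is 64 bits wide. */
static int format_u64_hex(char *buf, size_t len, uint64_t value)
{
    /* PRIx64 expands to the correct conversion specifier for uint64_t
     * on the current target, so the same source builds without format
     * warnings on both 32-bit and 64-bit platforms. */
    return snprintf(buf, len, "0x%" PRIx64, value);
}
```

Casting the argument to `unsigned long long` and printing it with `%llx` is another common option; `PRIx64` has the advantage of keeping the argument type unchanged.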

100084 Commits
