KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Eric Anholt	d04b07f8e2	vc4: Fix off-by-one in branch target validation.	2015-04-13 22:34:06 -07:00
Eric Anholt	7fa2f2e366	vc4: Use NIR-level lowering for idiv. This fixes the idiv tests in piglit.	2015-04-13 21:36:40 -07:00
Eric Anholt	84ebaff1b7	vc4: Add a bunch of type conversions. These are required to get piglit's idiv tests working. The unsigned<->float conversions are wrong, but are good enough to get piglit's small ranges of values working.	2015-04-13 21:36:40 -07:00
Eric Anholt	adae027260	vc4: Use the blit interface for updating shadow textures. This lets us plug in a better blit implementation and have it impact the shadow update, too.	2015-04-13 10:39:24 -07:00
Eric Anholt	39b6f7e76c	vc4: Remove dead fields from vc4_surface.	2015-04-13 10:39:24 -07:00
Eric Anholt	5100221ff7	vc4: Skip sending down the clear colors if not clearing.	2015-04-13 10:39:24 -07:00
Eric Anholt	725620f21d	vc4: Sync with kernel changes to relax BCL versus RCL validation. There was no reason to tie the two packets' values together.	2015-04-13 10:39:23 -07:00
Eric Anholt	cb88d2cfcb	vc4: Fix another space allocation mistake. We're over-allocating our BCL in vc4_draw.c, so this never mattered. However, new RCL-only blit support might end up here without having set up any BCL contents.	2015-04-13 10:39:02 -07:00
Eric Anholt	8eb9304ee7	vc4: Add missed accounting for the size of the semaphore. This wouldn't have mattered except in the worst case scenario RCL setup.	2015-04-13 10:33:30 -07:00
Rob Clark	b98c0262d1	freedreno/ir3/nir: couple little fixes Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-11 11:41:03 -04:00
Rob Clark	1b936bb9f8	freedreno/ir3/nir: handle system values Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-11 11:40:57 -04:00
Rob Clark	715b2e0dbb	freedreno/ir3/nir: handle txs and query_levels tex ops These correspond to the tgsi TXQ opcode (plus sneak in a fix for two-sided color) Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-11 11:40:43 -04:00
Rob Clark	97e8fc3fdd	freedreno/ir3/nir: split out tex helpers We'll need these in one or two other spots. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-11 11:40:36 -04:00
Rob Clark	6e8160d6e3	freedreno/ir3/nir: simplify emit_tex() Just build up arrays for src0/src1, and use create_collect().. Also add back missing .3d flag for 3d/cube textures. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-11 11:40:28 -04:00
Rob Clark	d5357c16cc	freedreno/ir3/cp: handle indirect properly I noticed some cases where we where trying to copy-propagate indirect src's into places they cannot go, like 2nd src for cat3 (mad, etc). Expand out valid_flags() to be aware of relativ flag, and fix up a few related spots. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-11 11:40:21 -04:00
Rob Clark	49be76166b	freedreno/ir3/sched: avoid getting stuck on addr conflicts When we get in a scenario where we cannot schedule any more instructions due to address register conflict, clone the instruction that writes the address register, and switch the remaining unscheduled users for the current address register over to the new clone. This is simpler and more robust than the previous attempt (which tried and sometimes failed to ensure all other dependencies of users of the address register were scheduled first).. hint it would try to schedule instructions that were not actually needed for any output value. We probably need to do the same with predicate register, although so far it isn't so heavily used so we aren't running into problems with it (yet). Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-11 11:40:15 -04:00
Rob Clark	4cf4006674	freedreno/ir3/nir: add variable-indexing support A bit fugly.. try and make this cleaner.. note if we hoist all the get_addr() out of the loop we can drop the hashtable and just use create_addr().. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-11 11:40:09 -04:00
Rob Clark	972ce757d7	freedreno/ir3/asm: change assert to warning It probably should be an assert, but for now TGSI f/e isn't very good about dealing w/ CONST vs ABS/NEG. So for debug builds, print a warning instead of crashing with an assert for now. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-11 11:40:03 -04:00
Rob Clark	09cbd97a47	freedreno/ir3/nir: set first_driver_param Without this, a3xx breaks.. a4xx would too if it had already implemented support for passing driver params. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-11 11:39:56 -04:00
Rob Clark	f0e9a632a1	freedreno/ir3/cp: support to swap mad src's For a normal MAD (ie. not MADSH), if first source is gpr and second source is const, we can swap the first two sources to avoid needing a mov instruction. This gives back the biggest advantage TGSI f/e had over NIR f/e for common shaders, since TGSI f/e had this logic in the f/e. Note that doing this in copy-prop step has the advantage that it will also work for cases like: MOV TEMP[b], CONST[x] MAD TEMP[d], TEMP[a], TEMP[b], TEMP[c] Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-11 11:39:46 -04:00
Rob Clark	fd65122a90	gallium/ttn: add support for system values So far just the system values that freedreno supports, so we may add more later. Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2015-04-11 10:43:16 -04:00
Rob Clark	2faa878f13	gallium/ttn: fix TXD With TXD we also have the ddx/ddy sources (before the sampler). Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2015-04-11 10:43:16 -04:00
Rob Clark	ca3ae90490	gallium/ttn: add TXQ support (v2) Split out from ttn_tex() since it is kind of a weird instruction that maps to two NIR opcodes, and it was cleaner this way. v2: query_levels doesn't take any args Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2015-04-11 10:43:15 -04:00
Rob Clark	0b71451920	gallium/ttn: split out helper to get texture info We'll need this as well for TXQ. Split this out first to reduce noise in the next patch. Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2015-04-11 10:43:15 -04:00
Rob Clark	96c0f9328d	gallium/ttn: add support for temp arrays Since the rest of NIR really would rather have these as variables rather than registers, create a nir_variable per array. But rather than completely re-arrange ttn to be variable based rather than register based, keep the registers. In the cases where there is a matching var for the reg, ttn_emit_instruction will append the appropriate intrinsic to get things back from the shadow reg into the variable. NOTE: this doesn't quite handle TEMP[ADDR[]] when the DCL doesn't give an array id. But those just kinda suck, and should really go away. AFAICT we don't get those from glsl. Might be an issue for some other state tracker. v2: rework to use load_var/store_var with deref chains v3: create new "burner" reg for temporarily holding the (potentially writemask'd) dest after each instruction; add load_var to initialize temporary dest in case not all components are overwritten v4: review comments: asserts and use ttn_src_for_indirect() in ttn_array_deref() so we can drop later patch converting to use vec1 for addr reg (since ttn_src_for_indirect() handles the imov to vec1 from tgsi addr component that we want) v5: rebase: new requirements about parent mem ctx for derefs Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2015-04-11 10:41:45 -04:00
Rob Clark	b91d987140	gallium/ttn: minor cleanup Extract tgsi_dst->Index into a local.. split out from 'gallium/ttn: add support for temp arrays' for noise reduction.. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-11 10:24:50 -04:00
Nick Sarnie	f9048ee3c8	gallivm: Fix build since llvm-3.7.0svn r234495 Revert `50e9fa2ed6` as LLVM reverted their change. Signed-off-by: Nick Sarnie <commendsarnex@gmail.com> Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>	2015-04-10 13:30:23 -04:00
Vinson Lee	50e9fa2ed6	gallivm: Fix build since llvm-3.7.0svn r234460. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89963 Signed-off-by: Vinson Lee <vlee@freedesktop.org> Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2015-04-09 10:41:26 -07:00
Roland Scheidegger	a873b79fa5	draw: (trivial) don't print the shader twice with GALLIVM_DEBUG=tgsi (or ir) Neither the shader nor the key change when doing elts or linear variant, so this was just annoying (probably mildly useful at some point when we printed the IR per function too). Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2015-04-09 01:32:30 +02:00
Roland Scheidegger	586536a4e1	gallivm: don't use control flow when doing indirect constant buffer lookups llvm goes crazy when doing that, using way more memory and time, though there's probably more to it - this points to a very much similar issue as fixed in `8a9f5ecdb1`. In any case I've seen a quite plain looking vertex shader with just ~50 simple tgsi instructions (but with a dozen or so such indirect constant buffer lookups) go from a terribly high ~440ms compile time (consuming 25MB of memory in the process) down to a still awful ~230ms and 13MB with this fix (with llvm 3.3), so there's still obvious improvements possible (but I have no clue why it's so slow...). The resulting shader is most likely also faster (certainly seemed so though I don't have any hard numbers as it may have been influenced by compile times) since generally fetching constants outside the buffer range is most likely an app error (that is we expect all indices to be valid). It is possible this fixes some mysterious vertex shader slowdowns we've seen ever since we are conforming to newer apis at least partially (the main draw loop also has similar looking conditionals which we probably could do without - if not for the fetch at least for the additional elts condition.) v2: use static vars for the fake bufs, minor code cleanups Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2015-04-09 01:32:30 +02:00
Dave Airlie	6b722c390b	u_tile: fix warnings about incompatible casts. Signed-off-by: Dave Airlie <airlied@redhat.com>	2015-04-08 10:31:42 +10:00
Glenn Kennard	f2947807c8	r600g/sb: Enable SB for geometry shaders Add SV_GEOMETRY_EMIT special variable type to track the implicit dependencies between CUT/EMIT_VERTEX/MEM_RING instructions so GCM/scheduler doesn't reorder them. Mark emit instructions as unkillable so DCE doesn't eat them. Enable only for evergreen/cayman as there are a few unexplained GS piglit regressions on R6xx/R7xx with SB enabled otherwise. Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2015-04-08 08:18:35 +10:00
Glenn Kennard	06bb68da4a	r600g/sb: Update last_cf for loops CF_END could end up emitted in the middle of a shader on cayman when there was a loop at the very end. Fixes glsl-1.50-geometry-end-primitive and ext_transform_feedback-geometry-shaders-basic piglit tests. Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2015-04-08 08:18:17 +10:00
Dave Airlie	61393bdcdc	u_tile: fix stencil texturing tests under softpipe arb_stencil_texturing-draw failed under softpipe because we got a float back from the texturing function, and then tried to U2F it, stencil texturing returns ints, so we should fix the tiling to retrieve the stencil values as integers not floats. Signed-off-by: Dave Airlie <airlied@redhat.com>	2015-04-08 08:17:32 +10:00
Ilia Mirkin	ae720c66cb	nv50,nvc0: limit the y-tiling of 3d textures to the first level's tiling We limit y-tiling to 0x20 when depth is involved. However the function is run for each miplevel, and the hardware expects miplevel 0 to have the highest tiling settings. Perform the y-tiling limit on all levels of a 3d texture, not just the ones that have depth. Fixes: texelFetch fs sampler3D 98x129x1-98x129x9 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Tested-by: Nick Tenney <nick.tenney@gmail.com> # GT216 Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>	2015-04-06 23:06:55 -04:00
Dave Airlie	ad84689f73	r600g: fix op3 abs issue This code to handle absolute values on op3 srcs was a bit too simple, it really needs a temp reg per src, not one per channel, make it easier and let sb clean up the mess. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89831 Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2015-04-07 11:40:16 +10:00
Rob Clark	8b0b81339b	freedreno/ir3: add NIR compiler The NIR compiler frontend is an alternative to the TGSI f/e, producing the same ir3 IR and using the same backend passes for scheduling, etc. It is not enabled by default yet, as there are still some regressions. To enable, use 'FD_MESA_DEBUG=nir'. It is enough to use with, for example, xonotic or supertuxkart. With the NIR f/e, scalarizing and a number of other lowering steps happen in NIR, so we don't have to do them in ir3. Which simplifies the f/e and allows the lowered instructions to pass through other optimization stages. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-05 16:36:40 -04:00
Ilia Mirkin	700d949ea1	freedreno/a3xx: don't decode srgb on mem2gmem Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-05 16:36:35 -04:00
Ilia Mirkin	b060b56772	freedreno/a3xx: pass sprite coord mode through to program emit Use the correct sprite replacement depending on the flip of the coord mode, using either T or 1-T depending on whether we have an upper-left or lower-left coordinate origin. This fixes all the point sprite piglits. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-05 16:36:35 -04:00
Ilia Mirkin	1de72dfc8a	freedreno/a3xx: add UBO support Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-05 16:36:35 -04:00
Ilia Mirkin	c7811f56c2	freedreno/ir3: insert nop between sfu/mem operations Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-05 16:36:35 -04:00
Ilia Mirkin	14dfd8cc43	freedreno: dirty context when reallocating a bound bo Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-05 16:36:35 -04:00
Ilia Mirkin	bde2045fa2	freedreno: keep track of buffer valid ranges Copies nouveau_buffer and radeon_buffer. This allows a write to proceed to an uninitialized part of a buffer even when the GPU is using the previously-initialized portions. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-05 16:36:35 -04:00
Ilia Mirkin	dacf22e0a3	freedreno: mark resources as being read so that writes flush the queue Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-05 16:36:34 -04:00
Ilia Mirkin	2e1445c8f3	freedreno: don't bother setting resource timestamps Waiting on a bo being ready is handled in fd_bo_cpu_prep. No need to keep separate timestamps around. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-05 16:36:34 -04:00
Ilia Mirkin	1fee3061d5	freedreno: add a reading flag to indicate gpu is reading rsc Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-05 16:36:34 -04:00
Ilia Mirkin	ea0952a9db	freedreno: fix resource flushing confusion A resource flush is an upload of a hypothetically-staging texture to the GPU. For a UMA system, this will largely be a no-op or cache-maintenance. Move the render flush logic into transfer_map where it belongs, and clear out the transfer_flush function. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-05 16:36:34 -04:00
Ilia Mirkin	bfb0a8eb69	freedreno: remove tex_resource pipe_sampler_view already contains a texture, remove the redundant tex_resource member which pointed at the same thing. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-05 16:36:34 -04:00
Rob Clark	6cd9c94ce4	freedreno/ir3: handle FRAG IN's without interpolation specified Fallback to picking based on semantic name. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-05 16:36:34 -04:00
Rob Clark	f513f006ce	freedreno/ir3/cmdline: add @const headers for immediates Since NIR f/e currently encodes immediates in instructions (rather than passing via const), we need to ensure that when const's are used the get initialized to the proper values. Otherwise comparing NIR to TGSI compiler, it will use proper immediate values in one case, and randomly initialize values in the other. Which confuses ir3test. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-05 16:36:34 -04:00
Rob Clark	6bc12bb5fd	freedreno/ir3/cmdline: remove hack for old compiler Since we dropped the old compiler, we don't need this hack anymore. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-05 16:36:34 -04:00
Rob Clark	f370e95421	freedreno/ir3: handle const/immed/abs/neg in cp Be smarter about propagating copies from const or immed, or with abs/neg modifiers. Also, realize that absneg.s and absneg.f are really "fancy" mov instructions. This opens up the possibility to remove more copies. It helps the TGSI frontend a bit, but will be really needed for the NIR f/e which builds everything up in SSA form (ie. will always insert a mov from const or immediate). Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-05 16:36:34 -04:00
Rob Clark	104713d9f2	freedreno/ir3: split float/int abs/neg Even though in the end, they map to the same bits, the backend will need to be able to differentiate float abs/neg vs integer abs/neg. Rather than making the backend figure it out based on instruction opcode (which when combined with mov/absneg instructions, can be awkward), just split out different flags for each so the frontend can signal it's intentions more clearly. Also, since (neg) for bitwise op's is actually a bitwise- not, split it out into bnot flag. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-05 12:44:01 -04:00
Rob Clark	203f37540a	freedreno/ir3: add ir3 builder helpers Add helpers for constructing SSA forms of instructions. Only partial cat5/cat6 coverage.. but we can add stuff as needed. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-05 12:44:01 -04:00
Rob Clark	b1c9fb9fca	freedreno/ir3: fix sam argument order comment Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-05 12:44:01 -04:00
Rob Clark	101142c401	xa: support for drivers which use NIR We need to pull in libnir.la and it's dependency libglsl_util.la. Also, _mesa_error_no_memory() must be defined. Fortunately with libnir.la (vs pulling in all of libglsl.la) we don't also need libstdc++. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-05 09:24:17 -04:00
Rob Clark	52282fa42d	gallium/ttn: MOD is an integer instruction Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Eric Anholt <eric@anholt.net	2015-04-05 09:24:17 -04:00
Rob Clark	7579ae422a	gallium/ttn: add UMAD Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2015-04-05 09:24:17 -04:00
Ilia Mirkin	ba353935a3	nv50: allocate more offset space for occlusion queries Commit `1a170980a0` started writing to q->data[4]/[5] but kept the per-query space at 16, which meant that in some cases we would write past the end of the buffer. Rotate by 32, like nvc0 does. This ensures that we always have 32 bytes in front of us, and the data writes will go within the allocated space. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89679 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Tested-by: Nick Tenney <nick.tenney@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>	2015-04-04 11:30:03 -04:00
Ilia Mirkin	01d3b750b3	nv50/ir: avoid folding immediates into imad operations Commit `09ee907266` added logic to fold immediates into mad operations, but the emission code is only there for fmad. Only allow it on float types. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-02 18:42:31 -04:00
Ilia Mirkin	603d28f32c	nv50/ir: fix imad emission when dst == src2 Commit `fb63df2215` added 4-byte mad support, but only supported emission for floats. Disable it for ints for now. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-02 18:35:59 -04:00
Eric Anholt	a9152376b4	vc4: Add support for nir_iabs. Tested using the GLSL 1.30 tests for integer abs(). Not currently used, but it was one of the new opcodes used by robclark's idiv lowering.	2015-04-02 10:32:35 -07:00
Ilia Mirkin	4a3c0e9950	freedreno/a3xx: add MRT support The hardware only supports 4 MRTs. It should be possible to emulate support for 8, but doesn't seem worth the trouble. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-02 00:09:14 -04:00
Ilia Mirkin	6f4c1976f4	freedreno: convert blit program to array for each number of rts Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-02 00:09:14 -04:00
Ilia Mirkin	d9992ab35a	freedreno: add support for laying out MRTs in gmem Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-02 00:09:14 -04:00
Ilia Mirkin	602bc6c88d	freedreno: add core infrastructure support for MRTs Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-02 00:09:14 -04:00
Ilia Mirkin	d13803c76f	freedreno/ir3: add support for FS_COLOR0_WRITES_ALL_CBUFS property This will enable the driver to tell which regids to link up to which MRT outputs. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-02 00:09:14 -04:00
Ilia Mirkin	f27ec59084	freedreno/a3xx: add independent blend function support This is needed for MRT support Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-02 00:09:14 -04:00
Ilia Mirkin	8efa3e340d	freedreno: remove alpha key from ir3_shader This complication is unnecessary and makes MRTs more complicated and likely to generate tons of variants. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-02 00:09:14 -04:00
Stéphane Marchesin	70eed78cac	i915g: Implement EGL_EXT_image_dma_buf_import This adds all the plumbing to get EGL_EXT_image_dma_buf_import in i915g. Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>	2015-04-01 20:13:37 -07:00
Emil Velikov	d99135b2e9	configure: nuke --with-max-{width,height} Unused as of commit 630ab0d27ba(mesa: remove last of MAX_WIDTH, MAX_HEIGHT). Update all the remaining references to the defines. v2: Use the correct variable name in the comments Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2015-04-01 19:43:34 +00:00
Emil Velikov	bd4925c6ac	gallium: ship tgsi_to_nir.h in the tarball Acked-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>	2015-04-01 19:33:37 +00:00
Jose Fonseca	3321724c10	automake,scons: Put NIR source files in a separate var to fix SCons build. SCons does not build NIR yet. Trivial.	2015-04-01 19:49:09 +01:00
Jose Fonseca	7f0682cebf	automake: Fix out-of-source builds. Add include path for generated nir_opcodes.h. Trivial.	2015-04-01 19:48:09 +01:00
Eric Anholt	26261bca21	vc4: Add shader-db dumping of NIR instruction count. I was previously using temporary disables of VC4 optimization to show the benefits of improved NIR optimization, but this can get me quick and dirty numbers for NIR-only improvements without having to add hacks to disable VC4's code (disabling of which might hide ways that the NIR changes would hurt actual VC4 codegen).	2015-04-01 10:57:01 -07:00
Eric Anholt	73e2d4837d	vc4: Convert to consuming NIR. NIR brings us better optimization than I would have bothered to write within the driver, developers sharing future optimization work, and the ability to share device-specific lowering code that we and other GLES2-level drivers need. total uniforms in shared programs: 13421 -> 13422 (0.01%) uniforms in affected programs: 62 -> 63 (1.61%) total instructions in shared programs: 39961 -> 39707 (-0.64%) instructions in affected programs: 15494 -> 15240 (-1.64%) v2: Add missing imov support, and assert that there are no dest saturates. v3: Rebase on the target-specific algebraic series. v4: Rebase on gallium-includes-from-NIR changes in mater. v5: Rebase on variables being in lists instead of hash tables. v6: Squash in intermediate changes that used the NIR-to-TGSI pass (which I'm not committing)	2015-04-01 10:57:01 -07:00
Eric Anholt	783ad697d2	gallium: Add tgsi_to_nir to get a nir_shader for a TGSI shader. This will be used by the VC4 driver for doing device-independent optimization, and hopefully eventually replacing its whole IR. It also may be useful to other drivers for the same reason. v2: Add all of the instructions I was relying on tgsi_lowering to remove, and more. v3: Rebase on SSA rework of the builder. v4: Use the NIR ineg operation instead of doing a src modifier. v5: Don't use ineg for fnegs. (infer_src_type on MOV doesn't do what I expect, again). v6: Fix handling of multi-channel KILL_IF sources. v7: Make ttn_get_f() return a swizzle of a scalar load_const, rather than a vector load_const. CSE doesn't recognize that srcs out of those channels are actually all the same. v8: Rebase on nir_builder auto-sizing, make the scalar arguments to non-ALU instructions actually be scalars. v9: Add support for if/loop instructions, additional texture targets, and untested support for indirect addressing on temps. v10: Rebase on master, drop bad comment about control flow and just choose the X channel, use int comparison opcodes in LIT for now, drop unused pipe_context argument.. v11: Fix translation of LRP (previously missed because I mis-translated back out), use nir_builder init helpers. v12: Rebase on master, adding explicit include of mtypes.h to get INTERP_QUALIFIER_* v13: Rebase on variables being in lists instead of hash tables, drop use of mtypes.h in favor of util/pipeline.h. Use Ken's nir_builder swizzle and fmov/imov_alu helpers, drop "struct" in front of nir_builder, use nir_builder directly as the function arg in a lot of cases, drop redundant members of ttn_compile that are also in nir_builder, drop some half-baked malloc failure handling. v14: The indirect uniform src0 should be scalar, not vector (noticed as odd by robclark, confirmed by cwabbott). Apply Ken's review to initialize s->num_uniforms and friends, skip ttn_channel for dot products, and use the simpler discard_if intrinsic. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v13) Acked-by: Rob Clark <robclark@freedesktop.org>	2015-04-01 10:57:01 -07:00
Eric Anholt	486dcfbbd9	vc4: Tell shader-db how big our UBOs are, if present. I had regressed them for a while with the NIR work.	2015-04-01 10:57:01 -07:00
Roland Scheidegger	e3252defd2	gallivm: (trivial) fix the logic deciding if function call should be used... Copy and paste bug with the img filter decision. Since there's only 2 different filters anyway just drop this bit.	2015-04-01 13:26:19 +02:00
Dave Airlie	8f7338f284	egl: add initial EGL_MESA_image_dma_buf_export v2.4 At the moment to get an EGL image to a dma-buf file descriptor, you have to use EGL_MESA_drm_image, and then use libdrm to convert this to a file descriptor. This extension just provides an API modelled on EGL_MESA_drm_image, to return a dma-buf file descriptor. v2: update spec for new API proposal add internal queries to get the fourcc back from intel driver. v2.1: add gallium pieces. v2.2: add offsets to spec and API, rename fd->fds, stride->strides in API. rewrite spec a bit more, add some q/a v2.3: add modifiers to query interface and 64-bit type for that (Daniel Stone) specifiy what happens to num fds vs num planes differences. (Chad Versace) v2.4: fix grammar (Daniel Stone) Signed-off-by: Dave Airlie <airlied@redhat.com>	2015-04-01 14:10:04 +10:00
Roland Scheidegger	611bd80f3b	gallivm: do some hack heuristic to disable texture functions We've seen some cases where performance can hurt quite a bit. Technically, the more simple the function the more overhead there is for using a function for this (and the less benefits this provides). Hence don't do this if we expect the generated code to be simple. There's an even more important reason why this hurts performance, which is shaders reusing the same unit with some of the same inputs, as llvm cannot figure out the calculations are the same if they are performned in the function (even just reusing the same unit without any input being the same provides such optimization opportunities though not very much). This is something which would need to be handled by IPO passes however.	2015-04-01 00:56:12 +02:00
Marcin Ślusarz	f9e2295560	nouveau: synchronize "scratch runout" destruction with the command stream When nvc0_push_vbo calls nouveau_scratch_done it does not mean scratch buffers can be freed immediately. It means "when hardware advances to this place in the command stream the scratch buffers can be freed". To fix it, just postpone scratch runout destruction after current fence is signalled. The bug existed for a very long time. Nobody noticed, because "scratch runout" code path is rarely executed. Fixes hang at the very beginning of first mission in "Serious Sam 3" on nve7/gk107. It manifested as: nouveau E[ PFIFO][0000:01:00.0] read fault at 0x000a9e0000 [PTE] from GR/GPC0/PE_2 on channel 0x007f853000 [Sam3[17056]] Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-03-31 22:04:31 +02:00
Tom Stellard	fda7558057	clover: Return CL_BUILD_ERROR for CL_PROGRAM_BUILD_STATUS when compilation fails v2 v2: - Don't use _errs map Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2015-03-31 15:40:51 +00:00
Tom Stellard	4c53d2acbb	radeonsi/compute: Default to the same PIPE_SHADER_CAP values as other shader types v2 v2: - Fix typo Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2015-03-31 15:40:51 +00:00
Leo Liu	a714fbacf7	radeon/vce: implement video usability information support This will help encoding VUI into the bitstream v2: make backward compatible Signed-off-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>	2015-03-31 12:31:58 -04:00
Leo Liu	8e3668a7c0	st/omx/enc: export framerate to vce driver The framerate will be used for video usability info support by VCE driver Signed-off-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>	2015-03-31 12:31:58 -04:00
Roland Scheidegger	489866938f	llvmpipe: enable ARB_texture_gather Just announce support for 4 components. While here also increase the max/min texel offsets (the limit is completely artificial, was chosen because that's what other hardware did, however there's other drivers using larger limits). Over a thousand little piglits skip->pass. v2: update docs/GL3.txt Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2015-03-31 17:23:51 +02:00
Roland Scheidegger	0753b135f6	gallivm: implement TG4 for ARB_texture_gather This is quite trivial, essentially just follow all the same code you'd use with linear min/mag (and no mip) filter, then just skip the filtering after looking up the texels in favor of direct assignment of the right channel to the result. (This is though not true for the multi-offset version if we'd want to support it - for this would probably need to do something along the lines of 4x nearest sampling due to the necessity of doing coord wrapping individually per texel.) Supports multi-channel formats. From the SM5 gather cap bit, should support non-constant offsets, plus shadow comparisons (the former untested), but not component selection (should be easy to implement but all this stuff is not really exposable anyway for now). Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2015-03-31 17:23:51 +02:00
Roland Scheidegger	73c6914195	gallivm: add gather support to sampler interface Luckily thanks to the revamped interface this is a lot less work now... Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2015-03-31 17:23:51 +02:00
Roland Scheidegger	1863ed21ff	gallivm: simplify sampler interface This has got a bit out of control with more and more parameters added. Worse, whenever something in there changes all callees have to be updated for that, even though they don't really do much with any parameter in there except pass it on to the actual sampling function. Hence simply put almost everything into a struct. Also instead of relying on some arguments being NULL, be explicit and set this in a key (which is just reused for function generation for simplicity). (The code still relies on them being NULL in the end for now.) Technically there is a minimal functional change here for shadow sampling: if shadow sampling is done is now determined explicitly by the texture function (either sample_c or the gl-style tex func inherit this from target) instead of the static texture state. These two should always match, however. Otherwise, it should generate all the same code. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2015-03-31 17:23:51 +02:00
Jose Fonseca	0fc5b80e7a	util/debug: Update MgwHelp link, drop BfdHelp link.	2015-03-31 09:42:06 +01:00
Michel Dänzer	b8797a7875	gallivm: Fix build against LLVM 3.7 SVN r233648 Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2015-03-31 15:05:01 +09:00
Eric Anholt	1dcc1ee314	vc4: Drop integer multiplies with 0 to moves of 0. This cleans up more instructions generated by uniform array indexing multiplies. total instructions in shared programs: 39989 -> 39961 (-0.07%) instructions in affected programs: 896 -> 868 (-3.12%)	2015-03-30 12:57:45 -07:00
Eric Anholt	8c5dcdbccb	vc4: Add a constant folding pass. This cleans up some pointless operations generated by the in-driver mul24 lowering (commonly generated by making a vec4 index for a matrix in a uniform array). I could fill in other operations, but pretty much anything else ought to be getting handled at the NIR level, I think. total uniforms in shared programs: 13423 -> 13421 (-0.01%) uniforms in affected programs: 346 -> 344 (-0.58%)	2015-03-30 12:57:45 -07:00
Eric Anholt	c519c4d85e	vc4: Don't bother masking out the low 24 bits for integer multiplies The hardware just uses the low 24 lines, saving us an AND to drop the high bits. total uniforms in shared programs: 13433 -> 13423 (-0.07%) uniforms in affected programs: 356 -> 346 (-2.81%) total instructions in shared programs: 40003 -> 39989 (-0.03%) instructions in affected programs: 910 -> 896 (-1.54%)	2015-03-30 09:23:39 -07:00
Eric Anholt	5df8bf86fe	vc4: Make integer multiply use 24 bits for the low parts. The hardware uses the low 24 bits in integer multiplies, so we can have fewer high bits (and so probably drop them more frequently).	2015-03-30 09:23:39 -07:00
Michel Dänzer	d64adc3a79	radeonsi: Cache LLVMTargetMachineRef in context instead of in screen Fixes a crash in genymotion with several threads compiling shaders concurrently. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89746 Cc: 10.5 <mesa-stable@lists.freedesktop.org> Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2015-03-30 15:15:10 +09:00
Ilia Mirkin	ee670c9efa	freedreno/a3xx: add support for point sprite coordinate replacement This does not (yet) support different coordinate origins, so the tests still fail due to fbo flipping. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-03-28 14:54:41 -04:00
Ilia Mirkin	995f55a6ce	freedreno/a3xx: make vs-set point size work This appears to need the A2XX version of the point list, so select it at draw time if necessary. Experimentally, always using the A2XX version causes hangs when PSIZE isn't actually emitted. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-03-28 14:54:41 -04:00
Ilia Mirkin	7fc5da8b93	freedreno/a3xx: point size should not be divided by 2 The division is probably a holdover from the days when the fixed point inline functions generated by headergen were broken. Also reduce the maximum point size to 4092 (vs 4096), which is what the blob does. Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org> Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-03-28 14:54:41 -04:00
Ilia Mirkin	738c8319ac	freedreno/a3xx: fix 3d texture layout The SZ2 field contains the layer size of a lower miplevel. It only contains 4 bits, which limits the maximum layer size it can describe. In situations where the next miplevel would be too big, the hardware appears to keep minifying the size until it hits one of that size. Unfortunately the hardware's ideas about sizes can differ from freedreno's which can still lead to issues. Minimize those by stopping to minify as soon as possible. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>	2015-03-28 14:54:41 -04:00
Ilia Mirkin	3735643df3	freedreno/a3xx: LAYERSZ2 appears to have no effect on arrays Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-03-28 14:54:40 -04:00
Roland Scheidegger	b2424fb030	llvmpipe: simplify address calculation for 4x4 blocks These functions looked quite complicated, even though what they actually did was trivial (ever since we dropped swizzled rendering). Also drop lookup of format block per bytes done for each block, and do it once per scene instead. This improves everybody's favorite "benchmark" by 3% or so, though lp_rast_shade_quads_all() which calls this shows up still quite high for a function which does little more than call the jit function. (This would most likely be much better handled by the jit function itself, the strides are passed through anyway already, though for being able to handle layers it would definitely add some complexity.) Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2015-03-28 02:59:42 +01:00
Roland Scheidegger	764fc2be5a	gallivm: fix texture function name (key) when using txf/ld When using the texel fetch functions rather than ordinary texturing, the arguments are all int vecs instead of float vecs, not to mention the actual function would look completely different. Hence this must be included in the texture function name (which serves as the key) otherwise things crash badly when a shader accesses the same texture and sampler unit with both txf/ld and ordinary texturing instructions with otherwise matching keys.	2015-03-28 02:58:43 +01:00
Ilia Mirkin	58030a8f99	nv50/ir/gk110: fix offset flag position for TXD opcode Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org> Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-03-27 19:02:19 -04:00
Ilia Mirkin	49b86007aa	nv50/ir: take postFactor into account when doing peephole optimizations Multiply operations can have a post-factor on them, which other ops don't support. Only perform the peephole optimizations when there is no post-factor involved. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89758 Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org> Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-03-27 19:02:19 -04:00
Jan Vesely	a99a16a0cf	gallivm: Fix build since llvm r233411 Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2015-03-27 18:57:02 -04:00
Roland Scheidegger	56076be2ac	gallivm: use llvm function calls for texturing instead of inlining There are issues with inlining everything, most notably llvm will use much more memory (and be slower) when compiling. Ideally we'd probably use functions for shader functions too but texture sampling usually is responsible for quite some IR (it can easily reach 80% of total IR instructions) so this seems like a good start. This still generates a different function for all different combinations just like before, however it is possible llvm is missing some optimization opportunities - it is believed though such opportunities should be somewhat rare, but at least for now it can still be switched off (at compile time only). It should probably make compiled code also smaller because the same function should be used for different variants in the same module (so for the opaque/partial or linear/elts variants). No piglit change (though it does indeed speed up unrealistic tests like fp-indirections2 by a factor of 30 or so). Has a small negative performance impact in openarena - I suspect this could be fixed by running some IPO passes (despite the private linkage, llvm right now does NO optimization at all wrt anything going past the call, even if there's just one caller - so things like values stored before the call and then always written by the function etc. will not be optimized away, nor will dead arguments (which we mostly shouldn't have) be eliminated, always constant arguments promoted etc.). v2: use proper return values instead of pointer function arguments. llvm supports aggregate return values, which do wonders here eliminating unnecessary stack variables - everything in fact will be returned in registers even without any IPO optimizations. It makes the code simpler too. With this I could not measure a peformance impact in openarena any longer (though since there's still no constant value propagation etc. into the tex functions this does not mean it couldn't have a negative impact elsewhere). v3: fix some minor issues suggested by Jose, and do disassembly (and the profiling) without hacks. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2015-03-27 19:25:53 +01:00
Roland Scheidegger	8dad9455ff	gallivm: pass jit_context pointer through to sampling The callbacks used for getting the dynamic texture/sampler state were using the jit_context from the generated jit function. This works just fine, however that way it's impossible to generate separate functions for texture sampling, as will be done in the next commit. Hence, pass this pointer through all interfaces so it can be passed to a separate function (technically, it would probably be possible to extract this pointer from the current function instead, but this feels hacky and would probably require some more hacks if we'd use real functions instead of inlining all shader functions at some point). There should be no difference in the generated code for now. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2015-03-27 19:25:53 +01:00
Christian König	787aa26cb7	gallium/vl: partially revert "Use util_cpu_to_le{16,32} in many more places." The data in memory is in big endian format and needs to be converted into CPU byte order. So the patch actually reversed what needs to be done. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-03-27 11:30:32 +01:00
Ilia Mirkin	626434893a	tgsi: fix out-of-bounds access for cube arrays The CUBE_ARRAY case uses r[4]. Make sure that the stack variable is there. Noticed by Coverity. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2015-03-26 21:02:09 -04:00
Ilia Mirkin	1b87d73a9f	gallium/util: remove u_linkage Does not appear to be used in tree. Coverity spotted some errors in the bitmask stuff, but the whole thing appears to be unused. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Brian Paul <brianp@vmware.com>	2015-03-26 21:02:09 -04:00
Ilia Mirkin	2e34faaf06	gallium/hud: avoid overflowing hud graph name size Spotted by Coverity. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2015-03-26 21:02:08 -04:00
Jonathan Gray	726d99b197	gallium/util: Use HAVE___BUILTIN_FFS* macros. Make use of the builtin ffs macros and split out ffsll to a seperate block. Needed for at least OpenBSD which does not have ffsll in libc. Signed-off-by: Jonathan Gray <jsg@jsg.id.au> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>	2015-03-24 20:49:32 +00:00
Eric Anholt	7bc39c8418	vc4: Add a dump-the-surface-contents routine. This has been useful once again while trying to debug stride issues between render targets and texturing.	2015-03-24 10:39:12 -07:00
Eric Anholt	4df13f55b6	vc4: Allow DRI3 on simulation, as well. The problem I'd seen before seems to be gone.	2015-03-24 10:39:12 -07:00
Eric Anholt	7f797e3d17	vc4: Fix pitch alignment of linear textures. Fixes some non-power-of-two texture rendering when I force ARGB8888 to raster.	2015-03-24 10:39:12 -07:00
Eric Anholt	b3ea377f86	vc4: Write the alignment of level width consistently in validation. 16 / cpp happens to be the same as utile_w on the only raster format supported (4 bytes per pixel), but simulator/hw source code generally talks in terms of utiles.	2015-03-24 10:39:12 -07:00
Eric Anholt	8975a09494	vc4: Fix use of a bool as an enum. The enum compared to was 0, so it worked out, but it sure looked wrong.	2015-03-24 10:39:12 -07:00
Eric Anholt	04605c21f6	vc4: Decide the HW's format before laying out the miptree. I'm experimenting with a workaround for raster texture misrendering on hardware, and this lets me look at the format chosen when computing strides.	2015-03-24 10:39:12 -07:00
Eric Anholt	25d60763d9	vc4: Use our device-specific ioctls for create/mmap. They don't do anything special for us, but I've been told by kernel maintainers that relying on dumb for my acceleration-capable buffers is not OK.	2015-03-24 10:39:12 -07:00
Eric Anholt	af3d747194	vc4: Make a new #define for making code conditional on the simulator. I'd like to compile as much of the device-specific code as possible when building for simulator, and using if (using_simulator) instead of ifdefs helps.	2015-03-24 10:39:12 -07:00
Eric Anholt	9bafcf630a	vc4: Add some useful debug printfs for miptrees. I keep rewriting these.	2015-03-24 10:39:12 -07:00
Ilia Mirkin	43277fcd59	Revert "nv50,nvc0: remove bogus 64_FLOAT formats" This reverts commit `20346808cf`. The conversion is actually done since these are the *B macro variants and no vtx format is supplied, which makes them go through the translate module. This restores the following piglit tests to passing: draw-vertices user gl-2.0-vertexattribpointer Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-03-23 20:57:52 -04:00
Tom Stellard	dfb1ae9d91	clover: Return 0 as storage size for local kernel args that are not set v2 The storage size for local kernel args can be queried before the arguments are set by using the CL_KERNEL_LOCAL_MEM_SIZE param of clGetKernelWorkGroupInfo(). The spec says that if local kernel arguments have not been specified, then we should assume their size is 0. v2: - Implement using c++11 member initialization. Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org>	2015-03-23 17:20:21 +00:00
Tom Stellard	769b366b83	gallivm: Use MCInstrInfo in the disassembler for querying instruction info This fixes the build since llvm r232885 and also simplifies the code.	2015-03-23 14:43:10 +00:00
Giuseppe Bilotta	7932b30892	clover: use get_device_vendor instead of get_vendor The pipe's get_vendor method returns something more akin to a driver vendor string in most cases, instead of the actual device vendor. Use get_device_vendor instead, which was introduced specifically for this purpose. Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2015-03-23 13:25:34 +00:00
Giuseppe Bilotta	76039b38f0	gallium: implement get_device_vendor() for existing drivers The only hackish ones are llvmpipe and softpipe, which currently return the same string as for get_vendor(), while ideally they should return the CPU vendor. Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2015-03-23 13:25:34 +00:00
Giuseppe Bilotta	31d4e6fbff	gallium: introduce get_device_vendor() entrypoint for pipes This will be needed by Clover to return the correct information to CL_DEVICE_VENDOR info queries. Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-03-23 13:25:34 +00:00
Giuseppe Bilotta	9280f17e82	gallium: remove trailing whitespace in p_screen.h Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-03-23 13:25:34 +00:00
Tom Stellard	6e17936bf8	clover: The unit for CL_DEVICE_MEM_BASE_ADDR_ALIGN is bits not bytes Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2015-03-23 13:22:42 +00:00
Tom Stellard	2b12b1752a	clover: Add all the mandatory 1.1 extensions to the extension string Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2015-03-23 13:22:42 +00:00
Tom Stellard	96f9cc9181	clover: Add a space at the end of CL_DEVICE_OPENCL_C_VERSION This is required by the spec. Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2015-03-23 13:22:42 +00:00
Jose Fonseca	397b491173	gallivm: Silence unused variable warnings on release builds. Reviewed-by: Brian Paul <brianp@vmware.com>	2015-03-22 08:23:24 +00:00
Emil Velikov	7c7954b09d	galahad: actually remove the driver Should have been part of 429a4355259(galahad: remove driver). Seems like I've erroneously committed the trimmed patch. Reported-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>	2015-03-21 22:35:27 +00:00
Emil Velikov	429a435525	galahad: remove driver Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Acked-by: Matt Turner <mattst88@gmail.com>	2015-03-21 17:18:28 +00:00
Emil Velikov	84041bab3f	gallium/docs: remove information about identity driver Removed from tree. Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Acked-by: Matt Turner <mattst88@gmail.com>	2015-03-21 17:18:25 +00:00
Emil Velikov	65a8d4d6dd	winsys/sw/fbdev: remove unused software winsys st/egl was its only user. Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Acked-by: Matt Turner <mattst88@gmail.com>	2015-03-21 17:16:38 +00:00
Emil Velikov	1081ed9dc3	winsys/sw/wayland: remove unused winsys st/egl was its only user. Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Acked-by: Matt Turner <mattst88@gmail.com>	2015-03-21 17:16:35 +00:00
Emil Velikov	48c7461d5a	st/gbm: remove state-tracker st/egl was its only user. Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Acked-by: Matt Turner <mattst88@gmail.com>	2015-03-21 17:16:27 +00:00
Roland Scheidegger	e8039208c4	llvmpipe: use global llvm context for PIPE_SUBSYSTEM_EMBEDDED There's 2 reasons why we'd want to use the global context: 1) There still seems to be one memory "leak" left when using multiple llvm contexts (it is not a true leak as the memory disappears into some still addressable pool but nevertheless the memory consumption grows). See http://cgit.freedesktop.org/~jrfonseca/llvm-jitstress/ 2) These contexts get kinda big - even when disposing modules etc. after compiling a shader the LLVMContext can easily be over 100kB. So when there's lots of llvm contexts arounds it adds up. The downside is that at least right now this is absolutely not thread safe, so this only works safely in environments where multiple pipe contexts are not used concurrently. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2015-03-21 01:52:03 +01:00
Dave Airlie	9d97cd2e3e	u_primconvert: add primitive restart support This add primitive restart support to the prim conversion. This involves changing the API for the translate functions as we need to pass the prim restart index and the original number of indices into the translate functions. primitive restart is support for quads, quad strips and polygons. This deal with the case where the actual number of output primitives is less than the initially calculated number, by filling the rest of the output primitives with the restart index, the other option is to reduce the output prim number, but that will make the generator code a bit messier. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2015-03-20 09:46:30 +10:00
Brian Paul	8f255f948b	gallivm: remove unused 'builder' variable Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2015-03-19 12:56:35 -06:00
Jose Fonseca	a56f1a8b32	gallivm: Use INFINITY directly. Already done below. Reviewed-by: Brian Paul <brianp@vmware.com>	2015-03-18 21:51:40 +00:00
Rob Clark	aee26d292f	freedreno/ir3: fix infinite recursion in sched One more case we need to handle. One of the src instructions for the indirect could also end up being ourself. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-03-18 10:42:33 -04:00
Rob Clark	62cc003b7d	freedreno: fix spelling Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-03-18 10:42:33 -04:00
Marek Olšák	a984abdad3	radeonsi: increase coords array size for radeon_llvm_emit_prepare_cube_coords radeon_llvm_emit_prepare_cube_coords uses coords[4] in some cases (TXB2 etc.) Discovered by Coverity. Reported by Ilia Mirkin. Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-03-18 12:04:27 +01:00
Emil Velikov	3f94a5afcb	r600g: constify r600_shader_tgsi_instruction lists. Massive list of constant data. Annotate it as such. Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2015-03-17 23:52:39 +00:00
Emil Velikov	63cf2b4448	r600g: kill off r600_shader_tgsi_instruction::{tgsi_opcode,is_op3} Both of which are no longer used. Use designated initializer to make things obvious as people add/remove TGSI_OPCODEs. Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2015-03-17 23:52:35 +00:00
Emil Velikov	5e68c6b322	r600g: use the tgsi opcode from parse.FullToken.FullInstruction ... rather than the local one in inst_info->tgsi_opcode. This will allow us to simplify struct r600_shader_tgsi_instruction. Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2015-03-17 23:52:32 +00:00
Roland Scheidegger	2372275d2f	gallivm: abort properly when running out of buffer space in lp_disassembly Before this actually ran into an infinite loop printing out "invalid"... Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2015-03-17 00:46:48 +01:00
Emil Velikov	c066669b8d	st/dri: remove unused include from the automake/scons build st/dri/common hasn't been around for a while. Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2015-03-16 20:59:52 +00:00
Emil Velikov	55f0c0a29f	auxiliary/os: fix the android build - s/drm_munmap/os_munmap/ Squash this silly typo introduced with commit c63eb5dd5ec(auxiliary/os: get the mmap/munmap wrappers working with android) Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org> Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2015-03-16 20:59:36 +00:00
Emil Velikov	5664f57df3	gallium/sw/kms: trivial cleanups Remove the forward declaration and make use of the DEBUG_PRINT macro for debug builds. Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2015-03-16 20:59:22 +00:00
Marek Olšák	b5f19db976	radeonsi: implement TGSI_OPCODE_BFI (v2) v2: Don't use the intrinsics, the shader backend can recognize these patterns and generates optimal code automatically. Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2015-03-16 14:58:19 +01:00
Marek Olšák	d3723c614f	radeonsi: add a helper for extracting bitfields from parameters (v2) This will be used a lot (especially by tessellation). v2: don't use the bfe intrinsic Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2015-03-16 14:58:19 +01:00
Marek Olšák	dc39413640	radeonsi: move scratch reloc state setup - move it to its own function - do it after all states are emitted - bump SI_MAX_DRAW_CS_DWORDS Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-03-16 12:54:19 +01:00
Marek Olšák	567c8d7300	radeonsi: don't emit PA_SC_LINE_STIPPLE if not rendering lines Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-03-16 12:54:19 +01:00
Marek Olšák	1f4bb38264	radeonsi: don't emit PA_SC_LINE_STIPPLE after every rasterizer state change Do it only when the line stipple state is changed. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-03-16 12:54:19 +01:00
Marek Olšák	f5832f3f9d	radeonsi: move PA_SU_SC_MODE_CNTL to rasterizer state This requires enabling the optional GL provoking vertex behavior for quads. + some cosmetic changes, so that the register is set exactly the same as on r600. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-03-16 12:54:19 +01:00
Marek Olšák	98a2398222	radeonsi: implement line and polygon smoothing Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-03-16 12:54:19 +01:00
Marek Olšák	303d23e10d	radeonsi: add shader code for smoothing The fragment shader multiplies the alpha channel with gl_SampleMaskIn. If blending is enabled, it looks like MSAA. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-03-16 12:54:19 +01:00
Marek Olšák	4f20a8f278	radeonsi: split sample locations into its own state atom Sample locations are not updated as often as framebuffers. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-03-16 12:54:18 +01:00
Marek Olšák	f7796a966d	radeonsi: add basic code for overrasterization This will be used for line and polygon smoothing. This is GCN-only even though it's in shared code. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-03-16 12:54:18 +01:00
Marek Olšák	1921fa4304	radeonsi: small cleanup in si_shader_selector_key Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-03-16 12:54:18 +01:00
Marek Olšák	52ff1edc51	radeonsi: simplify accessing alpha pointer in si_llvm_emit_fs_epilogue Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-03-16 12:54:18 +01:00
Marek Olšák	955ebf2890	radeonsi: add support for easy opcodes from ARB_gpu_shader5 I have to use the BFE instrinsics, because BFE is one of the most complex instructions that can't be matched easily. BFE has 3 conditional branches and one of them is quite big. In the isel DAG, lowered BFE has 27 nodes (including leafs).	2015-03-16 12:54:18 +01:00
Marek Olšák	755a2907a3	radeonsi: implement bit-finding opcodes from ARB_gpu_shader5 Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>	2015-03-16 12:54:18 +01:00
Marek Olšák	ca90cde81e	radeonsi: implement gl_SampleMaskIn Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>	2015-03-16 12:54:18 +01:00
Marek Olšák	f9fd0c4a55	radeonsi: add support for SQRT Reviewed-by: Tom Stellard <thomas.stellard@amd.com> Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>	2015-03-16 12:54:18 +01:00
Marek Olšák	d73c1c1304	radeonsi: add support for FMA Reviewed-by: Tom Stellard <thomas.stellard@amd.com> Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>	2015-03-16 12:54:18 +01:00
Marek Olšák	dfea35666e	gallium/radeon: don't use LLVMReadOnlyAttribute for ALU None of the instructions use a pointer argument. (+ small cosmetic changes) Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2015-03-16 12:54:18 +01:00
Marek Olšák	9da9c8e3f4	tgsi: handle bitwise opcodes in tgsi_opcode_infer_type (v2) v2: set the same types as the destination type in tgsi_exec Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-03-16 12:54:18 +01:00
Marek Olšák	216543ea54	gallium: add FMA and DFMA opcodes (v3) Needed by ARB_gpu_shader5. v2: select DMAD for FMA with double precision v3: add and select DFMA Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-03-16 12:54:18 +01:00
Rob Clark	e92bc6b38e	freedreno: update generated headers Fix a3xx texture layer-size. Signed-off-by: Rob Clark <robclark@freedesktop.org> Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>	2015-03-15 18:00:19 -04:00
Rob Clark	d3fb949c03	freedreno/ir3: remove old compiler Now that piglit is no longer falling back to old compiler for any tests, we can remove it. Hurray \o/ Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-03-15 13:27:03 -04:00
Rob Clark	feb858b788	freedreno/ir3: avoid scheduler deadlock Deadlock can occur if we schedule an address register write, yet some instructions which depend on that address register value also depend on other unscheduled instructions that depend on a different address register value. To solve this, before scheduling an address register write, ensure that all the other dependencies of the instructions which consume this address register are already scheduled. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-03-15 13:26:56 -04:00
Rob Clark	7208e96bb8	freedreno/ir3: bit of cleanup Add an array_insert() macro to simplify inserting into dynamically sized arrays, add a comment, and remove unused prototype inherited from the original freedreno.git/fdre-a3xx test code, etc. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-03-15 13:26:44 -04:00
Ilia Mirkin	620e29b748	freedreno: fix slice pitch calculations For example if width were 65, the first slice would get 96 while the second would get 32. However the hardware appears to expect the second pitch to be 64, based on halving the 96 (and aligning up to 32). This fixes texelFetch piglit tests on a3xx below a certain size. Going higher they break again, but most likely due to unrelated reasons. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org> Reviewed-by: Rob Clark <robclark@freedesktop.org>	2015-03-13 16:05:16 -04:00
Ilia Mirkin	89b26d5a36	freedreno/a3xx: use the same layer size for all slices We only program in one layer size per texture, so that means that all levels must share one size. This makes the piglit test bin/texelFetch fs sampler2DArray have the same breakage as its non-array version instead of being completely off, and makes bin/ext_texture_array-gen-mipmap start passing. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org> Reviewed-by: Rob Clark <robclark@freedesktop.org>	2015-03-13 16:05:16 -04:00
Brian Paul	558dcd8770	util: convert slab macros to inline functions Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2015-03-13 08:03:43 -06:00
Alexandre Demers	a38e6c4fbd	gallivm: (trivial) Fix typo in comment introduced by 70dc8a Fix typo in comment introduced by 70dc8a Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: Jose Fonseca <jfonseca@vmware.com>	2015-03-13 13:52:52 +00:00
Jose Fonseca	70dc8a9930	gallivm: Prevent double delete on LLVM 3.6 std::unique_ptr takes ownership of MM, and a double delete could ensure in case of an error, as pointed out by Chris Vine in https://bugs.freedesktop.org/show_bug.cgi?id=89387 Reviewed-by: Chris Vine <chris@cvine.freeserve.co.uk>	2015-03-12 10:01:09 +00:00
Brian Paul	5376bc74cc	st/glx: use strdup() instead of _mesa_strdup() Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2015-03-11 09:34:24 -06:00
Samuel Pitoiset	e5cd42ed9a	nvc0: fix wrong max value for driver queries The maximum value of a Gallium HUD's panel is automatically adjusted when the current value is greater than the max. If we set the pipe_query_driver_info::max_value to UINT64_MAX, the maximum value is never adjusted and this results in a flat line instead of a pretty curve which is correctly scaled. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-03-09 20:47:05 -04:00
Alexandre Demers	7a37d5c3a4	r600g: Use R600_MAX_VIEWPORTS instead of 16 Lets define R600_MAX_VIEWPORTS instead of using 16 here and there in the code when looping through viewports and scissors. It is easier to understand what this number represents. v2: Missed a case where R600_MAX_VIEWPORTS should have been used. Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2015-03-09 23:02:05 +01:00
Marek Olšák	c939231e72	r300g: fix sRGB->sRGB blits Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org>	2015-03-09 21:22:22 +01:00
Marek Olšák	9953586af2	r300g: fix a crash when resolving into an sRGB texture Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org>	2015-03-09 21:03:49 +01:00
Marek Olšák	113601086d	r300g: use memset for clearing the shader key	2015-03-09 20:58:32 +01:00
Marek Olšák	4815c187b7	r300g: remove the broken SNORM->UNORM shader lowering pass Not used anymore.	2015-03-09 20:58:32 +01:00
Marek Olšák	74a757f92f	r300g: fix RGTC1 and LATC1 SNORM formats Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org>	2015-03-09 20:58:32 +01:00
Stefan Dösinger	f710b99071	r300g: Fix the ATI1N swizzle (RGTC1 and LATC1) This fixes the GL_COMPRESSED_RED_RGTC1 part of piglit's rgtc-teximage-01 test as well as the precision part of Wine's 3dc format test (fd.o bug 89156). The Z component seems to contain a lower precision version of the result, probably a temporary value from the decompression computation. The Y and W component contain different data that depends on the input values as well, but I could not make sense of them (Not that I tried very hard). GL_COMPRESSED_SIGNED_RED_RGTC1 still seems to have precision problems in piglit, and both formats are affected by a compiler bug if they're sampled by the shader with a swizzle other than .xyzw. Wine uses .xxxx, which returns random garbage. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89156 Signed-off-by: Marek Olšák <marek.olsak@amd.com> Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org>	2015-03-09 20:58:32 +01:00
Tom Stellard	51b43c559f	radeonsi: Add additional information to shader dumps This adds SGPR count, VGPR count, shader size, LDS size, and scratch usage to shader dumps. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2015-03-09 13:53:33 +00:00
Tom Stellard	bbfa1c3239	radeonsi/compute: Use value from compiler for COMPUTE_PGM_RSRC1.FLOAT_MODE Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2015-03-09 13:53:33 +00:00
Tom Stellard	a646b00cfc	clover: Return the minimum required value for CL_DEVICE_SINGLE_FP_CONFIG v2 This means dropping CL_FP_DENORM from the current return value. v2: - Add comments about minimum values for OpenCL 1.2. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>	2015-03-09 13:53:33 +00:00
Ilia Mirkin	cb3eb43ad6	freedreno/ir3: get the # of miplevels from getinfo This fixes ARB_texture_query_levels to actually return the desired value. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Rob Clark <robclark@freedesktop.org> Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>	2015-03-09 10:50:39 -04:00
Ilia Mirkin	8ac957a51c	freedreno/ir3: fix array count returned by TXQ Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Rob Clark <robclark@freedesktop.org> Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>	2015-03-09 10:50:39 -04:00
Ilia Mirkin	f3dfe6513c	freedreno: move fb state copy after checking for size change Fixes: `1f3ca56b` ("freedreno: use util_copy_framebuffer_state()") Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Rob Clark <robclark@freedesktop.org> Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>	2015-03-09 10:50:39 -04:00
Rob Clark	fd17db6fe5	freedreno: replace glsl130 debug flag with glsl120 Now that relative-dst works, we should never fall back to the old compiler. (Which is almost true, other than a couple edge case sched fails in piglit). So replace glsl130 flag to force GLSL 130 and integers on a3xx/a4xx with a glsl120 flag to force GLSL 120 and !integers. If this commit breaks any game/app/etc use FD_MESA_DEBUG=glsl120 as a workaround and please let me know. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-03-08 17:42:43 -04:00
Rob Clark	0e8d58b80a	gallium/docs: add some freedreno compiler docs Enable the 'sphinx.ext.graphviz' extension, and add in a section for driver specific docs, with freedreno compiler docs beneath. The goal is for more complete compiler docs, and hopefully some docs about other parts of the driver (such as how tiling works, etc). Note that there is also a Distribution -> Drivers section. Although that appears to be simply just a list of drivers. Not sure if that should move under the 'Drivers' section or left alone. I did add a one-line section for freedreno in the existing Distribution -> Drivers section. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-03-08 17:42:43 -04:00

... 2 3 4 5 6 ...

23528 Commits