Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Fix a crash when invoking "gnome-control-center info" on QEMU, where a
zero height is passed at init.
(sroland: simplify logic by eliminating the div altogether, using 64bit mul.)
Fixes: https://bugzilla.novell.com/show_bug.cgi?id=879462
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Gallium (but not OpenGL) does allow nesting of queries, but there's no
limit specified (d3d10 has no limit either). Nevertheless, for practical
purposes we need some limit in llvmpipe, otherwise we'd need more complex
handling of queries as we need to keep track of all binned queries (this
only affects queries which gather data past setup). A limit of 16 is too
small though, while 64 would suffice.
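For illustration, the cap could look roughly like this sketch (the macro
and field names here are assumed, not taken from the actual patch):

#define LP_MAX_ACTIVE_BINNED_QUERIES 64   /* assumed name */

/* in begin-query binning: refuse further queries once the list is full */
if (scene->num_active_queries >= LP_MAX_ACTIVE_BINNED_QUERIES)
   return;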
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This made sense when swizzled storage layout was used for rendering to tiles.
But nowadays the name just adds confusion (and makes for long lines).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Framebuffers have been allowed to have NULL attachments for a while now. llvmpipe handled
that properly for lp_rast_shade_quads_mask but it seems the change didn't
make it to lp_rast_shade_tile.
This fixes piglit fbo-drawbuffers-none test (though I need to increase
the FB_SIZE from 32 to 256 so the tris cover some tiles fully).
https://bugs.freedesktop.org/show_bug.cgi?id=79421
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2ea923cf57 had the side effect that IR counting is now done before IR
optimization instead of after it. Some quick analysis shows that there are
roughly 1.5 times more IR instructions before optimization than after,
hence the effective shader cache size got quite a bit smaller. We could
counter this by increasing the instruction limit, but it probably makes
more sense to count instructions after optimization, so move that code.
Reviewed-by: Brian Paul <brianp@vmware.com>
When we had just one module, "gallivm" was an appropriate name. But now we
have modules containing all functions for a particular variant, so give them
a corresponding name (this is really just to help debugging).
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The setup shader names were composed of both a fs shader number and a
variant number. But since setup shaders aren't tied to a particular fragment
shader, the former was a fixed zero while the latter was also always zero
because it was never assigned. So, similar to what the fs code does, use an
ever-increasing number to give them a more catchy name (unlike fragment
shaders, though, where this number is per explicitly created shader, we just
use it for the implicitly created variants).
And while here, fix whitespace a bit.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This was unused except for being incremented whenever fs and setup shader
variants were created. Probably a leftover from ages ago.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Older versions haven't been tested and probably don't work anyway. But more
importantly, the code supporting them is hindering further work.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
When the limit was increased and redefined in terms of
LP_MAX_SHADER_VARIANTS (75f1fea14f), this inadvertently lowered the limit
in some branches (those with a lower LP_MAX_SHADER_VARIANTS number) when
merged. So, make sure the limit is always at least the number it once was.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
GL (3.0) allows you to clear individual color buffers in a fb. In fact, for
fbs containing both int and float/normalized color buffers this is required
(because the clear values are otherwise undefined if applied to all
buffers). The gallium interface was changed a while ago, but llvmpipe
ignored it (hence doing such individual clears always resulted in clearing
all buffers, plus some assorted asserts due to the mixed fbs).
So change the clear command to indicate the buffer to be cleared. Also,
because indicating the buffer to be cleared would have made lp_rast_arg_cmd
larger, which is unacceptable (we're trying to shrink it some day), allocate
the clear value in the scene and just pass a pointer.
There are several advantages and disadvantages here:
+ clearing individual buffers works (we could also actually bin such clears
now if they come through clear_render_target() and the surface is in the
current fb, though we didn't do this before for the single-rb case and
still don't try).
+ since there's one clear per rb, we do the format conversion in setup rather
than per bin. Aside from the (drop in the ocean...) performance advantage this
means that clearing to very small values (that is, denormal when converted to
the format) should work for small float (fp16 etc.) formats, as the util code
couldn't handle it correctly before (because cpu denorms are disabled when
executing the bin commands, screwing up the magic conversion and flushing
the values to 0, though this was not verified).
- there's some overhead for traditional old-style clear-all MRT cases, since
there's one rast clear command per rb instead of one for all rbs.
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=76976.
v2: get rid of the ugly manual memcpy stuff and just use union util_color.
This is 32 bytes instead of 16 but as the allocation is per scene we can live
with those additional 16 bytes (and the additional 128 bytes in the setup
context), which makes the code much more obvious. Suggested by Brian.
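Roughly, the new rast argument looks like this sketch (hypothetical names;
union util_color is the type mentioned above):

/* allocated from scene memory, so lp_rast_arg_cmd only carries a pointer */
struct lp_rast_clear_rb {
   union util_color color_val;   /* converted to the rb format in setup */
   unsigned cbuf;                /* which color buffer to clear */
};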
Reviewed-by: Brian Paul <brianp@vmware.com>
According to Roland all TGSI support is there in theory.
In practice there are a few piglit failures and crashes, as this hadn't
been tested before.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Defaults to providing the same offsets as MIN/MAX_TEXEL_OFFSET. For
nvc0, the offset can be -32/31.
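A sketch of a driver's get_param hook honoring the new caps (treat the
enum names as assumed):

case PIPE_CAP_MIN_TEXTURE_GATHER_OFFSET:
   return -32;   /* nvc0; the default would mirror MIN_TEXEL_OFFSET */
case PIPE_CAP_MAX_TEXTURE_GATHER_OFFSET:
   return 31;    /* nvc0; the default would mirror MAX_TEXEL_OFFSET */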
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This opcode provides support for GL_ARB_texture_query_lod.
Signed-off-by: Dave Airlie <airlied@redhat.com>
[imirkin: rebase, docs update]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This adds a gallium cap that allows us to fake GL3.0 by
not exposing MSAA on sw rendering.
It also forces the extra extensions needed for GL3.2.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Eliminate lp_vertex_shader, as it added nothing over draw_vertex_shader.
Simplify lp_geometry_shader, as most of the incoming state is unneeded.
(We could also just use draw_geometry_shader if we were willing to peek
inside the structure.)
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
The conversion code for srgb was tuned for n x 4x8bit AoS -> 4 x nxfloat
SoA (and vice versa); fix this so it also handles 16bit 565-style srgb
formats. Still not really all that generic: things like r10g10b10a2_srgb or
r4g4b4a4_srgb wouldn't work (the latter would be trivial to fix; the former
wouldn't need much work just to avoid crashing, but would almost certainly
need some higher-precision calculation), but they aren't needed right now.
The code is not fully optimized for this (could use more direct calculation
instead of expanding to 8-bit range first) but should be good enough.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The AoS version of lp_build_blend_factor was assuming that if the first
channel was alpha, there were no rgb components.
Fixes glean/blendFunc on System z. No piglit regressions on x86_64.
The shortcut is still used in tests like spec/ARB_framebuffer_object/
fbo-alpha.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
D3D10 allows setting the internal offset of a buffer, which in general is
only incremented via actual stream output writes. By allowing the internal
offset to be set, draw_auto is capable of rendering from buffers which have
not actually been streamed out to. Our interface didn't allow that. This
change shouldn't make any functional difference to OpenGL, where instead of
an append_bitmask you just get a real array in which -1 means append (like
in D3D) and 0 means do not append.
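An illustrative call under the new interface (signature approximate):

/* -1 keeps the internal offset (append), any other value overwrites it */
unsigned offsets[2] = { (unsigned)-1, 0 };
pipe->set_stream_output_targets(pipe, 2, targets, offsets);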
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This adds support to gallium for a TG4 instruction,
and two CAPs. The first CAP is required for GL_ARB_texture_gather.
The second CAP is required to expose GL_ARB_gpu_shader5.
However, so far we haven't found any hardware that natively
exposes the textureGatherOffsets feature from GL, so just
lower it for now. If hardware appears for this we can add
another CAP to allow TG4 to take 4 offsets.
v2: add component selection src and a cap to say
hw can do it. (st can use it to help control
GL_ARB_gpu_shader5/GLSL 4.00). Add docs.
v3: rename to SM5, add docs.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The code re-enabling denorms for small float formats did not recognize
this format due to format handling hacks (mainly, the lp_type doesn't have
the floating bit set).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Tungsten Graphics Inc. was acquired by VMware Inc. in 2008. Leaving the
old copyright name is creating unnecessary confusion, hence this change.
This was the sed script I used:
$ cat tg2vmw.sed
# Run as:
#
# git reset --hard HEAD && find include scons src -type f -not -name 'sed*' -print0 | xargs -0 sed -i -f tg2vmw.sed
#
# Rename copyrights
s/Tungsten Gra\(ph\|hp\)ics,\? [iI]nc\.\?\(, Cedar Park\)\?\(, Austin\)\?\(, \(Texas\|TX\)\)\?\.\?/VMware, Inc./g
/Copyright/s/Tungsten Graphics\(,\? [iI]nc\.\)\?\(, Cedar Park\)\?\(, Austin\)\?\(, \(Texas\|TX\)\)\?\.\?/VMware, Inc./
s/TUNGSTEN GRAPHICS/VMWARE/g
# Rename emails
s/alanh@tungstengraphics.com/alanh@vmware.com/
s/jens@tungstengraphics.com/jowen@vmware.com/g
s/jrfonseca-at-tungstengraphics-dot-com/jfonseca-at-vmware-dot-com/
s/jrfonseca\?@tungstengraphics.com/jfonseca@vmware.com/g
s/keithw\?@tungstengraphics.com/keithw@vmware.com/g
s/michel@tungstengraphics.com/daenzer@vmware.com/g
s/thomas-at-tungstengraphics-dot-com/thellstom-at-vmware-dot-com/
s/zack@tungstengraphics.com/zackr@vmware.com/
# Remove dead links
s@Tungsten Graphics (http://www.tungstengraphics.com)@Tungsten Graphics@g
# C string src/gallium/state_trackers/vega/api_misc.c
s/"Tungsten Graphics, Inc"/"VMware, Inc"/
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes regression from 9baa45f78b
v2: incorporate a few small changes suggested by Roland.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The whole round-pointsize-to-int stuff must only be done with GL legacy
rules (no point_quad_rasterization) or all the wrong edges are lit up.
This was previously in a private branch (d3d pointsprite test complains
loudly otherwise) and got lost in a merge. However, it should certainly
apply to GL point sprite rasterization as well.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
It's possible to bind a smaller buffer as a constant buffer than what the
shader actually uses/requires. This could cause nasty
crashes. This patch adds the architecture to pass the maximum
allowable constant buffer index to the jit to let it make
sure that the constant buffer indices are always within bounds.
The behavior follows the d3d10 spec, which says the overflow
should always return all zeros, and overflow is only defined
as access beyond the size of the currently bound buffer. Accesses
beyond the declared shader constant register size are not
considered an overflow and are expected to return garbage, but consistent
garbage (we follow the behavior which some WLK tests expect, which is to
return the actual values from the bound buffer).
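In scalar terms the jit code now behaves like this sketch (names assumed):

/* num_consts is derived from the size of the currently *bound* buffer */
static float
fetch_const(const float *buf, unsigned num_consts, unsigned index)
{
   /* d3d10: reads beyond the bound buffer size return zero */
   return index < num_consts ? buf[index] : 0.0f;
}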
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Commit eda21d2a30 fixed the rasterization
of points for Direct3D but ended up breaking the rasterization of OpenGL
non-sprite points, in particular conform's pntrast.c test.
The only way to get both working is to properly honour
pipe_rasterizer::point_quad_rasterization, and follow the weird OpenGL
rule when it is false.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The adjustment needs to be applied to the y coordinates and not the x
coordinates, just like the equivalent code for lines and triangles in
lp_setup_line.c and lp_setup_tri.c.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
This was inadvertently forgotten when replacing gl_rasterization_rules
with lower_left_origin and half_pixel_center (commit
2737abb44e).
This makes a difference when lower_left_origin != half_pixel_center, e.g.,
D3D10.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
We don't support MSAA (i.e., the number of samples is always one),
therefore sample_mask boils down to a synonym of the rasterizer_discard
flag.
Also, this change makes setup actually use the value received in
lp_setup_set_rasterizer_discard instead of reaching out to llvmpipe
upper layers to re-fetch it.
Based on Si Chen's draft.
With this patch, "wgf11multisample Coverage" passes 100% on the UMD
D3D10 state tracker.
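In effect (sketch, variable names assumed):

/* with exactly one sample, an all-zero sample mask is a full discard */
bool discard = rast->rasterizer_discard || (sample_mask & 1) == 0;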
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Si Chen <sichen@vmware.com>
Implement Alpha to Coverage by discarding a fragment if its alpha component
is less than 0.5. This is joint work by Jose and Si.
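In scalar terms the generated code does something like this sketch (the
key field name is assumed):

/* single-sample alpha-to-coverage degenerates to a threshold discard */
if (key->alpha_to_coverage && alpha < 0.5f)
   mask = 0;   /* kill the fragment */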
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This fixes another case of faulting when freeing a pipe_sampler_view
that belongs to a previously destroyed context.
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jonathan Liu <net147@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Didn't really work as well as hoped (in particular it was not generally
more accurate), will solve this differently.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This code was always problematic, and with 64bit rasterization we no longer
need it at all.
Reviewed-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This patch adds MESA_copy_sub_buffer support to the dri sw loader and then
to the gallium state tracker, llvmpipe, softpipe and other bits.
It reuses the dri1 driver extension interface, and it updates the swrast
loader interface with a new putimage which can take a stride.
I've tested this with gnome-shell with a cogl hacked to reenable sub copies
for llvmpipe and the one piglit test.
I could probably split this patch up as well.
v2: pass a pipe_box, to reduce the entrypoints, as per Jose's review,
add to p_screen doc comments.
v3: finish off winsys interfaces, add swrast classic support as well.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
swrast: add support for copy_sub_buffer
With this patch llvmpipe will adhere to the ARB_depth_clamp enabled state when
clamping the fragment's zw value. To support this, the variant key now includes
the depth_clamp state. key->depth_clamp is derived from pipe_rasterizer_state's
(depth_clip == 0), thus depth clamp is only enabled when depth clip is disabled.
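The clamp itself is simple; a scalar sketch of what the variant applies to
the fragment z, assuming the viewport min/max depth as the clamp range:

static float
clamp_frag_z(float z, float min_depth, float max_depth)
{
   /* only emitted when key->depth_clamp, i.e. when depth clip is off */
   return z < min_depth ? min_depth : (z > max_depth ? max_depth : z);
}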
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Disabled by default, but it's very useful when needed.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The fact that we flush denorms to zero breaks our half-float conversion and
blending. This patch enables denorms for blending. It's a little tricky due
to the llvm bug that makes
it incorrectly reorder the mxcsr intrinsics:
http://llvm.org/bugs/show_bug.cgi?id=6393
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Zack Rusin <zackr@vmware.com>
With this patch, generate_fs_loop will clamp any fragment shader depth writes
to the viewport's min and max depth values. Viewport selection is determined
by the geometry shader output for the viewport array index. If no index is
specified, then the default viewport index is zero. Semantics for this path
can be found in draw_clamp_viewport_idx and lp_clamp_viewport_idx.
lp_jit_viewport was created to store viewport information visible to JIT code,
and is validated when the LP_NEW_VIEWPORT dirty flag is set.
lp_rast_shader_inputs is responsible for passing the viewport_index through
the rasterizer stage to the fragment stage (via lp_jit_thread_data).
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Ever since introducing separate sampler and sampler view limits this was
really missing.
For now, every driver but llvmpipe reports the same number for sampler
views as for samplers, so nothing should break.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
8 bit precision is required by d3d10 but unfortunately it requires a 64 bit
rasterizer. This commit implements 64 bit rasterization with full support
for 8 bit subpixel
precision. It's a combination of all individual commits
from the llvmpipe-rast-64 branch.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Some rounding errors could crop up when calculating a0. Use a more accurate
method (essentially barycentric interpolation) to fix this, though it does
absolutely nothing for the REAL problem (which is that our interpolation
will give very bad results with small triangles far away from the origin
when they have steep gradients) - it actually makes that slightly worse.
(To fix the real problem we would either need to use a vertex corner (or
some other point inside the tri) as the starting point value instead of the
fb origin and pass that down to interpolation, or mimic what hw does and use
barycentric interpolation (with the coordinates extracted from the
rasterizer edge functions) - maybe another time.)
Some (silly) tests though really want a high accuracy at fb origin and don't
care much about anything else (Just. Don't. Ask.).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
* minimise flags duplication
* distinguish between VISIBILITY C and CXX flags
* set only the required flags - C and/or CXX
v2: add LLVM_CFLAGS back to AM_CFLAGS (add missing backslash)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
In particular get rid of home-grown vector helpers which didn't add much.
And while here fix formatting a bit. No functional change.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
d3d10 requires us to convert NaNs to zero for any float->int conversion.
We don't really do that, but it mostly seems to work. In particular I
suspect the very common float->unorm8 path only really passes because it
relies on sse2 pack intrinsics which just happen to work by luck for NaNs
(float->int conversion in hw gives an integer indeterminate value, which
just happens to be -0x80000000 and hence gets converted to zero in the end
after the pack intrinsics).
However, float->srgb didn't get so lucky, because we need to clamp before
blending, and clamping resulted in undefined NaN behavior (NaNs actually
got converted to 1.0 by the sse2 clamp). Fix this by using a zero/one clamp
with defined NaN behavior, as we can handle the NaN for free this way.
I suspect there are more bugs lurking in this area (e.g. converting floats
to snorm) as we don't really use defined NaN behavior everywhere, but this
seems to be good enough.
While here, respecify the NaN behavior modes a bit; in particular the
return_second mode didn't really do what we wanted. From the caller's
perspective, we really wanted to say we need the non-nan result, but we
already know the second arg isn't a NaN. So we use this now instead, which
means that cpu architectures which actually implement min/max by always
returning the non-nan operand (that is, adhering to ieee754-2008 rules)
don't need to bend over backwards for nothing.
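The trick relies on the sse operand order; a self-contained sketch:

#include <xmmintrin.h>

/* minps/maxps return the *second* operand when the first is a NaN, so
 * putting x first folds NaN to 0.0 as part of the zero/one clamp */
static __m128
clamp01_nan_to_zero(__m128 x)
{
   return _mm_min_ps(_mm_max_ps(x, _mm_setzero_ps()), _mm_set1_ps(1.0f));
}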
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Since we explicitly require an integer input we should avoid using exp2
math (even if we were using optimized versions), which turns the exp2 into
an int sub (plus some casts).
v2: fix bogus uint (needs to be int) math spotted by Matthew, fix comments
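The int-sub trick, sketched in plain C (this assumes the scaled value
stays a normal float):

#include <stdint.h>
#include <string.h>

/* scaling a positive normal float by 2^-n is an integer subtraction on
 * its exponent field - no exp2 evaluation needed */
static float
scale_down_pow2(float x, int32_t n)
{
   int32_t bits;
   memcpy(&bits, &x, sizeof bits);
   bits -= n * (1 << 23);
   memcpy(&x, &bits, sizeof bits);
   return x;
}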
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
With this patch, the llvmpipe and draw modules will calculate the depth bias
according to floating point depth buffer semantics described in the
arb_depth_buffer_float specification, when the driver has a z buffer bound
with a format type of UTIL_FORMAT_TYPE_FLOAT.
By default, the driver will use the existing UNORM calculation for depth bias.
A new function, draw_set_zs_format, was added to calculate the Minimum
Resolvable Depth value and floating point depth sense for the draw module.
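A sketch of the float-depth rule from arb_depth_buffer_float (23 being the
number of float mantissa bits; m is the maximum depth slope; this is the
spec's formula, not the exact draw code):

#include <math.h>

/* the minimum resolvable difference r scales with the exponent of the
 * largest z of the primitive instead of being a format constant */
static double
float_depth_bias(double max_z, double m, double factor, double units)
{
   int exp;
   frexp(max_z, &exp);
   return m * factor + ldexp(1.0, exp - 23) * units;
}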
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The layer coming from the GS needs to be clamped (not sure if that's
actually the correct error behavior, but we need something) as the number
can be higher than the number of layers in the fb. However, this code was
using the layer calculation from the scene, which was actually calculated
in lp_scene_begin_rasterization() and hence too late (so setup was using
the value from the _previous_ scene, or just zero if it was the first
scene).
Since the value is used in both rasterization and setup, move calculation up
to lp_scene_begin_binning() though it's a bit more inconvenient to calculate
there. (Theoretically could move _all_ code which was in
lp_scene_begin_rasterization() to there, because ever since we got rid of
swizzled render/depth buffers our "map" functions preparing the fb data for
render don't actually change the data in there at all, but it feels like
it would be a hack.)
v2: improve comments
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This CAP will determine whether ARB_framebuffer_object can be enabled.
The nv30 driver does not allow mixing swizzled and linear zsbuf/cbuf
textures.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
The new function replaces four old functions: set_fragment/vertex/
geometry/compute_sampler_views().
Note: at this time, it's expected that the 'start' parameter will
always be zero.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Emil Velikov <emil.l.velikov@gmail.com>
This reverts commit 94d05bf87a as it has a
few problems:
- it breaks windows builds because env[LLVM_CXXFLAGS] is never set there
- it merges not only rtti but the whole cxxflags (defines etc.),
which has proven to be a source of trouble (breaks debugging etc.)
* The rtti fix actually dug up a bug in the scons build scripts.
* Autotools took the LLVM cpp and cxx flags, while scons only took
the cpp flags.
* This grabs the cxx flags and applies them where needed. We may
want to make the same change for the llvm cpp flags in scons.
* The only linux platform I can find with LLVM no-rtti is Ubuntu.
* Fixes bug #70471
Tested-by: Vinson Lee <vlee@freedesktop.org>
The previous limit of 128*1024 was reported on IRC to cause frequent
recompiles in some apps due to shader variant thrashing, leading to
noticeable lags.
Note that the LP_MAX_SHADER_VARIANTS limit (1024) was more or less
impossible to reach, since even simple fragment shaders without texturing
(glxgears) used more than 256 instructions, hence the instruction limit
would always have been reached first (excluding things like trivial shaders
not writing color). Even with the new limit it is VERY likely the
instruction limit is hit first.
Should help with such lags due to recompiles (though other shader types
have their own limits, LP_MAX_SETUP_VARIANTS and DRAW_MAX_SHADER_VARIANTS;
in particular the latter seems a bit small (128)).
Reviewed-by: Brian Paul <brianp@vmware.com>
Unless the polygon fill mode is different from PIPE_POLYGON_MODE_FILL,
so checking the polygon mode is sufficient.
Testing done: no regression in polygon-mode-offset
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
As we're moving towards expanding the number of subpixel
bits and the width of the variables used in the computations
we need to make this code a bit more centralized.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
shader has already been dereferenced earlier so cannot be null here.
Fixes "Dereference before null check" defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
We need to count the clipper primitives before the rasterizer discards the
ones it considers to be null.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
We need to subdivide triangles if either of the dimensions is
larger than the max edge length, not when both of them are larger.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This reverts commit 755c11dc5e.
We agreed that this is a band-aid that's not very useful and that
the proper solution is to rewrite the rasterization algo
so that it operates on 64 bit values.
Signed-off-by: Zack Rusin <zackr@vmware.com>
When subdividing a triangle we're using a temporary array to store
the new coordinates for the subdivided triangles. Unfortunately
the array used for that was not aligned properly, causing
random crashes in the llvm jit code, which was trying to load
vectors from it.
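The fix boils down to forcing the alignment the jit's vector loads expect;
a sketch using gallium's alignment macro (array dimensions are made up):

/* 16-byte aligned storage for the subdivided triangle coordinates */
PIPE_ALIGN_VAR(16) float verts[3][PIPE_MAX_ATTRIBS][4];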
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Unfortunately d3d10 requires a lot higher precision (e.g.
wgf11clipping tests for it). The smallest number of precision
bits with which it passes is 8. That means that we need to
decrease the maximum length of an edge that we can handle without
subdivision by 4 bits. Abstracted the code a bit to make it easier
to change once we switch to 64bit rasterization.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
r600g needs explicit flushing before DRI2 buffers are presented on the screen.
v2: add (stub) implementations for all drivers, fix frontbuffer flushing
v3: fix galahad
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
We must take rounding into consideration when re-scaling to narrow
normalized channels, such as 2-bit normalized alpha.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
In particular, no one is interested in the vertex count, so drop that, and
also drop the duplicated num_primitives_generated /
so.primitives_storage_needed variables in drivers. For now I am unable to
figure out whether primitives_storage_needed in SO stats (used for d3d10)
should increase if SO is disabled, though the equivalent
num_primitives_generated used for OpenGL definitely should. In any case we
were only counting when SO is active in both softpipe and llvmpipe anyway,
so don't pretend there's an independent num_primitives_generated counter
which would always count.
(This means the PIPE_QUERY_PRIMITIVES_GENERATED count will still be wrong
just as before; we should eventually fix this by either counting separately
for this query or adjusting the code so it always counts even if SO is
inactive, depending on what's correct for d3d10.)
Reviewed-by: Brian Paul <brianp@vmware.com>
There's a new debug value used to disable per-quad lod optimizations in the
fragment shader (ignored for vs/gs as the results are typically just too
wrong). Also try to detect if a supplied lod value is really a scalar (if
it's coming from the immediate or constant file), in which case the sampler
code can stay on the per-quad-lod path (in fact for explicit lod we could
simplify even further and use the same lod for both quads in the avx case,
but this is not implemented yet).
Still need to actually implement per-element lod bias (and derivatives),
and need to handle per-element lod in size queries.
v2: fix comments, prettify.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This is a very well hidden bug found by accident (only the fixed glean
tstencil2 test so far seems to hit it).
We must use a new mask combining the s_pass values and the orig_mask values
for the zpass/zfail stencil ops, otherwise both the sfail op and one of the
zpass/zfail ops are applied (probably not hit in most tests because some of
the ops tend to be KEEP usually).
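In scalar mask terms, the fix amounts to this sketch:

#include <stdint.h>

/* zpass/zfail must only see fragments that survived the stencil test */
static void
stencil_op_masks(uint32_t orig_mask, uint32_t s_pass, uint32_t z_pass,
                 uint32_t *sfail, uint32_t *zfail, uint32_t *zpass)
{
   *sfail = orig_mask & ~s_pass;
   *zfail = orig_mask & s_pass & ~z_pass;
   *zpass = orig_mask & s_pass & z_pass;
}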
Note: this is a candidate for the 9.2 branch.
Reviewed-by: Zack Rusin <zackr@vmware.com>
If the fragment shader is null then pixel shader invocations have
to be equal to zero. And if we're running a null ps then clipper
invocations and primitives should be equal to zero, but only
if both stencil and depth testing are disabled.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
My previous attempt at doing so double-failed miserably (minification of
zero still gives one, and even if it did not, the value was never written
anyway).
While here also rename the confusingly named int_vec bld as we have int vecs
of different sizes, and rename need_nr_mips (as this also changes out-of-bounds
behavior) to is_sviewinfo too.
Reviewed-by: Zack Rusin <zackr@vmware.com>
d3d10 has no notion of distinct array resources at either the resource or
the sampler view level. However, shader dcl of resources certainly has, and
d3d10 expects resinfo to return the values according to that - in particular
a resource might have been a 1d texture with some array layers, then the
sampler view might have only used 1 layer so it can be accessed both as a 1d
or a 1d array texture (I think - the former definitely works). resinfo of a
resource declared as array needs to return the number of array layers, but a
non-array resource needs to return 0 (and not 1). Hence fix this by passing
the target from the shader decl to emit_size_query and use that (in case of
OpenGL the target will come from the instruction itself).
Could probably do the same for actual sampling, though it may not matter there
(as the bogus components will essentially get clamped away), possibly could
wreak havoc though if it REALLY doesn't match (which is of course an error
but still).
Reviewed-by: Zack Rusin <zackr@vmware.com>
Clearly the returned values need to be per-element if the lod is per element.
Does not actually change behavior yet.
Reviewed-by: Zack Rusin <zackr@vmware.com>
Nowadays -1 for slots means that the semantic is not present, so we need to
store slots in signed variables, otherwise <0 comparisons are pointless.
Fixes
http://bugzilla.eng.vmware.com/show_bug.cgi?id=67811 (at least
with softpipe; edgeflags don't work with llvmpipe).
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
If gs is null, then freeing state->shader.tokens would result in a null
dereference.
Fixes "Dereference after null check" defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
The loop was iterating over all the fs inputs and setting them
to perspective interpolation, then after the loop we were
creating extra output slots with the correct interpolation. Instead
of injecting bogus extra outputs, just set the interpolation
on front face and prim id correctly when doing the initial scan
of fs inputs.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Draw module can decompose primitives into wireframe models, which
is a fancy word for 'lines'; unfortunately that decomposition means
that we weren't able to preserve the original front-face info which
could be derived from the original primitives (lines don't have a
'face'). To fix this, allow the draw module to inject a fake face semantic
into the outputs, from which the backends can figure out the original
frontfacing info of the primitives.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The spec says that front-face is true if the value is >0 and false
if it's <0. To make sure that we follow the spec, let's just
subtract 0.5 from our value (llvmpipe did 1 for frontface and 0
otherwise), which will get us a positive number for frontface and
a negative one for backface.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Test infs, zeros and nans with our arith functions to ensure
correct/defined behavior with those values.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Usually with fixed point renderbuffers clamping is done as part of conversion.
However, since we blend in float format, we essentially skip all conversion
steps pre-blend but since this is still a fixed point renderbuffer we must
still clamp the inputs in this case. Makes no difference for piglit though.
Obviously we could skip this if fragment color clamping is enabled, but a)
this is deprecated in OpenGL (d3d never had it) and b) we don't support it
natively so it gets baked into the shader.
Also add some comment about logic ops being broken for srgb, luckily no test
tries to do that as there's no easy fix...
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
We were fixing up the blend factor to ZERO, however this only works correctly
with fixed point render buffers where the input values are clamped to 0/1
(because src_alpha_saturate is min(As, 1-Ad) so can be negative with unclamped
inputs). Haven't seen any failure anywhere due to that with fixed point SNORM
buffers (which clamp inputs to -1/1) but it should apply there as well (snorm
blending is rare, even opengl 4.3 doesn't require snorm rendertargets at all,
d3d10 requires them but they are not blendable).
Doesn't look like piglit hits this though (some internal testing hits the
float case at least). (With legacy OpenGL we could theoretically still use the
fixup to zero if the fragment color clamp is enabled, but we can't detect that
easily since we don't support native clamping hence it gets baked into the
shader.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
Just use the new conversion functions to do the work. The way this is
plugged into the blend code is quite hacktastic, but it follows all the
same hacks already used for the packed float format.
Only support 4x8bit srgb formats (rgba/rgbx plus swizzle); 24bit formats
never worked anyway in the blend code and are thus disabled, and I don't
think anyone is interested in L8/L8A8, which would need even more hacks.
Unless I'm missing something, this is the last feature except MSAA needed for
OpenGL 3.0, and for OpenGL 3.1 as well I believe.
v2: prettify a bit, use separate function for packing.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The D3D10 spec is very explicit about the treatment of denorm floats: the
behavior is exactly the same for them as it would be for -0 or +0. This
makes our shading code match that behavior, since OpenGL doesn't care and
on a few cpus it's faster (worst case the same).
Float16 conversions will likely break, but we'll fix them in a follow-up
commit.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
d3d10 requires per-pixel lod calculations for explicit lod, lod bias and
explicit derivatives, and we should probably do it for OpenGL too - at
least if they are used from vertex or geometry shaders (so this doesn't
apply to lod bias), this doesn't just affect neighboring pixels.
Some code was already there to handle this, so fix it up and enable it.
There will no doubt be a performance hit, unfortunately; we could do better
if we knew we had a real vector shift instruction (with variable shift
count), but this requires AVX2 on x86 (or an AMD Bulldozer family cpu).
Don't do anything for lod bias and explicit derivatives yet, though no
special magic should be needed for them either.
Likewise, the size query is still broken just the same.
v2: Use information if lod is a (broadcast) scalar or not. The idea would be
to base this on the actual value, for now just pretend it's a scalar in fs
and not a scalar otherwise (so, per-pixel lod is only used in gs/vs but same
code is generated for fs as before).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
b04a295a4a removed seemingly unnecessary
code in get_query. It turns out this code could in fact be reached: while
timestamps are always binned, if there are no bins (which happens if the fb
size is 0) then the rasterization query code filling this in is still
never executed.
So fix this up by filling in some timestamp, but do it at EndQuery time,
not GetQuery time, which should be more appropriate.
Makes piglit arb_timer_query-timestamp-get happy again.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This was just ignored (unless draw was handling it for some reason, like
for unfilled polys).
I'm not convinced of this code; putting the float for the clamp in the key
isn't really a good idea. Then again the other floats for depth bias are
already in there too anyway (we should probably have a jit_context for the
setup function), so this is just a quick fix.
Also, the "minimum resolvable depth difference" used isn't really right as it
should be calculated according to the z values of the current primitive
and not be a constant (of course, this only makes a difference for float
depth buffers), at least for d3d10, so depth biasing is still not quite right.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
If there are queries active, the opaque optimization resetting the bin
needs to be disabled.
(Not really tested, since the bug was discovered by code inspection, not
an actual test failure.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
OpenGL doesn't support this but d3d10 does.
It is a bit of a pain as it is necessary to keep track of queries still
active at the end of a scene, which is also why I cheat a bit and limit the
number of simultaneously active queries to an (arbitrary) 16 (this
simplifies things because we don't have to deal with a real list that way).
I can't think of a reason why you'd really want large numbers of
overlapping/nested queries, so it is hopefully fine.
(This only affects queries which need to be binned.)
v2: don't copy the remainder of the array when deleting an entry; simply
replace the deleted entry with the last one (order doesn't matter).
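The v2 deletion scheme in a nutshell (sketch):

/* replace the removed entry with the last one; order doesn't matter */
static void
active_query_remove(void **entries, unsigned *count, unsigned i)
{
   entries[i] = entries[--(*count)];
}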
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Previously lp_rast_begin_query commands were always inserted into each bin,
and re-issued if the scene was restarted, while lp_rast_end_query commands
were executed for each still active query at the end of tile rasterization.
Also, the ps_invocations and vis_counter were set to zero when the respective
command was encountered.
This however cannot work for multiple queries of the same type (note that
occlusion counter and occlusion predicate while different type were also
affected).
So, change the logic to always set the ps_invocations and vis_counter to zero
at the start of tile rasterization, and then use "start" and "end" per-thread
query values when encountering the begin/end query commands instead, which
should work for multiple queries of the same type. This also means queries do
not have to be reissued in a new scene, however they still need to be finished
at end of tile rasterization, so a list of queries still active at the end of
a scene needs to be maintained.
Also while here don't bin the queries which don't do anything in rasterization.
(This change does not actually handle multiple queries of the same type yet,
as the list of active queries is just a simple fixed array and setup can still
only have one query active per type.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Squashed commit of the following:
commit 0857a7e105bfcbc4d1431b2cc56612094c747ca3
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date: Tue Jun 18 12:25:07 2013 -0400
gallivm: Fix lp_build_rgba8_to_fi32_soa for big endian
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>
commit 0d65131649a8aa140e2db228ba779d685c4333e3
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date: Tue Jun 18 12:25:07 2013 -0400
gallivm: Fix big-endian machines
This adds a bit-shift count to the format table, and adds the concept of
vector or bitwise alignment on gathers.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>
commit 9740bda9b7dc894b629ed38be9b51059ce90818f
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date: Tue Jun 18 12:25:07 2013 -0400
llvmpipe: Fix convert_to_blend_type on big-endian
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>
commit ae037c2de0f029e4e99371c0de25560484f0d8df
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date: Tue Jun 18 12:25:06 2013 -0400
util: Convert color pack to packed formats
This fixes them on big-endian.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>
commit 5b05ac0c89ae092ea8ba5bba9f739708d7396b5c
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date: Tue Jun 18 12:25:06 2013 -0400
graw-xlib: Convert to packed formats
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>
commit 51396e7d098cb6ff794391cf11afe4dbf86dbea0
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date: Tue Jun 18 12:25:06 2013 -0400
format: Convert to packed formats
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>
commit 417b60bc66eb450e68a92ab0e47f76e292b385e6
Author: Adam Jackson <ajax@redhat.com>
Date: Tue Jun 18 12:25:06 2013 -0400
st/dri: Convert to packed formats
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>
commit 0934b2e022a5e0847d312c40734e2b44cac52fd8
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date: Tue Jun 18 12:25:06 2013 -0400
st/xlib: Convert to packed formats
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>
commit a307ea3c3716a706963acce7966b5e405ba11db9
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date: Tue Jun 18 12:25:06 2013 -0400
gbm: Convert to packed formats
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>
commit 53eebdd253e1960a645ea278f31d7ef6a6cf4aeb
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date: Tue Jun 18 12:25:06 2013 -0400
tests: Convert to packed formats
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>
commit 2f77fe3ee524945eacd546efcac34f7799fb3124
Author: Adam Jackson <ajax@redhat.com>
Date: Tue Jun 18 13:07:37 2013 -0400
gallium: Document packed formats
Signed-off-by: Adam Jackson <ajax@redhat.com>
commit 1f1017159ce951f922210a430de9229f91f62714
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date: Tue Jun 18 12:25:06 2013 -0400
gallium: Introduce 32-bit packed format names
These are for interacting with buffers natively described in terms of
bit shifts, like X11 visuals:
uint32_t xyzw8888 = (x << 0) | (y << 8) | (z << 16) | (w << 24);
Define these in terms of (endian-dependent) aliases to the array-style
format names.
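For example, with hypothetical names mirroring the xyzw8888 sample above:

#ifdef PIPE_ARCH_LITTLE_ENDIAN
#define PIPE_FORMAT_XYZW8888_UNORM PIPE_FORMAT_X8Y8Z8W8_UNORM
#else
#define PIPE_FORMAT_XYZW8888_UNORM PIPE_FORMAT_W8Z8Y8X8_UNORM
#endif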
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>
commit 6cc7ab1ee66ed668da78c1d951dfd7782b4e786a
Author: Adam Jackson <ajax@redhat.com>
Date: Mon Jun 3 12:10:32 2013 -0400
gallium: Document format name conventions
v2:
- Fix a channel name thinko (Michel Dänzer)
- Elaborate on SCALED versus INT
- Add links to DirectX and FOURCC docs
Signed-off-by: Adam Jackson <ajax@redhat.com>
commit df4d269e7fb62051a3c029b84147465001e5776e
Author: Adam Jackson <ajax@redhat.com>
Date: Tue Jun 18 12:25:06 2013 -0400
gallivm: Remove all notion of byte-swapping
Signed-off-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
The result isn't always 0 in this case (it depends on the query type), so
instead of special-casing this just use the ordinary path (which should
produce correct values thanks to the initialization in query_begin/end),
just skipping the fence wait.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Handle PIPE_QUERY_GPU_FINISHED and PIPE_QUERY_TIMESTAMP_DISJOINT, and also
fill out the ps_invocations and c_primitives from
PIPE_QUERY_PIPELINE_STATISTICS (the others in there should already be
handled). Note that ps_invocations isn't pixel exact, just 16-pixel exact,
but I guess it's better than nothing.
Doesn't really seem to work correctly, but there are probably bugs
elsewhere.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This reverts commit 41966fdb3b.
While it's a lot cleaner, it causes regressions because the draw interface
is always called from the draw functions of the drivers (because the
buffers need to be mapped), which means that the stream output buffers end
up being cleared on every draw rather than when they are set.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Honor render_condition for clear_render_target and clear_depth_stencil.
Also add minimal support for the occlusion predicate, though it can't be
active at the same time as an occlusion query yet.
While here, also switchify some large if-else (actually just mutually
exclusive if-if-if...) constructs.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
For conditional rendering this makes it possible to skip rendering if the
predicate is either true or false, as supported by d3d10 (previously it was
sort of implied: skip rendering if the predicate is false for the occlusion
predicate, and true for the so_overflow predicate).
There's no cap bit for this as presumably all drivers could do it trivially
(but this patch does not implement it for the drivers using true hw
predicates, nvxx, r600 and radeonsi; no change is expected for OpenGL
functionality).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Move clearing of the draw so target buffers to the draw module. They had
to be cleared in the drivers before, which was quite messy.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Use new util_fill_box helper for util_clear_render_target.
(Also fix off-by-one map error.)
v2: handle non-zero z correctly in new helper
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Believe it or not but these two are actually the first two functions which
really belong in this file nowadays.
Reviewed-by: Brian Paul <brianp@vmware.com>
Mostly just make sure the layer parameter gets passed through to the right
places (and gets clamped, which can be done at setup time), fix up clears
to clear all layers, and disable the opaque optimization. Luckily we don't
need to touch the jitted code.
(Clears invoked via pipe's clear_render_target method will not work however
since the pipe_util_clear function used for it doesn't handle clearing
multiple layers yet.)
v2: per Brian's suggestion, prettify var initialization and add some comments,
add assertion for impossible layer specification for surface.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This was always doing per-pixel alignment, which isn't necessary except
for the buffer case (due to the per-element offset). The disabled code for
calculating it was incorrect because it assumed that the full block would
always be fetched, which may not be the case, so fix this up.
The original code failed for instance for r10g10b10a2: the alignment would
have been calculated as 4 (block_width) * 4 (bytes), so 16, but the actual
fetch may have only fetched 2 values at a time, hence only alignment 8 -
it is unclear what exactly would happen in this case (alignment larger
than the size to fetch).
So just use the (already calculated) fetch size instead and derive the
alignment from that, which should always work, no matter if fetching 1, 2
or 4 pixels.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
For rendering to buffers, we cannot have any y alignment.
So make sure that tile clear commands only clear up to the fb width/height,
not more (do this for all resources actually, as clearing more seems
pointless for other resources too). For the jit fs function, skip execution
of the lower half of the fragment shader for the 4x4 stamp completely;
for depth/stencil only load/store the values from the first row
(replacing the other row with undef).
For the blend function, also only load half the values from the fs output
and replace the rest with undefs, so that everything still operates on the
full 4x4 block to keep the code the same between 4x1 and 4x4 (except for
load/store of course, which also need to skip (store) or replace (load)
these values with undefs), at the cost of slightly less optimal code being
produced in some cases.
Also reduce 1d and 1d array alignment too, because they can be handled the
same as buffers so don't need to waste memory.
v2: don't try to run special blend code for 4x1, (very) slightly less
complexity if we just use the same code as for 4x4 which may or may not
make it easier to optimize in the future (as we care a lot more about 4x4
performance than 1d).
v2: don't use undef values for unused fs src outputs with llvm 3.1 as it
apparently can trigger a bug in llvm.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Some parameters were used inconsistently, for instance not using
block_width/block_height/block_size for deriving the number of pixels but
rather relying on guesses from the number of fragment shaders etc., so fix
this up (no actual change in behavior since the block size stays fixed).
(Though most of the code would work with a different block_height, with
three exceptions: the hacked r11g11b10 conversions and the twiddle code,
which only work with block_height 2 not 1, and the blend vector type not
being 128 bits wide.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
There's no good reason why it can't handle the 2x4f->1x8ub, 1x4f->1x4ub and
1x8f->1x8ub cases: there might be legitimate reasons why we don't have
enough input vectors for a full destination vector, and using pack
intrinsics should still be much better than using generic conversion (it
looks like convert_alpha from the blend code might hit this, though I
suspect it could be avoided).
v2: add another test vector format to lp_test_conv so this gets tested.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
One of the assertions made no sense for buffer rendertargets (due to the
union), so drop it. (The same assertion is already present in the path for
texture surfaces later.)
v2: make assertion completely accurate (suggested by Jose).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The overallocation was very bad especially for things like 1d array
textures, which got blown up by a factor of 64. (Even ordinary smallish 2d
textures benefit a lot from this; a mipmapped 64x64 rgba8 texture
previously used 7*16kB = 112kB instead of now ~22kB.)
4x4 is chosen because this is the size the jit functions run on, so making
it smaller is going to be a bit more complicated.
It is actually not strictly 4x4 pixels, since we want to avoid situations
where different threads are rendering to the same cacheline, so we keep
cacheline size alignment in the x direction (often 64 bytes).
To make this work introduce new task width/height parameters and make
sure clears don't clear the whole tile if it's a partial tile. Likewise,
the rasterizer may produce fragments outside the 4x4 blocks present in a
tile, so don't call the jit function for them.
This does not yet fix rendering to buffers (which cannot have any y
alignment at all), and 1d/1d array textures are still overallocated by a
factor of 4.
v2: replace magic number 4 with LP_RASTER_BLOCK_SIZE, fix size of buffers
allocated (needed in case we render to them).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
These were mostly just a waste of memory and cache pressure, and were
really only used for debugging.
This change reduces instruction count (as measured by callgrind's Ir
event) of gnome-shell-perf-tool on Ivybridge by 3.5% ± 0.015% (n=20).
Signed-off-by: Adam Jackson <ajax@redhat.com>
We need to clamp to make sure an invalid shader doesn't crash our driver.
The spec says to return the 0-th index for everything that's out of
bounds.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
draw_find_shader_output, like most of the code in draw, used to depend on
position always being at output slot 0, which meant that any other
attribute being at 0 could signify an error. Unfortunately position can be
at any of the output slots, thus other attributes can occupy slot 0 and we
need to mark the ones which were not found by something else. This commit
changes draw_find_shader_output so that it returns -1 if it can't find the
given attribute, and adjusts the code that depended on it returning >0
whenever it correctly found an attrib.
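Callers now look like this sketch:

/* the slot is signed now; -1 means the shader doesn't write the output */
int slot = draw_find_shader_output(draw, TGSI_SEMANTIC_EDGEFLAG, 0);
if (slot < 0) {
   /* semantic not present - don't treat slot 0 as an error anymore */
}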
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Largely related to making sure the rasterizer can correctly
pick out the correct scissor box for the current viewport.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Gallium supported only a single viewport/scissor combination. This
commit changes the interface to allow us to add support for multiple
viewports/scissors.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Eliminate the rest of the no longer needed layout logic.
(It is possible some code could be simplified a bit further still.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This optimization disabled mask checks if the shader was simple enough.
While this should work correctly, the problem is that it can hide real
issues because shaders in practice are usually complex enough (8
instructions or 1 texture is already enough) that this doesn't get used,
whereas dumbed-down tests which should hit all the same code paths suddenly
do something quite different. This was the reason that bug 41787 could not
easily be tracked down to the stencil test not working correctly (piglit
would in fact have failed some tests without that optimization).
So disable it for now; it's unclear if it's much of a win in any case.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
We actually did early depth/stencil test and late depth/stencil write even
when the shader could kill the fragment (alpha test or discard). Since the
new stencil value depends on whether the fragment is killed by the
depth/stencil test or by the shader (in which case it never reaches the
depth/stencil test), this simply cannot work (we would also possibly skip
writing the new stencil value due to mask checks, but this is a secondary
issue).
So use late depth test / late depth write instead in this case.
(No piglit changes, as it doesn't seem to hit such a bogus early depth
test / late depth write path.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
We did mask checks between depth/stencil testing and depth/stencil write.
This meant that if the depth/stencil test killed off all fragments we never
actually wrote the new stencil value. This issue affected all early/late
test/write combinations.
So move the mask check after the depth/stencil write (for early depth
test; we could do the same for late depth test, but it might not be worth
it at that point, so just skip it there).
This addresses https://bugs.freedesktop.org/show_bug.cgi?id=41787.
Piglit does not hit this issue because of the simple_shader optimization
in generate_fs_loop() which means we're skipping the mask checks.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This was meant to disable some code which isn't needed when depth/stencil
isn't written. However, there's more code which wouldn't be needed in that
case, so having the condition there was just odd (llvm will drop all the
code anyway).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>