Commit Graph

1849 Commits

Author SHA1 Message Date
Ilia Mirkin 746e5260f6 gallium: add a cap for max vertex streams
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2014-07-01 11:34:35 -04:00
Ilia Mirkin 43e4b3e311 gallium: add an index argument to create_query
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2014-07-01 11:34:31 -04:00
Takashi Iwai 6b8b17153a llvmpipe: Fix zero-division in llvmpipe_texture_layout()
Fix the crash of "gnome-control-center info" invocation on QEMU where
zero height is passed at init.

(sroland: simplify logic by eliminating the div altogether, using 64bit mul.)

Fixes: https://bugzilla.novell.com/show_bug.cgi?id=879462

Cc: "10.2" <mesa-stable@lists.freedesktop.org>
2014-06-25 02:15:49 +02:00
Roland Scheidegger 2ea8e2fccf llvmpipe: increase number of queries which can be binned simultaneously to 64
Gallium (but not OpenGL) does allow nesting of queries, but there's no
limit specified (d3d10 has no limit neither). Nevertheless, for practical
purposes we need some limit in llvmpipe, otherwise we'd need more complex
handling of queries as we need to keep track of all binned queries (this
only affects queries which gather data past setup). A limit of 16 is too
small though, while 64 would suffice.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-06-13 20:08:39 +02:00
Christoph Bumiller 4b586a26c8 gallium: create TGSI_PROPERTY to disable viewport and clipping
Marek v2: add a cap

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2014-06-02 12:49:03 +02:00
Roland Scheidegger 3fc72f2ec6 llvmpipe: (trivial) drop "unswizzled" from some function names
This made sense when swizzled storage layout was used for rendering to tiles.
But nowadays the name just adds confusion (and makes for long lines).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-05-31 22:05:14 +02:00
Roland Scheidegger 576868140b llvmpipe: fix crash when not all attachments are populated in a fb
Framebuffers can have NULL attachments since a while. llvmpipe handled
that properly for lp_rast_shade_quads_mask but it seems the change didn't
make it to lp_rast_shade_tile.
This fixes piglit fbo-drawbuffers-none test (though I need to increase
the FB_SIZE from 32 to 256 so the tris cover some tiles fully).
https://bugs.freedesktop.org/show_bug.cgi?id=79421

Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-05-31 22:05:14 +02:00
Roland Scheidegger c90b5884bd llvmpipe: honor the render_condition_enable bit in blits.
This fixes piglit nv_conditional_render-blitframebuffer.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-05-31 22:05:14 +02:00
Roland Scheidegger 1e9cbbb1c4 llvmpipe: do IR counting for shader cache management after optimization.
2ea923cf57 had the side effect of IR counting
now being done after IR optimization instead of before. Some quick analysis
shows that there's roughly 1.5 times more IR instructions before optimization
than after, hence the effective shader cache size got quite a bit smaller.
Could counter this with an increase of the instruction limit but it probably
makes more sense to count them after optimizations, so move that code.

Reviewed-by: Brian Paul <brianp@vmware.com>
2014-05-19 17:07:41 +02:00
Roland Scheidegger 26cac02c51 gallivm: give more verbose names to modules
When we had just one module "gallivm" was an appropriate name. But now we have
modules containing all functions for a particular variant, so give it a
corresponding name (this is really just for helping debugging).

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-05-16 22:50:14 +02:00
Roland Scheidegger 65ad90bd1b llvmpipe: improve setup shader names (for debugging)
The setup shaders were composed of both a fs shader number and a variant
number. But since they aren't tied to a particular fragment shader, the
former was a fixed zero while the latter was also always zero because
it was never assigned. So, similar to what the fs code does, use a ever
increasing number to give it a more catchy name (unlike fragment shaders
though where this number is for each explicitly created shader, we just use
it for the implicitly created variants).
And while here, fix whitespace a bit.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-05-15 02:35:29 +02:00
Roland Scheidegger 1d28650b55 llvmpipe: kill off llvmpipe_variant_count
Unused except it was increased for both fs and setup shader variants created.
Probably some leftover from ages ago.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-05-15 02:35:26 +02:00
José Fonseca 0b239d9ed9 llvmpipe: Delete unneeded LLVM stuff earlier.
Same as Frank's change to draw module but for llvmpipe module.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2014-05-14 11:05:00 +01:00
José Fonseca 9cf67e51b0 gallivm,draw,llvmpipe: Remove support for versions of LLVM prior to 3.1.
Older versions haven't been tested probably don't work anyway.  But more
importantly, code supporting it is hindering further work.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2014-05-14 11:04:59 +01:00
Roland Scheidegger cf93f86957 llvmpipe: change LP_MAX_SHADER_INSTRUCTIONS limit definition.
When the limit was changed to be defined in terms of LP_MAX_SHADER_VARIANTS
(75f1fea14f) when it was increased, this
inadvertently lowered the limit in some branches (that have a lower
LP_MAX_SHADER_VARIANTS number) when merged. So, make sure the limit is always
at least the number it once was.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-05-08 16:26:49 +02:00
Ilia Mirkin d95df4f4e4 gallium: add a cap for supporting 4-offset TG4 opcodes
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2014-05-07 20:40:46 -04:00
Ilia Mirkin 88d8d88d8c gallium: add basic support for ARB_sample_shading
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2014-04-26 11:52:01 -04:00
Roland Scheidegger a7a03d84fc llvmpipe: fix clearing of individual color buffers in a fb
GL (3.0) allows you to clear individual color buffers in a fb. In fact
for fbs containing both int and float/normalized color buffers this is
required (because the clearing values are otherwise undefined if applied
to all buffers). The gallium interface was changed a while ago, but llvmpipe
ignored it (hence doing such individual clears always resulted in clearing
all buffers, plus some assorted asserts due to the mixed fbs).
So change the clear command to indicate the buffer to be cleared. Also, because
indicating the buffer to be cleared would have made lp_rast_arg_cmd larger
which is unacceptable (we're trying to shrink it some day) allocate the clear
value in the scene and just pass a pointer.
There's several advantages and disadvantages here:
+ clearing individual buffers works (we could also actually bin such clears now
if they'd come through clear_render_target() if the surface is in the current
fb, though we didn't do this before for the single rb case and still don't try).
+ since there's one clear per rb, we do the format conversion in setup rather
than per bin. Aside from the (drop in the ocean...) performance advantage this
means that clearing to very small values (that is, denormal when converted to
the format) should work for small float (fp16 etc.) formats, as the util code
couldn't handle it correctly before (because cpu denorms are disabled when
executing the bin commands, screwing up the magic conversion and flushing
the values to 0, though this was not verified).
- there's some overhead for traditional old-style clear-all MRT cases, since
there's one rast clear command per rb instead of one for all rbs.

This fixes https://bugs.freedesktop.org/show_bug.cgi?id=76976.

v2: get rid of the ugly manual memcpy stuff and just use union util_color.
This is 32 bytes instead of 16 but as the allocation is per scene we can live
with those additional 16 bytes (and the additional 128 bytes in the setup
context), which makes the code much more obvious. Suggested by Brian.

Reviewed-by: Brian Paul <brianp@vmware.com>
2014-04-25 19:29:30 +02:00
Roland Scheidegger 2f65f61bea llvmpipe: (trivial) use correct LP_MIN_VECTOR_ALIGN define for alignment.
Currently it's the same value.

Reviewed-by: Brian Paul <brianp@vmware.com>
2014-04-25 19:29:30 +02:00
José Fonseca 7380ce9bf6 llvmpipe: Advertise GLSL 3.30.
According to Roland all TGSI support is there in theory.

In practice there are a few piglit failures and crashes, as this hadn't
been tested before.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2014-04-24 20:26:23 +01:00
Ilia Mirkin c2f9ad5289 gallium: add a way to query min/max texture gather offsets
Defaults to providing the same offsets as MIN/MAX_TEXEL_OFFSET. For
nvc0, the offset can be -32/31.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2014-04-10 20:42:36 -04:00
Dave Airlie be5276ae7d gallium: add support for LODQ opcodes.
This opcode provide support for GL_ARB_texture_query_lod,

Signed-off-by: Dave Airlie <airlied@redhat.com>
[imirkin: rebase, docs update]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2014-04-07 01:06:18 -04:00
Brian Paul 177c9be615 llvmpipe: remove no-op checks in sampler, sampler_view functions
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2014-04-03 20:05:56 -06:00
Dave Airlie 76ba50a25a mesa/soft/llvmpipe: add fake MSAA support
This adds a gallium cap that allows us to fake GL3.0 by
not exposing MSAA on sw rendering.
It also forces the extra extensions needed for GL3.2.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2014-04-02 12:12:04 +10:00
Zack Rusin bbdefabfc9 llvmpipe: Fix llvmpipe_create_gs_state.
Revert unintended behaviour change from commit
b995a010e6.

Tested-by: José Fonseca <jfonseca@vmware.com>
2014-03-26 16:11:28 +00:00
José Fonseca b995a010e6 llvmpipe: Simplify vertex and geometry shaders.
Eliminate lp_vertex_shader, as it added nothing over draw_vertex_shader.

Simplify lp_geometry_shader, as most of the incoming state is unneeded.
(We could also just use draw_geometry_shader if we were willing to peek
inside the structure.)

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
2014-03-25 12:54:39 +00:00
Roland Scheidegger 9477d8c862 llvmpipe: add support for b5g6r5_srgb
The conversion code for srgb was tuned for n x 4x8bit AoS -> 4 x nxfloat SoA
(and vice versa), fix this to handle also 16bit 565-style srgb formats.
Still not really all that generic, things like r10g10b10a2_srgb or
r4g4b4a4_srgb wouldn't work (the latter trivial to fix, the former would not
require more work to not crash but near certainly need some higher precision
calculation) but not needed right now.
The code is not fully optimized for this (could use more direct calculation
instead of expanding to 8-bit range first) but should be good enough.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-03-21 17:23:38 +01:00
Richard Sandiford f4b3430a36 llvmpipe: Tighten check for alpha-only formats
The AoS version of ld_build_blend_factor was assuming that if the first
channel was alpha, there were no rgb components.

Fixes glean/blendFunc on System z.  No piglit regressions on x86_64.
The shortcut is still used in tests like spec/ARB_framebuffer_object/
fbo-alpha.

Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
2014-03-20 16:50:40 +01:00
Zack Rusin dfa25ea5cd gallium: allow setting of the internal stream output offset
D3D10 allows setting of the internal offset of a buffer, which is
in general only incremented via actual stream output writes. By
allowing setting of the internal offset draw_auto is capable
of rendering from buffers which have not been actually streamed
out to. Our interface didn't allow. This change functionally
shouldn't make any difference to OpenGL where instead of an
append_bitmask you just get a real array where -1 means append
(like in D3D) and 0 means do not append.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2014-03-07 12:49:33 -05:00
Marek Olšák db8886ed09 gallium: the other drivers don't support ARB_buffer_storage
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
2014-02-25 16:07:33 +01:00
Dave Airlie 2fcbec48d7 gallium: add texture gather support to gallium (v3)
This adds support to gallium for a TG4 instruction,
and two CAPs. The first CAP is required for GL_ARB_texture_gather.

The second CAP is required to expose GL_ARB_gpu_shader5.

However so far we haven't found any hardware that natively
exposes the textureGatherOffsets feature from GL, so just
lower it for now. If hardware appears for this we can add
another CAP to allow TG4 to take 4 offsets.

v2: add component selection src and a cap to say
hw can do it. (st can use to help control
GL_ARB_gpu_shader5/GLSL 4.00). Add docs.

v3: rename to SM5, add docs.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2014-02-25 13:29:17 +10:00
Grigori Goronzy d34d5fddf8 gallium: add geometry shader output limits
v2: adjust limits for radeonsi and llvmpipe
v3: add documentation

Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2014-02-09 23:31:38 +01:00
Marek Olšák 0354b769c2 gallium: remove PIPE_CAP_MAX_COMBINED_SAMPLERS
This can be derived from the shader caps.

All GPUs from ATI/AMD, NVIDIA, and INTEL have separate texture slots
for each shader stage.
2014-02-04 20:19:16 +01:00
Roland Scheidegger 1d53603f1f llvmpipe: fix denorm handling for r11g11b10_float format when blending
The code re-enabling denorms for small float formats did not recognize
this format due to format handling hacks (mainly, the lp_type doesn't have
the floating bit set).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-01-31 19:51:06 +01:00
Siavash Eliasi 809d3a7d25 llvmpipe: Set PIPE_CAP_MIN_MAP_BUFFER_ALIGNMENT to 64
v2: Fixed setting switch cases prior to
PIPE_CAP_MIN_MAP_BUFFER_ALIGNMENT incorrectly.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-01-29 09:09:41 -07:00
Siavash Eliasi 6317664de0 llvmpipe: Use alignment of 64 instead of 16 for buffer allocation
v2: Changed allocation alignment of llvmpipe_displaytarget_layout.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-01-29 09:09:41 -07:00
José Fonseca fd33a6bcd7 gallium: Use C11 thread abstractions.
Note that PIPE_ROUTINE now returns an int.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
2014-01-23 12:55:55 +00:00
Marek Olšák d382e90614 gallium: remove PIPE_CAP_SCALED_RESOLVE
If any driver doesn't support this, it can use a blit after resolving
the samples.

Reviewed-by: Brian Paul <brianp@vmware.com>
2014-01-23 01:47:14 +01:00
Dave Airlie 2212a97fe3 llvmpipe: dump geometry shaders when using LP_DEBUG=tgsi
for consistency with vs and fs dumpers.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2014-01-22 14:08:03 +10:00
José Fonseca 8771285054 s/Tungsten Graphics/VMware/
Tungsten Graphics Inc. was acquired by VMware Inc. in 2008.  Leaving the
old copyright name is creating unnecessary confusion, hence this change.

This was the sed script I used:

    $ cat tg2vmw.sed
    # Run as:
    #
    #   git reset --hard HEAD && find include scons src -type f -not -name 'sed*' -print0 | xargs -0 sed -i -f tg2vmw.sed
    #

    # Rename copyrights
    s/Tungsten Gra\(ph\|hp\)ics,\? [iI]nc\.\?\(, Cedar Park\)\?\(, Austin\)\?\(, \(Texas\|TX\)\)\?\.\?/VMware, Inc./g
    /Copyright/s/Tungsten Graphics\(,\? [iI]nc\.\)\?\(, Cedar Park\)\?\(, Austin\)\?\(, \(Texas\|TX\)\)\?\.\?/VMware, Inc./
    s/TUNGSTEN GRAPHICS/VMWARE/g

    # Rename emails
    s/alanh@tungstengraphics.com/alanh@vmware.com/
    s/jens@tungstengraphics.com/jowen@vmware.com/g
    s/jrfonseca-at-tungstengraphics-dot-com/jfonseca-at-vmware-dot-com/
    s/jrfonseca\?@tungstengraphics.com/jfonseca@vmware.com/g
    s/keithw\?@tungstengraphics.com/keithw@vmware.com/g
    s/michel@tungstengraphics.com/daenzer@vmware.com/g
    s/thomas-at-tungstengraphics-dot-com/thellstom-at-vmware-dot-com/
    s/zack@tungstengraphics.com/zackr@vmware.com/

    # Remove dead links
    s@Tungsten Graphics (http://www.tungstengraphics.com)@Tungsten Graphics@g

    # C string src/gallium/state_trackers/vega/api_misc.c
    s/"Tungsten Graphics, Inc"/"VMware, Inc"/

Reviewed-by: Brian Paul <brianp@vmware.com>
2014-01-17 20:00:32 +00:00
Brian Paul d6fa71fbb0 llvmpipe: handle NULL color buffer pointers
Fixes regression from 9baa45f78b

v2: incorporate a few small changes suggested by Roland.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2014-01-17 08:52:11 -08:00
Roland Scheidegger 3b64714da4 llvmpipe: fix large point rasterization with point_quad_rasterization
The whole round-pointsize-to-int stuff must only be done with GL legacy
rules (no point_quad_rasterization) or all the wrong edges are lit up.
This was previously in a private branch (d3d pointsprite test complains
loudly otherwise) and got lost in a merge. However, it should certainly
apply to GL point sprite rasterization as well.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-01-17 17:01:01 +01:00
Zack Rusin 93b953d139 llvmpipe: do constant buffer bounds checking in shaders
It's possible to bind a smaller buffer as a constant buffer, than
what the shader actually uses/requires. This could cause nasty
crashes. This patch adds the architecture to pass the maximum
allowable constant buffer index to the jit to let it make
sure that the constant buffer indices are always within bounds.
The behavior follows the d3d10 spec, which says the overflow
should always return all zeros, and overflow is only defined
as access beyond the size of the currently bound buffer. Accesses
beyond the declared shader constant register size are not
considered an overflow and expected to return garbage but consistent
garbage (we follow the behavior which some wlk tests expect which
is to return the actual values from the bound buffer).

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2014-01-16 16:33:57 -05:00
José Fonseca 9b96be595b llvmpipe: Honour pipe_rasterizer::point_quad_rasterization.
Commit eda21d2a30 fixed the rasterization
of points for Direct3D but ended up breaking the rasterization of OpenGL
non-sprite points, in particular conform's pntrast.c test.

The only way to get both working is to properly honour
pipe_rasterizer::point_quad_rasterization, and follow the weird OpenGL
rule when it is false.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2014-01-09 12:35:11 +00:00
José Fonseca eda21d2a30 llvmpipe: Fix the bottom_edge_rule adjustment for points.
The adjustment needs to be applied to the y coordinates and not the x
coordinates, just like the equivalent code for lines and triangles in
lp_setup_line.c and lp_setup_tri.c.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
2014-01-08 12:18:17 +00:00
José Fonseca 37de6b0682 llvmpipe: Respect bottom_edge_rule when computing the rasterization bounding boxes.
This was inadvertently forgotten when replacing gl_rasterization_rules
with lower_left_origin and half_pixel_center (commit
2737abb44e).

This makes a difference when lower_left_origin != half_pixel_center, e.g,
D3D10.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
2014-01-08 12:18:17 +00:00
José Fonseca 2d368b982a llvmpipe: Basic implementation of pipe_context::set_sample_mask.
We don't support MSAA (ie, number of samples is always one) therefore
sample_mask boils down to a synonym of the rasterizer_discard flag.

Also, this change makes setup actually use the value received in
lp_setup_set_rasterizer_discard instead of reaching out to llvmpipe
upper layers to re-fetch it.

Based on Si Chen's draft.

With this patch `wgf11multisample Coverage passes 100%` on the UMD
D3D10 state tracker.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Si Chen <sichen@vmware.com>
2014-01-07 16:04:42 +00:00
Si Chen 72c6d0e506 llvmpipe: Implement alpha_to_coverage for non-MSAA framebuffers.
Implement Alpha to Coverage by discarding a fragment alpha component is
less than 0.5.  This is a joint work of Jose and Si.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2014-01-07 16:04:42 +00:00
Jonathan Liu 7990ab58fa llvmpipe: use pipe_sampler_view_release() to avoid segfault
This fixes another case of faulting when freeing a pipe_sampler_view
that belongs to a previously destroyed context.

Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jonathan Liu <net147@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-12-22 07:07:56 -07:00
Roland Scheidegger 7c027666da llvmpipe: get rid of barycentric calculation of a0
Didn't really work as well as hoped (in particular it was not generally
more accurate), will solve this differently.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-12-14 17:11:03 +01:00
Roland Scheidegger bfcf1ba1c4 llvmpipe: (trivial) get rid of triangle subdivision code
This code was always problematic, and with 64bit rasterization we no longer
need it at all.

Reviewed-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-12-14 17:11:03 +01:00
Dave Airlie ba00f2f6f5 swrast* (gallium, classic): add MESA_copy_sub_buffer support (v3)
This patches add MESA_copy_sub_buffer support to the dri sw loader and
then to gallium state tracker, llvmpipe, softpipe and other bits.

It reuses the dri1 driver extension interface, and it updates the swrast
loader interface for a new putimage which can take a stride.

I've tested this with gnome-shell with a cogl hacked to reenable sub copies
for llvmpipe and the one piglit test.

I could probably split this patch up as well.

v2: pass a pipe_box, to reduce the entrypoints, as per Jose's review,
add to p_screen doc comments.

v3: finish off winsys interfaces, add swrast classic support as well.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

swrast: add support for copy_sub_buffer
2013-12-13 14:37:01 +10:00
Matthew McClure e84a1ab3c4 llvmpipe: add plumbing for ARB_depth_clamp
With this patch llvmpipe will adhere to the ARB_depth_clamp enabled state when
clamping the fragment's zw value. To support this, the variant key now includes
the depth_clamp state. key->depth_clamp is derived from pipe_rasterizer_state's
(depth_clip == 0), thus depth clamp is only enabled when depth clip is disabled.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-12-11 18:24:21 +00:00
Zack Rusin 7a50d38a2b llvmpipe: add a very useful (disabled) debugging output
Disabled by default, but it's very useful when needed.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-12-10 16:41:11 -05:00
Zack Rusin 155139059b llvmpipe: fix blending with half-float formats
The fact that we flush denorms to zero breaks our half-float
conversion and blending. This patches enables denorms for
blending. It's a little tricky due to the llvm bug that makes
it incorrectly reorder the mxcsr intrinsics:
http://llvm.org/bugs/show_bug.cgi?id=6393

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Zack Rusin <zackr@vmware.com>
2013-12-10 16:39:48 -05:00
Matthew McClure 0319ea9ff6 llvmpipe: clamp fragment shader depth write to the current viewport depth range.
With this patch, generate_fs_loop will clamp any fragment shader depth writes
to the viewport's min and max depth values. Viewport selection is determined
by the geometry shader output for the viewport array index. If no index is
specified, then the default viewport index is zero. Semantics for this path
can be found in draw_clamp_viewport_idx and lp_clamp_viewport_idx.

lp_jit_viewport was created to store viewport information visible to JIT code,
and is validated when the LP_NEW_VIEWPORT dirty flag is set.

lp_rast_shader_inputs is responsible for passing the viewport_index through
the rasterizer stage to fragment stage (via lp_jit_thread_data).

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-12-09 12:57:02 +00:00
Marek Olšák 1a02bb71dd gallium: add support for AMD_vertex_shader_layer 2013-12-03 19:39:13 +01:00
Roland Scheidegger 2983c039df gallium: new shader cap bit for the amount of sampler views
Ever since introducing separate sampler and sampler view max this was really
missing.
Every driver but llvmpipe reports the same number as number of samplers for
now, so nothing should break.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-11-28 04:02:18 +01:00
Zack Rusin 0510ec67e2 llvmpipe: support 8bit subpixel precision
8 bit precision is required by d3d10 but unfortunately
requires 64 bit rasterizer. This commit implements
64 bit rasterization with full support for 8bit subpixel
precision. It's a combination of all individual commits
from the llvmpipe-rast-64 branch.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-11-25 13:05:03 -05:00
Roland Scheidegger f69d2c857d llvmpipe: (trivial) disable new accurate origin calculation
It looks like there's some bugs in it...
2013-11-22 11:29:00 +00:00
Roland Scheidegger 28d7b4147d llvmpipe: calculate more accurate interpolation value at origin
Some rounding errors could crop up when calculating a0. Use a more accurate
method (barycentric interpolation essentially) to fix this, though to fix
the REAL problem (which is that our interpolation will give very bad results
with small triangles far away from the origin when they have steep gradients)
this does absolutely nothing (actually makes it worse). (To fix the real
problem, either would need to use a vertex corner (or some other point inside
the tri) as starting point value instead of fb origin and pass that down to
interpolation, or mimic what hw does, use barycentric interpolation (using
the coordinates extracted from the rasterizer edge functions) - maybe another
time.)
Some (silly) tests though really want a high accuracy at fb origin and don't
care much about anything else (Just. Don't. Ask.).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-11-21 20:39:19 +00:00
Emil Velikov 7dac1b470a gallium/drivers: compact compiler flags into Automake.inc
* minimise flags duplication
* distingush between VISIBILITY C and CXX flags
* set only required flags - C and/or CXX

v2: add LLVM_CFLAGS back to AM_CFLAGS (add missing backslash)

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2013-11-16 16:29:28 +00:00
Roland Scheidegger 473cb3fe4a llvmpipe: (trivial) fix more fallout from the setup cleanup.
Oops... Should have done some more testing.
2013-11-14 15:49:42 +00:00
Roland Scheidegger 5190c16a04 llvmpipe: (trivial) fix misplaced bld context assignment.
Should fix polygon offset crashes...
2013-11-14 14:44:15 +00:00
Roland Scheidegger 2dd693412a llvmpipe: clean up state setup code a bit
In particular get rid of home-grown vector helpers which didn't add much.
And while here fix formatting a bit. No functional change.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-11-14 12:24:55 +00:00
Roland Scheidegger 754319490f gallivm,llvmpipe: fix float->srgb conversion to handle NaNs
d3d10 requires us to convert NaNs to zero for any float->int conversion.
We don't really do that but mostly seems to work. In particular I suspect the
very common float->unorm8 path only really passes because it relies on sse2
pack intrinsics which just happen to work by luck for NaNs (float->int
conversion in hw gives integer indeterminate value, which just happens to be
-0x80000000 hence gets converted to zero in the end after pack intrinsics).
However, float->srgb didn't get so lucky, because we need to clamp before
blending and clamping resulted in NaN behavior being undefined (and actually
got converted to 1.0 by clamping with sse2). Fix this by using a zero/one clamp
with defined nan behavior as we can handle the NaN for free this way.
I suspect there's more bugs lurking in this area (e.g. converting floats to
snorm) as we don't really use defined NaN behavior everywhere but this seems
to be good enough.
While here respecify nan behavior modes a bit, in particular the return_second
mode didn't really do what we wanted. From the caller's perspective, we really
wanted to say we need the non-nan result, but we already know the second arg
isn't a NaN. So we use this now instead, which means that cpu architectures
which actually implement min/max by always returning non-nan (that is adhering
to ieee754-2008 rules) don't need to bend over backwards for nothing.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-11-14 12:24:55 +00:00
Roland Scheidegger 50f19e3a66 draw,llvmpipe: use exponent manipulation instead of exp2 for polygon offset
Since we explicitly require a integer input we should avoid using exp2 math
(even if we were using optimized versions), which turns the exp2 into a int
sub (plus some casts).

v2: fix bogus uint (needs to be int) math spotted by Matthew, fix comments

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-11-12 19:08:58 +00:00
Matthew McClure f9e2c24326 draw,llvmpipe,util: add depth bias calculation for arb_depth_buffer_float
With this patch, the llvmpipe and draw modules will calculate the depth bias
according to floating point depth buffer semantics described in the
arb_depth_buffer_float specification, when the driver has a z buffer bound
with a format type of UTIL_FORMAT_TYPE_FLOAT.

By default, the driver will use the existing UNORM calculation for depth bias.

A new function, draw_set_zs_format, was added to calculate the Minimum
Resolvable Depth value and floating point depth sense for the draw module.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-11-07 18:32:54 +00:00
Roland Scheidegger e4195acab5 llvmpipe: fix bogus layer clamping in setup
The layer coming from GS needs to be clamped (not sure if that's actually
the correct error behavior but we need something) as the number can be higher
than the amount of layers in the fb. However, this code was using the layer
calculation from the scene, and this was actually calculated in
lp_scene_begin_rasterization() hence too late (so setup was using the value
from the _previous_ scene or just zero if it was the first scene).
Since the value is used in both rasterization and setup, move calculation up
to lp_scene_begin_binning() though it's a bit more inconvenient to calculate
there. (Theoretically could move _all_ code which was in
lp_scene_begin_rasterization() to there, because ever since we got rid of
swizzled render/depth buffers our "map" functions preparing the fb data for
render don't actually change the data in there at all, but it feels like
it would be a hack.)

v2: improve comments

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-10-29 17:54:03 +01:00
Matthew McClure be0b67a143 util,llvmpipe: correctly set the minimum representable depth value
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-10-29 15:53:48 +00:00
Ilia Mirkin 12d39b4fa8 gallium: add PIPE_CAP_MIXED_FRAMEBUFFER_SIZES
This CAP will determine whether ARB_framebuffer_object can be enabled.
The nv30 driver does not allow mixing swizzled and linear zsbuf/cbuf
textures.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2013-10-26 01:36:07 +02:00
Brian Paul a3ed98f7aa gallium: new, unified pipe_context::set_sampler_views() function
The new function replaces four old functions: set_fragment/vertex/
geometry/compute_sampler_views().

Note: at this time, it's expected that the 'start' parameter will
always be zero.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Emil Velikov <emil.l.velikov@gmail.com>
2013-10-23 10:15:38 -06:00
Roland Scheidegger ac81b6f2be llvmpipe: enable seamless cube filtering
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-10-21 15:42:04 +02:00
José Fonseca 40ddd8b659 Revert "scons: Fix build when rtti is disabled"
This reverts commit 94d05bf87a as it has a
few problems:

- it breaks windows builds becuase env[LLVM_CXXFLAGS] is never set there

- it is merging not only rtti, but the whole cxxflags (defines etc)
  which has proven to be a source of troubles (breaks debugging etc.)
2013-10-16 15:05:51 -07:00
Alexander von Gluck IV 94d05bf87a scons: Fix build when rtti is disabled
* The rtti fix actually dug up a bug in the scons build scripts.
* Autotools took the LLVM cpp and cxx flags, while scons only took
  the cpp flags.
* This grabs the cxx flags and applies them where needed. We may
  want to make the same change for the llvm cpp flags in scons.
* The only linux platform I can find with LLVM no-rtti is Ubuntu.
* Fixes bug #70471

Tested-by: Vinson Lee <vlee@freedesktop.org>
2013-10-15 22:12:18 -05:00
José Fonseca 85d7f6779f llvmpipe: Advertise PIPE_CAP_DEPTH_CLIP_DISABLE.
Actually implemented by draw module.

Tested piglit ARB_depth_clamp tests, which pass 100%.

Trivial.
2013-10-15 18:22:57 -07:00
Roland Scheidegger 75f1fea14f llvmpipe: increase fs shader variant instruction cache limit by factor 4
The previous limit of of 128*1024 was reported to cause frequent recompiles
in some apps due to shader variant thrashing on IRC in some apps leading
to noticeable lags.
Note that the LP_MAX_SHADER_VARIANTS limit (1024) was more or less impossible
to reach, since even simple fragment shaders without texturing (glxgears) used
more than twice than 128 instructions, hence the instruction limit would have
always been reached first (excluding things like trivial shaders not writing
color). Even with the new limit it is VERY likely the instruction limit is hit
first.
Should help with such lags due to recompiles (though other shader types have
their own limits, LP_MAX_SETUP_VARIANTS and DRAW_MAX_SHADER_VARIANTS, in
particular the latter seems a bit small (128)).

Reviewed-by: Brian Paul <brianp@vmware.com>
2013-10-12 04:05:57 +02:00
José Fonseca 1aef0ef277 llvmpipe: We don't use the draw pipeline for offset_point/line.
Unless the polygon fill mode is different from PIPE_POLYGON_MODE_FILL,
so checking the the polygon mode is sufficient.

Testing done: no regression in polygon-mode-offset
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-10-09 21:09:07 -07:00
Zack Rusin edde6c77bd llvmpipe: abstract the code to set number of subpixel bits
As we're moving towards expanding the number of subpixel
bits and the width of the variables used in the computations
we need to make this code a bit more centralized.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-10-09 18:30:31 -04:00
Marek Olšák 085e5adede gallium/swrast: don't export any private symbols
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2013-10-08 16:23:52 +02:00
Brian Paul 2d0effaa10 llvmpipe: remove old bind_*_sampler_states() functions 2013-10-03 14:05:27 -06:00
Brian Paul c772338488 llvmpipe: implement pipe_context::bind_sampler_states() 2013-10-03 14:05:26 -06:00
Emil Velikov e369126709 llvmpipe: consolidate C sources list into Makefile.sources
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2013-10-01 07:29:49 -07:00
Vinson Lee 76df7edacf llvmpipe: Remove unnecessary null check of shader.
shader has already been dereferenced earlier so cannot be null here.

Fixes "Dereference before null check" defect reported by Coverity.

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-09-30 22:00:54 -07:00
Zack Rusin 60c448faea llvmpipe: count c_primitives before discarding null prims
We need to count the clipper primitives before the rasterizer
discards one it considers to be null.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-09-25 19:41:02 -04:00
Zack Rusin 1291e833e7 llvmpipe: we need to subdivide if fb is bigger in either direction
We need to subdivide triangles if either of the dimensions is
larger than the max edge length, not when both of them are larger.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-09-25 19:38:21 -04:00
Zack Rusin 71ecc2cf71 Revert "llvmpipe: increase number of subpixel bits to eight"
This reverts commit 755c11dc5e.
We agreed that this is band-aid that's not very useful and
the proper solution is to rewrite the rasterization algo
so that it operates on 64 bit values.

Signed-off-by: Zack Rusin <zackr@vmware.com>
2013-09-24 15:10:02 -04:00
Zack Rusin e5ec5aef2b llvmpipe: align the array used for subdivived vertices
When subdiving a triangle we're using a temporary array to store
the new coordinates for the subdivided triangles. Unfortunately
the array used for that was not aligned properly causing
random crashes in the llvm jit code which was trying to load
vectors from it.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-09-23 18:10:51 -04:00
Zack Rusin 755c11dc5e llvmpipe: increase number of subpixel bits to eight
Unfortunately d3d10 requires a lot higher precision (e.g.
wgf11clipping tests for it). The smallest number of precision
bits with which it passes is 8. That means that we need to
decrease the maximum length of an edge that we can handle without
subdivision by 4 bits. Abstracted the code a bit to make it easier
to change once to switch to 64bit rasterization.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-09-23 14:53:07 -04:00
Marek Olšák 419cd5f2a2 gallium: add flush_resource context function
r600g needs explicit flushing before DRI2 buffers are presented on the screen.

v2: add (stub) implementations for all drivers, fix frontbuffer flushing
v3: fix galahad

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2013-09-20 20:35:55 +02:00
José Fonseca 1569b3e536 llvmpipe: Fix rendering to PIPE_FORMAT_R10G10B10A2_UNORM.
We must take rounding in consideration when re-scaling to narrow
normalized channels, such as 2-bit normalized alpha.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-09-20 17:34:57 +01:00
Roland Scheidegger bd3909f265 draw: clean up setting stream out information a bit
In particular noone is interested in the vertex count, so drop that,
and also drop the duplicated num_primitives_generated /
so.primitives_storage_needed variables in drivers. I am unable for now to figure
out if primitives_storage_needed in SO stats (used for d3d10) should
increase if SO is disabled, though the equivalent num_primitives_generated
used for OpenGL definitely should increase. In any case we were only counting
when SO is active both in softpipe and llvmpipe anyway so don't pretend there's
an independent num_primitives_generated counter which would count always.
(This means the PIPE_QUERY_PRIMITIVES_GENERATED count will still be wrong just
as before, should eventually fix this by doing either separate counting for this
query or adjust the code so it always counts this even if SO is inactive depending
on what's correct for d3d10.)

Reviewed-by: Brian Paul <brianp@vmware.com>
2013-08-27 16:59:39 +02:00
Roland Scheidegger aff2ecf09a llvmpipe: support nested/overlapping queries for all query types
There's just no way resetting the counters is working with nested/overlapping
queries.

Reviewed-by: Brian Paul <brianp@vmware.com>
2013-08-27 16:59:01 +02:00
Roland Scheidegger ac1a2714c7 gallivm: implement better control of per-quad/per-element/scalar lod
There's a new debug value used to disable per-quad lod optimizations
in fragment shader (ignored for vs/gs as the results are just too wrong
typically). Also trying to detect if a supplied lod value is really a
scalar (if it's coming from immediate or constant file) in which case
sampler code can use this to stay on per-quad-lod path (in fact for
explicit lod could simplify even further and use same lod for both
quads in the avx case but this is not implemented yet).
Still need to actually implement per-element lod bias (and derivatives),
and need to handle per-element lod in size queries.

v2: fix comments, prettify.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-08-20 23:00:24 +02:00
Roland Scheidegger abdd32dcd5 llvmpipe: fix stencil bug if we have both stencil and depth tests
This is a very well hidden bug found by accident (only the fixed glean
tstencil2 test so far seems to hit it).
We must use new mask with combined s_pass values and orig_mask values
for zpass/zfail stencil ops, otherwise both the sfail op and one of
zpass/zfail op are applied (probably not hit in most tests because
some of the ops tend to be KEEP usually).

Note: this is a candidate for the 9.2 branch.

Reviewed-by: Zack Rusin <zackr@vmware.com>
2013-08-15 17:30:07 +02:00
Zack Rusin 27cedd8aec llvmpipe: fix pipeline statistics with a null ps
If the fragment shader is null then pixel shader invocations have
to be equal to zero. And if we're running a null ps then clipper
invocations and primitives should be equal to zero but only
if both stancil and depth testing are disabled.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-08-14 18:23:36 -04:00
Roland Scheidegger 894d4903e7 gallivm: set non-existing values really to zero in size queries for d3d10
My previous attempt at doing so double-failed miserably (minification of
zero still gives one, and even if it would not the value was never written
anyway).
While here also rename the confusingly named int_vec bld as we have int vecs
of different sizes, and rename need_nr_mips (as this also changes out-of-bounds
behavior) to is_sviewinfo too.

Reviewed-by: Zack Rusin <zackr@vmware.com>
2013-08-09 20:49:19 +02:00
Roland Scheidegger b0f74250e1 gallivm: use texture target from shader instead of static state for size query
d3d10 has no notion of distinct array resources neither at the resource nor
sampler view level. However, shader dcl of resources certainly has, and
d3d10 expects resinfo to return the values according to that - in particular
a resource might have been a 1d texture with some array layers, then the
sampler view might have only used 1 layer so it can be accessed both as 1d
or 1d array texture (I think - the former definitely works). resinfo of a
resource decleared as array needs to return number of array layers but
non-array resource needs to return 0 (and not 1). Hence fix this by passing
the target from the shader decl to emit_size_query and use that (in case of
OpenGL the target will come from the instruction itself).
Could probably do the same for actual sampling, though it may not matter there
(as the bogus components will essentially get clamped away), possibly could
wreak havoc though if it REALLY doesn't match (which is of course an error
but still).

Reviewed-by: Zack Rusin <zackr@vmware.com>
2013-08-09 20:49:18 +02:00
Roland Scheidegger eac57bc223 gallivm: propagate scalar_lod to emit_size_query too
Clearly the returned values need to be per-element if the lod is per element.
Does not actually change behavior yet.

Reviewed-by: Zack Rusin <zackr@vmware.com>
2013-08-08 18:55:57 +02:00
Zack Rusin 12522041d6 draw: fix slot detection
Nowadays -1 for slots means that the semantic is not present, so
we need to store it in a signed variables, otherwise <0 comparisons
are pointless. Fixes
http://bugzilla.eng.vmware.com/show_bug.cgi?id=67811 (at least
with softpipe, edgeflags don't work wit llvmpipe)

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-08-06 20:23:57 -04:00
Vinson Lee b57c1e4b86 llvmpipe: Do not need to free anything if there is no geometry shader.
If gs is null, then freeing state->shader.tokens would result in a null
dereference.

Fixes "Dereference after null check" defect reported by Coverity.

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-08-05 21:54:20 -07:00
Zack Rusin 95829e2029 llvmpipe: fix frontface behavior again
Lets make sure the frontface is 1 for front and -1 for back.
Discussed with Roland and Jose.

Signed-off-by: Zack Rusin <zackr@vmware.com>
2013-08-02 22:21:29 -04:00
Zack Rusin bff0d87668 llvmpipe: don't interpolate front face or prim id
The loop was iterating over all the fs inputs and setting them
to perspective interpolation, then after the loop we were
creating extra output slots with the correct interpolation. Instead
of injecting bogus extra outputs, just set the interpolation
on front face and prim id correctly when doing the initial scan
of fs inputs.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-08-02 20:12:53 -04:00
Zack Rusin d6b3a193d4 draw: inject frontface info into wireframe outputs
Draw module can decompose primitives into wireframe models, which
is a fancy word for 'lines', unfortunately that decomposition means
that we weren't able to preserve the original front-face info which
could be derived from the original primitives (lines don't have a
'face'). To fix it allow draw module to inject a fake face semantic
into outputs from which the backends can figure out the original
frontfacing info of the primitives.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-08-02 20:11:18 -04:00
Zack Rusin 2d15f4746b llvmpipe: make the front-face behavior match the gallium spec
The spec says that front-face is true if the value is >0 and false
if it's <0. To make sure that we follow the spec, lets just
subtract 0.5 from our value (llvmpipe did 1 for frontface and 0
otherwise), which will get us a positive num for frontface and
negative for backface.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-08-02 15:50:16 -04:00
Tom Stellard 4e90bc9a12 gallium: Add PIPE_CAP_ENDIANNESS
Cc: mesa-stable@lists.freedesktop.org
[ Francisco Jerez: Fix "PIPE_ENDIAN_SMALL" in the documentation,
  define PIPE_ENDIAN_NATIVE. ]
2013-07-22 22:43:17 +02:00
Zack Rusin 7bae56c5c2 llvmpipe: Ensure FTZ/DAZ flags are set on deferred draw flushes.
Tested-by: José Fonseca <jfonseca@vmware.com>
2013-07-22 18:11:39 +01:00
José Fonseca 2a650611be llvmpipe: Remove lp_rast_get_num_threads().
Never called.

Trivial.
2013-07-22 18:08:39 +01:00
Zack Rusin f59cb67376 llvmpipe/tests: update arith test to check for edge cases
Test infs, zeros and nans with our arith functions to assure
correct/defined behavior with those values.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-07-19 16:29:18 -04:00
Roland Scheidegger 4ef19f7fec llvmpipe: clamp inputs for srgb render buffers
Usually with fixed point renderbuffers clamping is done as part of conversion.
However, since we blend in float format, we essentially skip all conversion
steps pre-blend but since this is still a fixed point renderbuffer we must
still clamp the inputs in this case. Makes no difference for piglit though.
Obviously we could skip this if fragment color clamping is enabled, but a)
this is deprecated in OpenGL (d3d never had it) and b) we don't support it
natively so it gets baked into the shader.
Also add some comment about logic ops being broken for srgb, luckily no test
tries to do that as there's no easy fix...

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
2013-07-18 19:04:20 +02:00
Roland Scheidegger e57b98bad3 llvmpipe: fix blending with SRC_ALPHA_SATURATE with some formats without alpha
We were fixing up the blend factor to ZERO, however this only works correctly
with fixed point render buffers where the input values are clamped to 0/1
(because src_alpha_saturate is min(As, 1-Ad) so can be negative with unclamped
inputs). Haven't seen any failure anywhere due to that with fixed point SNORM
buffers (which clamp inputs to -1/1) but it should apply there as well (snorm
blending is rare, even opengl 4.3 doesn't require snorm rendertargets at all,
d3d10 requires them but they are not blendable).
Doesn't look like piglit hits this though (some internal testing hits the
float case at least). (With legacy OpenGL we could theoretically still use the
fixup to zero if the fragment color clamp is enabled, but we can't detect that
easily since we don't support native clamping hence it gets baked into the
shader.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
2013-07-18 19:03:35 +02:00
Roland Scheidegger dc1cc928ed llvmpipe: support sRGB framebuffers
Just use the new conversion functions to do the work. The way it's plugged
in into the blend code is quite hacktastic but follows all the same hacks
as used by packed float format already.
Only support 4x8bit srgb formats (rgba/rgbx plus swizzle), 24bit formats never
worked anyway in the blend code and are thus disabled, and I don't think anyone
is interested in L8/L8A8. Would need even more hacks otherwise.
Unless I'm missing something, this is the last feature except MSAA needed for
OpenGL 3.0, and for OpenGL 3.1 as well I believe.

v2: prettify a bit, use separate function for packing.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-07-16 01:54:51 +02:00
Zack Rusin 63386b2f66 util: treat denorm'ed floats like zero
The D3D10 spec is very explicit about treatment of denorm floats and
the behavior is exactly the same for them as it would be for -0 or
+0. This makes our shading code match that behavior, since OpenGL
doesn't care and on a few cpu's it's faster (worst case the same).
Float16 conversions will likely break but we'll fix them in a follow
up commit.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-07-09 23:30:55 -04:00
Roland Scheidegger f3bbf65929 gallivm: do per-pixel lod calculations for explicit lod
d3d10 requires per-pixel lod calculations for explicit lod, lod bias and
explicit derivatives, and we should probably do it for OpenGL too - at least
if they are used from vertex or geometry shaders (so doesn't apply to lod
bias) this doesn't just affect neighboring pixels.
Some code was already there to handle this so fix it up and enable it.
There will no doubt be a performance hit unfortunately, we could do better
if we'd knew we had a real vector shift instruction (with variable shift
count) but this requires AVX2 on x86 (or a AMD Bulldozer family cpu).
Don't do anything for lod bias and explicit derivatives yet, though
no special magic should be needed for them neither.
Likewise, the size query is still broken just the same.

v2: Use information if lod is a (broadcast) scalar or not. The idea would be
to base this on the actual value, for now just pretend it's a scalar in fs
and not a scalar otherwise (so, per-pixel lod is only used in gs/vs but same
code is generated for fs as before).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-07-04 19:42:04 +02:00
Marek Olšák 30c3e8718d mesa,glsl,gallium: remove GLSLSkipStrictMaxVaryingLimitCheck and dependencies
Not needed with do_dead_builtin_varyings.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2013-07-02 17:02:14 +02:00
Roland Scheidegger 7d430bfab9 llvmpipe: fix timer query if there's no bins
b04a295a4a removed seemingly unnecessary
code in get_query. Turns out this code could in fact be reached - while
timestamps are always binned, if there are no bins (which happens if fb
size is 0) then the rasterization query code filling this in is still
never executed.
So fix this up by filling in some timestamp, but do it at EndQuery time
not GetQuery time which should be more appropriate.
Makes piglit arb_timer_query-timestamp-get happy again.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-06-29 16:58:02 +02:00
Roland Scheidegger 670f829102 llvmpipe: handle offset_clamp
This was just ignored (unless for some reason like unfilled polys draw was
handling this).
I'm not convinced of that code, putting the float for the clamp in the key
isn't really a good idea. Then again the other floats for depth bias are
already in there too anyway (should probably have a jit_context for the
setup function), so this is just a quick fix.
Also, the "minimum resolvable depth difference" used isn't really right as it
should be calculated according to the z values of the current primitive
and not be a constant (of course, this only makes a difference for float
depth buffers), at least for d3d10, so depth biasing is still not quite right.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-06-27 19:06:40 +02:00
Roland Scheidegger b04a295a4a llvmpipe: remove never reached code for timestamp queries.
timestamp queries are always binned in an active scene, therefore
always have a result.
2013-06-27 19:06:40 +02:00
Roland Scheidegger 59b8689d37 llvmpipe: fix a bug in opaque optimization
If there are queries active the opaque optimization reseting the bin needs to
be disabled.
(Not really tested since the bug was discovered by code inspection not
an actual test failure.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-06-27 19:06:40 +02:00
Roland Scheidegger 2e4da1f594 llvmpipe: add support for nested / overlapping queries
OpenGL doesn't support this but d3d10 does.
It is a bit of a pain as it is necessary to keep track of queries
still active at the end of a scene, which is also why I cheat a bit
and limit the amount of simultaneously active queries to (arbitrary)
16 (simplifies things because don't have to deal with a real list
that way). I can't think of a reason why you'd really want large
numbers of overlapping/nested queries so it is hopefully fine.
(This only affects queries which need to be binned.)

v2: don't copy remainder of array when deleting an entry simply replace
the deleted entry with the last one (order doesn't matter).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-06-26 23:17:53 +02:00
Roland Scheidegger 0820342880 llvmpipe: rework query logic
Previously lp_rast_begin_query commands were always inserted into each bin,
and re-issued if the scene was restarted, while lp_rast_end_query commands
were executed for each still active query at the end of tile rasterization.
Also, the ps_invocations and vis_counter were set to zero when the respective
command was encountered.
This however cannot work for multiple queries of the same type (note that
occlusion counter and occlusion predicate while different type were also
affected).
So, change the logic to always set the ps_invocations and vis_counter to zero
at the start of tile rasterization, and then use "start" and "end" per-thread
query values when encountering the begin/end query commands instead, which
should work for multiple queries of the same type. This also means queries do
not have to be reissued in a new scene, however they still need to be finished
at end of tile rasterization, so a list of queries still active at the end of
a scene needs to be maintained.
Also while here don't bin the queries which don't do anything in rasterization.
(This change does not actually handle multiple queries of the same type yet,
as the list of active queries is just a simple fixed array and setup can still
only have one query active per type.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-06-26 23:17:53 +02:00
Adam Jackson 2151d893fb gallium: Fix llvmpipe on big-endian machines
Squashed commit of the following:

commit 0857a7e105bfcbc4d1431b2cc56612094c747ca3
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date:   Tue Jun 18 12:25:07 2013 -0400

    gallivm: Fix lp_build_rgba8_to_fi32_soa for big endian

    Reviewed-by: Adam Jackson <ajax@redhat.com>
    Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>

commit 0d65131649a8aa140e2db228ba779d685c4333e3
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date:   Tue Jun 18 12:25:07 2013 -0400

    gallivm: Fix big-endian machines

    This adds a bit-shift count to the format table, and adds the concept of
    vector or bitwise alignment on gathers.

    Reviewed-by: Adam Jackson <ajax@redhat.com>
    Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>

commit 9740bda9b7dc894b629ed38be9b51059ce90818f
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date:   Tue Jun 18 12:25:07 2013 -0400

    llvmpipe: Fix convert_to_blend_type on big-endian

    Reviewed-by: Adam Jackson <ajax@redhat.com>
    Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>

commit ae037c2de0f029e4e99371c0de25560484f0d8df
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date:   Tue Jun 18 12:25:06 2013 -0400

    util: Convert color pack to packed formats

    This fixes them on big-endian.

    Reviewed-by: Adam Jackson <ajax@redhat.com>
    Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>

commit 5b05ac0c89ae092ea8ba5bba9f739708d7396b5c
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date:   Tue Jun 18 12:25:06 2013 -0400

    graw-xlib: Convert to packed formats

    Reviewed-by: Adam Jackson <ajax@redhat.com>
    Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>

commit 51396e7d098cb6ff794391cf11afe4dbf86dbea0
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date:   Tue Jun 18 12:25:06 2013 -0400

    format: Convert to packed formats

    Reviewed-by: Adam Jackson <ajax@redhat.com>
    Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>

commit 417b60bc66eb450e68a92ab0e47f76e292b385e6
Author: Adam Jackson <ajax@redhat.com>
Date:   Tue Jun 18 12:25:06 2013 -0400

    st/dri: Convert to packed formats

    Reviewed-by: Adam Jackson <ajax@redhat.com>
    Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>

commit 0934b2e022a5e0847d312c40734e2b44cac52fd8
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date:   Tue Jun 18 12:25:06 2013 -0400

    st/xlib: Convert to packed formats

    Reviewed-by: Adam Jackson <ajax@redhat.com>
    Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>

commit a307ea3c3716a706963acce7966b5e405ba11db9
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date:   Tue Jun 18 12:25:06 2013 -0400

    gbm: Convert to packed formats

    Reviewed-by: Adam Jackson <ajax@redhat.com>
    Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>

commit 53eebdd253e1960a645ea278f31d7ef6a6cf4aeb
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date:   Tue Jun 18 12:25:06 2013 -0400

    tests: Convert to packed formats

    Reviewed-by: Adam Jackson <ajax@redhat.com>
    Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>

commit 2f77fe3ee524945eacd546efcac34f7799fb3124
Author: Adam Jackson <ajax@redhat.com>
Date:   Tue Jun 18 13:07:37 2013 -0400

    gallium: Document packed formats

    Signed-off-by: Adam Jackson <ajax@redhat.com>

commit 1f1017159ce951f922210a430de9229f91f62714
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date:   Tue Jun 18 12:25:06 2013 -0400

    gallium: Introduce 32-bit packed format names

    These are for interacting with buffers natively described in terms of
    bit shifts, like X11 visuals:

        uint32_t xyzw8888 = (x << 0) | (y << 8) | (z << 16) | (w << 24);

    Define these in terms of (endian-dependent) aliases to the array-style
    format names.

    Reviewed-by: Adam Jackson <ajax@redhat.com>
    Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>

commit 6cc7ab1ee66ed668da78c1d951dfd7782b4e786a
Author: Adam Jackson <ajax@redhat.com>
Date:   Mon Jun 3 12:10:32 2013 -0400

    gallium: Document format name conventions

    v2:
    - Fix a channel name thinko (Michel Dänzer)
    - Elaborate on SCALED versus INT
    - Add links to DirectX and FOURCC docs

    Signed-off-by: Adam Jackson <ajax@redhat.com>

commit df4d269e7fb62051a3c029b84147465001e5776e
Author: Adam Jackson <ajax@redhat.com>
Date:   Tue Jun 18 12:25:06 2013 -0400

    gallivm: Remove all notion of byte-swapping

    Signed-off-by: Adam Jackson <ajax@redhat.com>

Signed-off-by: Adam Jackson <ajax@redhat.com>
2013-06-24 09:48:56 -04:00
Roland Scheidegger d282f4ea9b llvmpipe: fix wrong results for queries not in a scene
The result isn't always 0 in this case (depends on query type),
so instead of special casing this just use the ordinary path (should result
in correct values thanks to initialization in query_begin/end), just
skipping the fence wait.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-06-22 17:09:37 +02:00
Roland Scheidegger 5c9aee111e llvmpipe: use 64bit counter for occlusion queries
Some APIs require 64bit and at least for 64bit archs the overhead
should be minimal.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-06-19 23:47:36 +02:00
Roland Scheidegger dc5dc4fd94 llvmpipe: handle more queries
Handle PIPE_QUERY_GPU_FINISHED and PIPE_QUERY_TIMESTAMP_DISJOINT, and
also fill out the ps_invocations and c_primitives from the
PIPE_QUERY_PIPELINE_STATISTICS (the others in there should already
be handled). Note that ps_invocations isn't pixel exact, just 16 pixel
exact but I guess it's better than nothing.
Doesn't really seem to work correctly but there's probably bugs elsewhere.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-06-19 23:47:36 +02:00
Zack Rusin 9542131b27 Revert "draw: clear the draw buffers in draw"
This reverts commit 41966fdb3b.
While it's a lot cleaner it causes regressions because
the draw interface is always called from the draw functions
of the drivers (because the buffers need to be mapped) which
means that the stream output buffers endup being cleared on
every draw rather than on setting.

Signed-off-by: Zack Rusin <zackr@vmware.com>
2013-06-17 21:43:10 -04:00
Roland Scheidegger 8975dc798d llvmpipe: fixes for conditional rendering
honor render_condition for clear_render_target and clear_depth_stencil.
Also add minimal support for occlusion predicate, though it can't be active
at the same time as an occlusion query yet.
While here also switchify some large if-else (actually just mutually
exclusive if-if-if...) constructs.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-06-18 18:01:24 +02:00
Roland Scheidegger 793e8e3d7e gallium: add condition parameter to render_condition
For conditional rendering this makes it possible to skip rendering
if either the predicate is true or false, as supported by d3d10
(in fact previously it was sort of implied skip rendering if predicate
is false for occlusion predicate, and true for so_overflow predicate).
There's no cap bit for this as presumably all drivers could do it trivially
(but this patch does not implement it for the drivers using true
hw predicates, nvxx, r600, radeonsi, no change is expected for OpenGL
functionality).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-06-18 18:01:24 +02:00
Zack Rusin 41966fdb3b draw: clear the draw buffers in draw
Moves clearing of the draw so target buffers to the draw
module. They had to be cleared in the drivers before
which was quite messy.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-06-17 11:06:39 -04:00
Roland Scheidegger 4cce4efaa3 util: new util_fill_box helper
Use new util_fill_box helper for util_clear_render_target.
(Also fix off-by-one map error.)

v2: handle non-zero z correctly in new helper

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-06-13 00:41:43 +02:00
Roland Scheidegger 201d7a352b llvmpipe: move create_surface/destroy_surface functions to lp_surface.c
Believe it or not but these two are actually the first two functions which
really belong in this file nowadays.

Reviewed-by: Brian Paul <brianp@vmware.com>
2013-06-07 21:15:01 +02:00
Roland Scheidegger d8146f240e llvmpipe: add support for layered rendering
Mostly just make sure the layer parameter gets passed through to the right
places (and get clamped, can do this at setup time), fix up clears to
clear all layers and disable opaque optimization. Luckily don't need to
touch the jitted code.
(Clears invoked via pipe's clear_render_target method will not work however
since the pipe_util_clear function used for it doesn't handle clearing
multiple layers yet.)

v2: per Brian's suggestion, prettify var initialization and add some comments,
add assertion for impossible layer specification for surface.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-06-07 21:15:01 +02:00
Roland Scheidegger d0518c4c69 llvmpipe: bump 3d and cube map limits to 2048 and 8192 respectively
These should just work, required by d3d10. Too large resources will
get thrown out separately anyway.

Reviewed-by: Brian Paul <brianp@vmware.com>
2013-06-06 23:51:38 +02:00
Roland Scheidegger 008fd03600 llvmpipe: improve alignment calculation for fetching/storing pixels
This was always doing per-pixel alignment which isn't necessary, except
for the buffer case (due to the per-element offset). The disabled code
for calculating it was incorrect because it assumed that always the full
block would be fetched, which may not be the case, so fix this up.
The original code failed for instance for r10g10b10a2 the alignment would
have been calculated as 4 (block_width) * 4 (bytes) so 16, but the actual
fetch may have only fetched 2 values at a time, hence only alignment 8 -
it is unclear what exactly would happen in this case (alignment larger
than size to fetch).
So just use the (already calculated) fetch size instead and get alignment
from that which should always work, no matter if fetching 1,2 or 4 pixels.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-06-05 00:29:47 +02:00
Roland Scheidegger ffe2a1ca3c llvmpipe: reduce alignment requirement for 1d resources from 4x4 to 4x1
For rendering to buffers, we cannot have any y alignment.
So make sure that tile clear commands only clear up to the fb width/height,
not more (do this for all resources actually as clearing more seems
pointless for other resources too). For the jit fs function, skip execution
of the lower half of the fragment shader for the 4x4 stamp completely,
for depth/stencil only load/store the values from the first row
(replace other row with undef).
For the blend function, also only load half the values from fs output,
replace the rest with undefs so that everything still operates on the
full 4x4 block to keep code the same between 4x1 and 4x4 (except for
load/store of course which also needs to skip (store) or replace these
values with undefs (load))., at the cost of slightly less optimal code
being produced in some cases.
Also reduce 1d and 1d array alignment too, because they can be handled the
same as buffers so don't need to waste memory.

v2: don't try to run special blend code for 4x1, (very) slightly less
complexity if we just use the same code as for 4x4 which may or may not
make it easier to optimize in the future (as we care a lot more about 4x4
performance than 1d).

v2: don't use undef values for unused fs src outputs with llvm 3.1 as it
apparently can trigger a bug in llvm.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-06-05 00:29:47 +02:00
Roland Scheidegger ef3e887084 llvmpipe: cleanup of generate_unswizzled_blend
Some parameters were used inconsistently, for instance not using
block_width/block_height/block_size for deferring number of pixels
but rather relying on guesses from the number of fragment shaders etc,
so fix this up (no actual change in behavior since the block size stays
fixed). (Though most of the code would work with different block_height,
with three exceptions, one being the hacked r11g11b10 conversions and
twiddle code which only work with block_height 2 not 1, and the last
one being blend vector type not being 128bit wide.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-06-05 00:29:47 +02:00
Roland Scheidegger 44993c1808 gallivm: enhance special sse2 4x4f and 2x8f -> 1x16ub conversion
There's no good reason why it can't handle 2x4f->1x8ub, 1x4f->1x4ub and
1x8f->1x8ub cases, there might be legitimate reasons why we don't have
enough input vectors for a full destination vector, and using pack
intrinsics should still be much better than using generic conversion
(it looks like convert_alpha from the blend code might hit this though
I suspect it could be avoided).

v2: add another test vector format to lp_test_conv so this gets tested.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-06-05 00:29:46 +02:00
Roland Scheidegger f51fc7a71c llvmpipe: fix bogus assertions for buffer surfaces
One of the assertion made no sense for buffer rendertargets
(due to the union), so drop it. (The same assertion is present already in
the path for texture surfaces later.).

v2: make assertion completely accurate (suggested by Jose).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-06-01 20:03:59 +02:00
Roland Scheidegger 869c5d438f llvmpipe: reduce alignment requirement for resources from 64x64 to 4x4
The overallocation was very bad especially for things like 1d array
textures which got blown up by a factor of 64. (Even ordinary smallish
2d textures benefit a lot from this, a mipmapped 64x64 rgba8 texture
previously used 7*16kB = 112kB instead of now ~22kB.)
4x4 is chosen because this is the size the jit functions run on, so
making it smaller is going to be a bit more complicated.
It is actually not strictly 4x4 pixel, since we'd want to avoid situations
where different threads are rendering to the same cacheline so we keep
cacheline size alignment in x direction (often 64bytes).
To make this work introduce new task width/height parameters and make
sure clears don't clear the whole tile if it's a partial tile. Likewise,
the rasterizer may produce fragments outside the 4x4 blocks present in a
tile, so don't call the jit function for them.
This does not yet fix rendering to buffers (which cannot have any y
alignment at all), and 1d/1d array textures are still overallocated by a
factor of 4.

v2: replace magic number 4 with LP_RASTER_BLOCK_SIZE, fix size of buffers
allocated (needed in case we render to them).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-05-31 20:21:05 +02:00
Adam Jackson e881c9a5dc llvmpipe: Remove x/y from cmd_bin
These were mostly just a waste of memory and cache pressure, and were
really only used for debugging.

This change reduces instruction count (as measured by callgrind's Ir
event) of gnome-shell-perf-tool on Ivybridge by 3.5% ± 0.015% (n=20).

Signed-off-by: Adam Jackson <ajax@redhat.com>
2013-05-31 20:21:05 +02:00
Zack Rusin c88ce3480c llvmpipe: clamp scissors to be between 0 and max
We need to clamp to make sure invalid shader doesn't crash our
driver. The spec says to return 0-th index for everything that's
out of bounds.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca<jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-05-25 09:49:20 -04:00
Zack Rusin 4b5595b38b draw: fixup draw_find_shader_output
draw_find_shader_output like most of the code in draw used to
depend on position always being at output slot 0. which meant
that any other attribute being at 0 could signify an error.
unfortunately position can be at any of the output slots, thus
other attributes can occupy slot 0 and we need to mark the ones
which were not found by something else. This commit changes
draw_find_shader_output so that it returns -1 if it can't
find the given attribute and adjust the code that depended
on it returning >0 whenever it correctly found an attrib.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca<jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-05-25 09:49:20 -04:00
Zack Rusin 97b8ae429e llvmpipe: implement support for multiple viewports
Largely related to making sure the rasterizer can correctly
pick out the correct scissor box for the current viewport.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca<jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-05-25 09:49:20 -04:00
Zack Rusin eaabb4ead0 gallium: Add support for multiple viewports
Gallium supported only a single viewport/scissor combination. This
commit changes the interface to allow us to add support for multiple
viewports/scissors.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: José Fonseca<jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-05-25 09:49:20 -04:00
Roland Scheidegger 33fcce3682 llvmpipe: get rid of tiled/linear layout remains
Eliminate the rest of the no longer needed layout logic.
(It is possible some code could be simplified a bit further still.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-05-29 00:41:06 +02:00
Roland Scheidegger 80e2cc0f97 llvmpipe: disable simple_shader optimization
This optimization disabled mask checks if the shader is simple enough.
While this should work correctly, the problem is that it can hide real issues
because shaders in practice are usually complex enough (8 instructions or 1
texture is already enough) so this doesn't get used, whereas dumbed-down
tests which should hit all the same code paths suddenly do something quite
different. This was the reason that bug 41787 could not be easily tracked as
stencil test not working correctly (piglit would in fact have failed some
tests without that optimization).
So disable it for now, it's unclear if it's much of a win in any case.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-05-22 22:57:27 +02:00
Roland Scheidegger e108716429 llvmpipe: fix early depth test / late depth write stencil issues
We actually did early depth/stencil test and late depth/stencil write even
when the shader could kill the fragment (alpha test or discard). Since it
matters for the new stencil value if the fragment is killed by depth/stencil
test or by the shader (in which case it will not reach the depth/stencil
test) this simply cannot work (we also would possibly skip writing the new
stencil value due to mask checks but this is a secondary issue).
So use late depth test / late depth write instead in this case.
(No piglit changes as it doesn't seem to hit such bogus early depth test
/ late depth write path.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-05-22 22:57:27 +02:00
Roland Scheidegger 82d7733b52 llvmpipe: fix issue with not writing new stencil values
We did mask checks between depth/stencil testing and depth/stencil write.
This meant that if the depth/stencil test killed off all fragments we never
actually wrote the new stencil value. This issue affected all early/late
test/write combinations.
So move the mask check after depth/stencil write (for early depth test,
could do the same for late depth test but might not be worth it at that
point so just skip it there).
This addresses https://bugs.freedesktop.org/show_bug.cgi?id=41787.
Piglit does not hit this issue because of the simple_shader optimization
in generate_fs_loop() which means we're skipping the mask checks.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-05-22 22:57:27 +02:00
Roland Scheidegger 3c91ef0f29 llvmpipe: (trivial) remove confusing code in stencil test
This was meant to disable some code which isn't needed when depth/stencil
isn't written. However, there's more code which wouldn't be needed in that
case so having the condition there was just odd (llvm will drop all the code
anyway).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-05-22 22:57:27 +02:00
Roland Scheidegger 5314f5d829 llvmpipe: fix bug in early depth test / late depth write handling
Using wrong type if the format was less than 32bits.
No piglit changes as it doesn't hit that path.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-05-22 22:57:27 +02:00