55907 Commits

Author SHA1 Message Date
José Fonseca 1fefc65d20 st/wgl: Install our windows message hook to threads created before the ICD is loaded.
Otherwise we will not receive window destroy events, causing framebuffers
to leak.

This happens particularly with java and jogl.

Tested with java + jogl, MATLAB.

VMware Internal Bug Number: 1013086.

Reviewed-by: Brian Paul <brianp@vmware.com>
2013-04-05 18:27:54 +01:00
Adam Jackson ca70de9bd2 llvmpipe: Work without sse2 if llvm is new enough
At least on llvm 3.2 this appears to work fine.  Tested on an Athlon XP
2600+, which has sse and 3dnow but not sse2.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
2013-04-05 11:32:53 -04:00
Jerome Glisse b8998f976e winsys/radeon: add command stream replay dump for faulty lockup v3
Build time option, set RADEON_CS_DUMP_ON_LOCKUP to 1 in radeon_drm_cs.h to
enable it.

When enabled, after each cs submission the code tries to detect a lockup by
waiting for one of the buffers of the cs to become idle. After a timeout, it
considers that the cs triggered a lockup and writes a radeon_lockup.c
file in the current directory that has all the information needed to replay
the cs.

To build this file:
gcc -O0 -g radeon_lockup.c -ldrm -o radeon_lockup -I/usr/include/libdrm

v2: Add radeon_ctx.h file to mesa git tree
v3: Slightly improve dumped file for easier editing, only dump first faulty cs

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
2013-04-05 10:22:05 -04:00
Brian Paul 5192262833 st/xlib: add HUD support for xlib/GLX
For the softpipe and llvmpipe drivers.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-04 17:00:42 -06:00
Brian Paul f5071783c1 gallium/hud: add GALLIUM_HUD_PERIOD env var
To set the graph update rate, in seconds.  The default update rate
has also been changed to 1/2 second.

Reviewed-by: Marek Olšák <maraeo@gmail.com>
2013-04-04 17:00:42 -06:00
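The GALLIUM_HUD_PERIOD commit above describes a period in seconds with a default of 1/2 second. A minimal sketch of how such a variable might be read, assuming the value is a plain decimal number; the helper names hud_period_seconds and hud_period_from_env are hypothetical and are not taken from the HUD code:

```c
#include <stdlib.h>

/* Hypothetical helper: turn the variable's text into a period in seconds.
 * Falls back to 0.5 s when the text is missing, unparsable, or not positive. */
double
hud_period_seconds(const char *value)
{
   const double default_period = 0.5;
   char *end;
   double period;

   if (value == NULL || *value == '\0')
      return default_period;

   period = strtod(value, &end);
   if (end == value || period <= 0.0)
      return default_period;

   return period;
}

/* Hypothetical wrapper: read GALLIUM_HUD_PERIOD from the environment. */
double
hud_period_from_env(void)
{
   return hud_period_seconds(getenv("GALLIUM_HUD_PERIOD"));
}
```

Keeping the parsing separate from the getenv() call makes the fallback behaviour easy to check without touching the process environment.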
Brian Paul 6211c45186 gallium/hud: initialize sampler state
The default wrap mode (PIPE_TEX_WRAP_REPEAT) is incompatible with
unnormalized texcoords (at least for softpipe).

v2: use PIPE_TEX_WRAP_CLAMP_TO_EDGE

Reviewed-by: Marek Olšák <maraeo@gmail.com>
2013-04-04 17:00:42 -06:00
Kenneth Graunke edc52a8f28 glsl: Add an optimization pass to flatten simple nested if blocks.
GLBenchmark 2.7's shaders contain conditional blocks like:

if (x) {
    if (y) {
        ...
    }
}

where the outer conditional's then clause contains exactly one statement
(the nested if) and there are no else clauses.  This can easily be
optimized into:

if (x && y) {
    ...
}

This saves a few instructions in GLBenchmark 2.7:

    total instructions in shared programs: 11833 -> 11649 (-1.55%)
    instructions in affected programs:     8234 -> 8050 (-2.23%)

It also helps CS:GO slightly (-0.05%/-0.22%).  More importantly,
however, it simplifies the control flow graph, which could enable other
optimizations.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2013-04-04 15:38:19 -07:00
Kenneth Graunke 967514ce68 i965: Use a variable for the push constant size in kB.
This clarifies that the offset of 2 is actually 16 kB / 8kB units.
It also keys both computations off of a single variable, which should
make it easier to change in the future.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
2013-04-04 15:38:19 -07:00
Kenneth Graunke 8cdb2d32ec i965: Turn brw->urb.vs_size and gs_size into local variables.
These variables are only used within a single function, so we may as
well make them local variables.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
2013-04-04 15:38:19 -07:00
Kenneth Graunke b99ad7f02c i965: Remove BRW_NEW_WM_INPUT_DIMENSIONS dirty bit.
This was only produced by the brw_wm_input_dimensions atom, which was
removed in the previous commit.  So there's no need for the dirty bit.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2013-04-04 15:38:19 -07:00
Kenneth Graunke d198546bac i965: Delete brw_vs_constval.c and the brw_wm_input_sizes atom.
This was only used to compute proj_attrib_mask, which was removed by the
previous commit.  That makes this dead code.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2013-04-04 15:38:19 -07:00
Kenneth Graunke 705c8247fa i965: Remove now dead brw_wm_prog_key::proj_attrib_mask field.
The previous commit removed the last user of this field, so there's no
longer any point in setting it.  Removing this should eliminate
state-dependent recompiles, and make the precompile more reliable.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2013-04-04 15:38:19 -07:00
Kenneth Graunke 7183568869 i965: Remove fixed-function texture projection avoidance optimization.
This optimization attempts to avoid extra attribute interpolation
instructions for texture coordinates where the W-component is 1.0.

Unfortunately, it requires a lot of complexity: the brw_wm_input_sizes
state atom (all the brw_vs_constval.c code) needs to run on each draw.
It computes the input_size_masks array, then uses that to compute
proj_attrib_mask.  Differences in proj_attrib_mask can cause
state-dependent fragment shader recompiles.  We also often fail to guess
proj_attrib_mask for the fragment shader precompile, causing us to
needlessly compile it twice.

Furthermore, this optimization only applies to fixed-function programs;
it does not help modern GLSL-based programs at all.  Generally, older
fixed-function programs run fine on modern hardware anyway.

The optimization has existed in some form since the initial commit.  When
we rewrote the fragment shader backend, we dropped it for a while.  Eric
readded it in commit eb30820f26 as part of
an attempt to cure a ~1% performance regression caused by converting the
fixed-function fragment shader generation code from Mesa IR to GLSL IR.
However, no performance data was included in the commit message, so it's
unclear whether or not it was successful.

Time has passed, so I decided to re-measure this.  Surprisingly,
Eric's OpenArena timedemo actually runs /faster/ after removing this and
the brw_wm_input_sizes atom.  On Ivybridge at 1024x768, I measured a
1.39532% +/- 0.91833% increase in FPS (n = 55).  On Ironlake, there was
no statistically significant difference (n = 37).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2013-04-04 15:38:19 -07:00
Kenneth Graunke 32726b1af6 i965: Use ctx->Stencil._WriteEnabled in DEPTH_STENCIL_STATE.
This is the same computation as the _WriteEnabled flag, so we may as
well use it.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
2013-04-04 15:38:19 -07:00
Kenneth Graunke 01bd29d681 i965: Fix stencil write enable flag in 3DSTATE_DEPTH_BUFFER on Gen7+.
ctx->Stencil.WriteMask is a statically sized array of 3 elements.
Checking it against 0 actually is a NULL check, and can never fail,
which meant that we always said stencil writes were enabled.

Use the new core Mesa derived state flag to fix this.

NOTE: This is a candidate for stable branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
2013-04-04 15:38:18 -07:00
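The bug described in the 3DSTATE_DEPTH_BUFFER fix above comes from array-to-pointer decay in C. A small sketch of the pattern, assuming a cut-down struct stencil_state that stands in for the real context; the struct, the function names, and the "any mask non-zero" rule are illustrative simplifications, not the exact derived-state computation in core Mesa:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the stencil state; only the "array of 3
 * write masks" shape comes from the commit message. */
struct stencil_state {
   bool enabled;
   unsigned write_mask[3];
};

/* Buggy pattern: the array decays to a pointer to its first element, so
 * the comparison tests an address against NULL and is always true. */
bool
stencil_writes_buggy(const struct stencil_state *s)
{
   const unsigned *mask = s->write_mask;   /* array decays to a pointer */
   return mask != 0;
}

/* Simplified fix: look at the mask values themselves.  The real derived
 * flag may weigh the individual faces differently. */
bool
stencil_writes_fixed(const struct stencil_state *s)
{
   return s->enabled &&
          (s->write_mask[0] != 0 ||
           s->write_mask[1] != 0 ||
           s->write_mask[2] != 0);
}
```

With every mask set to zero, the buggy test still reports that stencil writes are enabled, while the fixed one does not.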
Kenneth Graunke 1e3235d36e mesa: Add new ctx->Stencil._WriteEnabled derived state flag.
i965 needs to know whether stencil writes are enabled in several places,
and gets the test wrong sometimes.  While we could create a function to
compute this, it seems generally useful enough to warrant a new piece of
derived state.  Also, all the plumbing is already in place.

NOTE: This is a candidate for stable branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
2013-04-04 15:38:18 -07:00
Roland Scheidegger 9eef86bb55 gallivm: some minor cube map cleanup
The ar_ge_as_at variable was just very very confusing since the condition
was actually the other way around (as_at_ge_ar). So change the condition
(and the selects depending on it) to match the variable name.
And also change the chosen major axis in case the coord values are the
same. OpenGL doesn't care one bit which one is chosen in this case but
it looks like dx10 would require z chosen over y, and y chosen over x
(previously did x chosen over y, y chosen over z). Since it's all the
same effort just honor dx10's wishes. (Though actually, for some preferred
orderings, we could save one (or two with derivatives) selects since the
tnewx and tnewz (and the corresponding dmax values) are the same.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-04 23:22:10 +02:00
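The tie-breaking order described in the cube map cleanup above (z chosen over y, y chosen over x when magnitudes are equal) can be sketched in scalar C. This is only an illustration of the ordering; the enum and function names are hypothetical, and the real code builds vectorized LLVM IR selects rather than branches:

```c
/* Hypothetical axis labels for the sketch. */
enum cube_axis { AXIS_X, AXIS_Y, AXIS_Z };

static float
abs_float(float v)
{
   return v < 0.0f ? -v : v;
}

/* Pick the major axis of a cube map coordinate.  Using >= means that on
 * ties z wins over y and x, and y wins over x. */
enum cube_axis
cube_major_axis(float x, float y, float z)
{
   float ax = abs_float(x);
   float ay = abs_float(y);
   float az = abs_float(z);

   if (az >= ax && az >= ay)
      return AXIS_Z;
   if (ay >= ax)
      return AXIS_Y;
   return AXIS_X;
}
```

For example, the coordinate (1, 1, 0) selects the y axis and (1, 1, 1) selects the z axis, while (2, 1, 1) still selects x.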
Eric Anholt b6e9b54d06 i965: Ask the register allocator to round-robin through registers.
The way we were allocating registers before, packing into low register
numbers for Ironlake, resulted in an overly-constrained dependency graph
for instruction scheduling.  Improves GLBenchmark 2.1 performance by
4.5% +/- 0.7% (n=26).  No difference on my old GLSL demo (n=20).  No
difference on nexuiz (n=15).

v2: Fix off-by-one bug that made the change only work for 16-wide on i965.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-04-04 12:51:06 -07:00
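The difference between packing into low register numbers and round-robin allocation, as described in the commit above, can be illustrated with a toy linear pool. This is a sketch under simplifying assumptions: the pool, its size, and all function names are hypothetical, and it is not the driver's actual register allocator:

```c
#include <stdbool.h>

#define NUM_REGS 8

/* Hypothetical register pool for the sketch. */
struct reg_pool {
   bool in_use[NUM_REGS];
   int next;               /* where the round-robin search starts */
};

/* First-fit: always hands out the lowest-numbered free register, so a
 * register that was just freed is reused immediately. */
int
alloc_first_fit(struct reg_pool *pool)
{
   for (int i = 0; i < NUM_REGS; i++) {
      if (!pool->in_use[i]) {
         pool->in_use[i] = true;
         return i;
      }
   }
   return -1;
}

/* Round-robin: start searching after the last register handed out, so
 * recently freed registers are not reused right away. */
int
alloc_round_robin(struct reg_pool *pool)
{
   for (int n = 0; n < NUM_REGS; n++) {
      int i = (pool->next + n) % NUM_REGS;
      if (!pool->in_use[i]) {
         pool->in_use[i] = true;
         pool->next = (i + 1) % NUM_REGS;
         return i;
      }
   }
   return -1;
}

void
free_reg(struct reg_pool *pool, int reg)
{
   pool->in_use[reg] = false;
}
```

After an allocate, free, allocate sequence, first-fit returns register 0 twice, which makes two otherwise unrelated values share a register and so adds ordering constraints for the scheduler. Round-robin returns register 0 and then register 1.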
Zack Rusin be9a42e980 llvmpipe: implement ucmp
and add a test for it

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-04 12:09:55 -07:00
Paul Berry 5db2249493 Avoid spurious GCC warnings in STATIC_ASSERT() macro.
GCC 4.8 now warns about typedefs that are local to a scope and not
used anywhere within that scope.  This produced spurious warnings with
the STATIC_ASSERT() macro (which used a typedef to provoke a compile
error in the event of an assertion failure).

This patch switches to a simpler technique that avoids the warning.

v2: Avoid GCC-specific syntax.  Also update p_compiler.h.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-04-04 09:52:18 -07:00
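The two styles of compile-time assertion mentioned in the STATIC_ASSERT() commit above can be sketched as follows. The macro names here are hypothetical, and the sizeof-based form is shown as one portable typedef-free option; it is an assumption that the patch uses something of this shape:

```c
/* Typedef-based form: a negative array size makes the typedef ill-formed
 * when COND is false.  Used inside a function, the typedef itself is never
 * referenced, which is what newer GCC versions warn about. */
#define STATIC_ASSERT_TYPEDEF(COND) \
   typedef char static_assert_failed_[(COND) ? 1 : -1]

/* Typedef-free form: the same negative-size trick inside sizeof, so no
 * name is declared and nothing is left unused. */
#define STATIC_ASSERT_SIZEOF(COND) \
   do { (void) sizeof(char [1 - 2 * !(COND)]); } while (0)

/* Example use: compiles only if int is at least 2 bytes wide. */
int
check_int_size(void)
{
   STATIC_ASSERT_SIZEOF(sizeof(int) >= 2);
   return 1;
}
```

Changing the condition to something false, such as sizeof(int) >= 1000, turns the array size negative and the build fails at that line.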
Erik Faye-Lund 456f40e18d freedreno: document debug flag
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
2013-04-04 10:41:50 -06:00
Brian Paul e95514c0ea st/wgl: add HUD support
v2: fix a few minor issues spotted by Jose.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-04 10:41:35 -06:00
Brian Paul 0c1dcf906d st/wgl: make stw_current_context() non-static
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-04 08:50:16 -06:00
Brian Paul 92e5e45ff1 util: add debug_memory_check_block(), debug_memory_tag()
The former just checks that the given block is valid by checking
the header and footer.

The latter sets the memory block's tag.  With extra debug code, we
can use that for monitoring/checking particular allocations.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-04 08:50:15 -06:00
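The header-and-footer check described in the debug memory commit above can be sketched with a small guarded allocator. Everything here is an illustrative assumption: the struct layouts, the magic value, and the guarded_* function names are hypothetical and are not the actual u_debug_memory implementation:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_MAGIC 0x1337c0deu

/* Hypothetical bookkeeping placed before the user's block. */
struct guard_header {
   unsigned magic;
   unsigned tag;
   size_t size;
};

/* Hypothetical marker placed right after the user's block. */
struct guard_footer {
   unsigned magic;
};

void *
guarded_malloc(size_t size)
{
   struct guard_footer ftr = { GUARD_MAGIC };
   struct guard_header *hdr = malloc(sizeof(*hdr) + size + sizeof(ftr));

   if (!hdr)
      return NULL;
   hdr->magic = GUARD_MAGIC;
   hdr->tag = 0;
   hdr->size = size;
   /* The footer may be unaligned, so copy it byte-wise. */
   memcpy((char *)(hdr + 1) + size, &ftr, sizeof(ftr));
   return hdr + 1;
}

/* Valid only if both the header and the footer still hold the magic. */
bool
guarded_check_block(const void *ptr)
{
   const struct guard_header *hdr = (const struct guard_header *)ptr - 1;
   struct guard_footer ftr;

   if (hdr->magic != GUARD_MAGIC)
      return false;
   memcpy(&ftr, (const char *)ptr + hdr->size, sizeof(ftr));
   return ftr.magic == GUARD_MAGIC;
}

/* Attach a caller-chosen tag to the block for later inspection. */
void
guarded_tag(void *ptr, unsigned tag)
{
   ((struct guard_header *)ptr - 1)->tag = tag;
}

void
guarded_free(void *ptr)
{
   if (ptr)
      free((struct guard_header *)ptr - 1);
}
```

A write one byte past the end of the block lands in the footer, so a later guarded_check_block() call on that pointer reports the block as invalid.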
Brian Paul a408ea9692 gallium/hud: replace malloc w/ MALLOC
To match the FREE() call used later.  Fixes things on Windows.

Reviewed-by: Marek Olšák <maraeo@gmail.com>
2013-04-04 08:50:15 -06:00
Vincent Lejeune 9276961223 r600g/llvm: Workaround for wrong tex.offset_*
2013-04-04 16:03:04 +02:00
Roland Scheidegger ce5096a0a9 gallivm: honor explicit derivatives values for cube maps.
This is trivial now, though need to make sure we pass all the necessary
derivative values (which is 3 each for ddx/ddy not 2).
Passes piglit arb_shader_texture_lod-texgradcube test.

v2: add the forgotten abs() for all incoming derivatives (discovered
by new piglit arb_shader_texture_lod-texgradcube test, though more by
luck as it was failing only for exactly one pixel...).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-04 01:03:42 +02:00
Roland Scheidegger f621015cb5 gallivm: do per-pixel cube face selection (finally!!!)
This proved to be tricky, the problem is that after selection/mirroring
we cannot calculate reasonable derivatives (if not all pixels in a quad
end up on the same face the derivatives could get "randomly" exceedingly
large).
However, it is actually quite easy to simply calculate the derivatives
before selection/mirroring and then transform them similar to
the cube coordinates (they only need selection/projection, but not
mirroring as we're not interested in the sign bit, of course). While
there is a tiny bit more work to do (need to calculate derivs for 3
coords instead of 2, and additional selects) it also simplifies things
somewhat for the coord selection itself (as we save some broadcast aos
shuffles, and we don't need to calculate the average vector) - hence if
derivatives aren't needed this should actually be faster.
Also, this has the benefit that this will (trivially) work for explicit
derivatives too, which we completely ignored before that (will be in a
separate commit for better trackability).
Note that while the way for getting rho looks very different, it should
result in "nearly" the same values as before (the "nearly" is only because
before the code would choose the face based on an "average" vector and hence
the derivatives calculated according to this face, where now (for implicit
derivatives) the derivatives are projected on the face selected for the
first (top-left) pixel in a quad, so not necessarily the same face).
The transformation done might not quite be state-of-the-art, calculating
length(dx,dy) as max(dx,dy) certainly isn't either but this stays the
same as before (that is I think a better transform would _somehow_ take
the "derivative major axis" into account so that derivative changes in
the major axis wouldn't get ignored).
Should solve some accuracy problems with cubemaps (can easily be seen with
the cubemap demo when switching wrapping/filtering), though we still don't
do seamless filtering to fix it completely (so not per-sample but per-pixel
is certainly better than per-quad and already sufficient for accurate
results with nearest tex filter).

As for performance, it seems to be a tiny bit faster too (maybe 3% or so
with the cubemap demo). I'd have expected that with nearest/nearest
filtering, where this takes fewer instructions, but the difference actually
seems to be larger with linear/linear_mipmap_linear, where it takes slightly
more instructions; probably the code is less serialized, allowing better
scheduling (on a Sandy Bridge CPU). It also now seems to be at least as
fast as the old path using a conditional when using 128-bit vectors (though
that is probably more a result of testing with a newer CPU). For now that
old path is still there but unused.
No piglit regressions.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-04 01:03:42 +02:00
Roland Scheidegger bdfbeb9633 gallivm: minor rho calculation optimization for 1 or 3 coords
Using a different packing for the single coord case should save a shuffle.
Plus some minor style fixes.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-04 01:03:42 +02:00
Roland Scheidegger 067a0ae420 gallivm: use f16c hw support for float->half and half->float conversion
Should be way faster of course on cpus supporting this (includes AMD
Bulldozer and Jaguar cores, Intel Ivy Bridge and up (except budget models)).
Passes piglit fbo-blending-formats GL_ARB_texture_float -auto on Ivy Bridge.

Reviewed-by: Brian Paul <brianp@vmware.com>
2013-04-04 01:03:42 +02:00
Zack Rusin 302df7cc85 draw/llvmpipe: allow independent so attachments to the vs
When geometry shaders are present, one needs to be able to create
an empty geometry shader with stream output that needs to be
resolved later and attached to the currently bound vertex shader.
Let's add support for it to llvmpipe and draw. draw allows attaching
independent stream output info to any vertex shader and llvmpipe
resolves at draw time which vertex shader the given empty geometry
shader should be linked to.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-03 10:16:25 -07:00
Zack Rusin 246e68735f llvmpipe: reset so buffers when not appending
We need to reset the internal state of the so buffers or we'll
keep appending even though we're not supposed to.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-03 10:16:25 -07:00
Zack Rusin 7ca65a68e1 draw: remove unused function
we use draw_set_mapped_so_targets nowadays

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-03 10:16:25 -07:00
Zack Rusin b16ae0f792 draw/llvm: use an enum instead of magic numbers
I think this was there before and got accidentally
removed during a merge. Same code as for the GS
context, which is also using an enum instead of
hardcoded numbers.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-03 10:16:25 -07:00
Zack Rusin 49b7d933f8 draw/gs: cleanup some debugging code
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-03 10:16:25 -07:00
Zack Rusin 822c21c776 draw/so: maintain an exact number of written vertices
It's quite helpful during the rendering when we know
exactly the count of the vertices available in the
buffer.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-03 10:16:25 -07:00
Zack Rusin d8543bd752 draw: Implement support for primitive id
We were largely ignoring primitive id.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-03 10:16:25 -07:00
Zack Rusin f6bfb62c50 draw/so: Fix bogus assert
We do support so with multiple primitives.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-03 10:16:25 -07:00
Zack Rusin e6fc635351 draw/gs: Fix memory corruption with multiple primitives
We were flushing with incorrect number of primitives. TGSI exec
can only work with a single primitive at a time. Plus the fetching
with multiple primitives on llvm paths wasn't copying the last
element.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-03 10:16:25 -07:00
Zack Rusin f313b0c850 gallivm: cleanup the gs interface
Instead of void pointers use a base interface.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-03 10:16:25 -07:00
Brian Paul ac114c6824 svga: add new memory-used HUD query
To track the amount of memory used by all pipe_resources (textures
and buffers).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-03 11:02:47 -06:00
Brian Paul a69efa9482 util: add new util_resource_size() function in u_resource.[ch]
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-03 11:02:47 -06:00
Brian Paul a3cccdec90 util: move functions from u_resource.c to u_transfer.c
The functions are prototyped in u_transfer.h and are related to the
other functions in u_transfer.c.

The next patch will re-use the u_resource.c file for new code.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-03 11:02:47 -06:00
Vincent Lejeune 159d934066 r600g/llvm: Do not override llvm provided stack_size
2013-04-03 18:39:49 +02:00
Vincent Lejeune 097a6ecdfe r600g/llvm: Do not change cf_alu inst when adding alus
2013-04-03 18:22:40 +02:00
Marek Olšák ff01e0db0e radeonsi: add more cases for copying unsupported formats to resource_copy_region
Ported from r600g commit:

8891b2f9c9

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>

NOTE: This is a candidate for the 9.1 branch.
2013-04-03 10:58:33 -04:00
Brian Paul 3838edaf5d svga: add HUD queries for number of draw calls, number of fallbacks
The fallbacks count is the number of drawing calls that use a "draw"
module fallback, such as polygon stipple.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-03 09:56:08 -06:00
Brian Paul 49ed1f3cb3 svga: refactor occlusion query code
This is in preparation for adding new query types for the HUD.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-03 09:56:07 -06:00
Brian Paul a9ae7e9c28 gallium/hud: try L8 texture for font if I8 format isn't supported
2013-04-03 09:44:57 -06:00
Brian Paul 0289ebaa0f svga: add case for PIPE_CAP_QUERY_PIPELINE_STATISTICS
2013-04-03 08:19:44 -06:00