55885 Commits

Author SHA1 Message Date
Brian Paul 0c1dcf906d st/wgl: make stw_current_context() non-static
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-04 08:50:16 -06:00
Brian Paul 92e5e45ff1 util: add debug_memory_check_block(), debug_memory_tag()
The former just checks that the given block is valid by checking
the header and footer.

The latter sets the memory block's tag.  With extra debug code, we
can use that for monitoring/checking particular allocations.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-04 08:50:15 -06:00
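The header/footer check and the tag described in the commit above can be illustrated with a small guarded allocator. This is only a sketch under stated assumptions: the names guard_malloc(), guard_check_block(), guard_tag() and guard_get_tag(), the magic values, and the struct layout are all hypothetical and are not taken from Mesa's u_debug_memory.c.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical magic values, chosen only for this illustration. */
#define GUARD_HEADER_MAGIC 0x12345678u
#define GUARD_FOOTER_MAGIC 0x87654321u

/* Bookkeeping stored immediately before the user-visible block. */
struct guard_header {
   uint32_t magic;
   uint32_t tag;
   size_t size;
};

/* Allocate size bytes with a header in front and a footer word behind. */
void *guard_malloc(size_t size)
{
   uint32_t footer = GUARD_FOOTER_MAGIC;
   struct guard_header *hdr = malloc(sizeof(*hdr) + size + sizeof(footer));

   if (!hdr)
      return NULL;
   hdr->magic = GUARD_HEADER_MAGIC;
   hdr->tag = 0;
   hdr->size = size;
   /* The footer may be unaligned, so it is written with memcpy. */
   memcpy((char *)(hdr + 1) + size, &footer, sizeof(footer));
   return hdr + 1;
}

/* Return 1 if both the header and the footer are intact, 0 otherwise. */
int guard_check_block(const void *ptr)
{
   const struct guard_header *hdr = (const struct guard_header *)ptr - 1;
   uint32_t footer;

   if (hdr->magic != GUARD_HEADER_MAGIC)
      return 0;
   memcpy(&footer, (const char *)ptr + hdr->size, sizeof(footer));
   return footer == GUARD_FOOTER_MAGIC;
}

/* Attach a tag to a block so particular allocations can be tracked. */
void guard_tag(void *ptr, uint32_t tag)
{
   struct guard_header *hdr = (struct guard_header *)ptr - 1;

   assert(guard_check_block(ptr));
   hdr->tag = tag;
}

/* Read back the tag previously stored with guard_tag(). */
uint32_t guard_get_tag(const void *ptr)
{
   return ((const struct guard_header *)ptr - 1)->tag;
}
```

Writing one byte past the end of a block clobbers the footer word, so a later guard_check_block() call on that block reports the corruption.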
Brian Paul a408ea9692 gallium/hud: replace malloc w/ MALLOC
To match the FREE() call used later.  Fixes things on Windows.

Reviewed-by: Marek Olšák <maraeo@gmail.com>
2013-04-04 08:50:15 -06:00
Vincent Lejeune 9276961223 r600g/llvm: Workaround for wrong tex.offset_* 2013-04-04 16:03:04 +02:00
Roland Scheidegger ce5096a0a9 gallivm: honor explicit derivatives values for cube maps.
This is trivial now, though need to make sure we pass all the necessary
derivative values (which is 3 each for ddx/ddy not 2).
Passes piglit arb_shader_texture_lod-texgradcube test.

v2: add the forgotten abs() for all incoming derivatives (discovered
by new piglit arb_shader_texture_lod-texgradcube test, though more by
luck as it was failing only for exactly one pixel...).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-04 01:03:42 +02:00
Roland Scheidegger f621015cb5 gallivm: do per-pixel cube face selection (finally!!!)
This proved to be tricky, the problem is that after selection/mirroring
we cannot calculate reasonable derivatives (if not all pixels in a quad
end up on the same face the derivatives could get "randomly" exceedingly
large).
However, it is actually quite easy to simply calculate the derivatives
before selection/mirroring and then transform them similar to
the cube coordinates (they only need selection/projection, but not
mirroring as we're not interested in the sign bit, of course). While
there is a tiny bit more work to do (need to calculate derivs for 3
coords instead of 2, and additional selects) it also simplifies things
somewhat for the coord selection itself (as we save some broadcast aos
shuffles, and we don't need to calculate the average vector) - hence if
derivatives aren't needed this should actually be faster.
Also, this has the benefit that this will (trivially) work for explicit
derivatives too, which we completely ignored before that (will be in a
separate commit for better trackability).
Note that while the way for getting rho looks very different, it should
result in "nearly" the same values as before (the "nearly" is only because
before the code would choose the face based on an "average" vector and hence
the derivatives calculated according to this face, where now (for implicit
derivatives) the derivatives are projected on the face selected for the
first (top-left) pixel in a quad, so not necessarily the same face).
The transformation done might not quite be state-of-the-art, calculating
length(dx,dy) as max(dx,dy) certainly isn't either, but this stays the
same as before (that is I think a better transform would _somehow_ take
the "derivative major axis" into account so that derivative changes in
the major axis wouldn't get ignored).
Should solve some accuracy problems with cubemaps (can easily be seen with
the cubemap demo when switching wrapping/filtering), though we still don't
do seamless filtering to fix it completely (so not per-sample but per-pixel
is certainly better than per-quad and already sufficient for accurate
results with nearest tex filter).

As for performance, it seems to be a tiny bit faster too (maybe 3% or so
with cubemap demo). Which I'd have expected with nearest/nearest filtering
where this will be less instructions, but the difference seems to actually
be larger with linear/linear_mipmap_linear where it is slightly more
instructions, probably the code appears less serialized allowing better
scheduling (on a sandy bridge cpu). It actually seems to be now at least
as fast as the old path using a conditional when using 128bit vectors too
(that is probably more a result of testing with a newer cpu though), for now
that old path is still there but unused.
No piglit regressions.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-04 01:03:42 +02:00
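The approach in the commit above (take derivatives of all three coordinates first, select the face, then project the derivatives and approximate the length as a max) might look roughly like this in scalar C. It is one reading of the commit message, not the gallivm code, which emits vectorized LLVM IR. The names vec3, cube_axis, major_axis() and cube_rho() are hypothetical, the choice of which minor component maps to s or t is arbitrary here, and the 0.5 factor assumes face coordinates of the form 0.5 * (sc / |ma| + 1).

```c
#include <assert.h>

struct vec3 { float x, y, z; };

enum cube_axis { AXIS_X, AXIS_Y, AXIS_Z };

static float absf(float v) { return v < 0.0f ? -v : v; }
static float maxf(float a, float b) { return a > b ? a : b; }

/* Per-pixel selection: the major axis is the largest-magnitude component. */
enum cube_axis major_axis(struct vec3 r)
{
   float ax = absf(r.x), ay = absf(r.y), az = absf(r.z);

   if (ax >= ay && ax >= az)
      return AXIS_X;
   return ay >= az ? AXIS_Y : AXIS_Z;
}

/* Pick the two components that are not on the major axis. */
static void select_minor(struct vec3 v, enum cube_axis axis,
                         float *a, float *b)
{
   switch (axis) {
   case AXIS_X: *a = v.z; *b = v.y; break;
   case AXIS_Y: *a = v.x; *b = v.z; break;
   default:     *a = v.x; *b = v.y; break;
   }
}

/*
 * rho from 3-component derivatives computed before face selection.
 * Only selection and projection are applied; mirroring is skipped
 * because the sign is dropped via absf().  The derivative along the
 * major axis is ignored, as the commit message notes.
 * The direction r must not be the zero vector.
 */
float cube_rho(struct vec3 r, struct vec3 ddx, struct vec3 ddy)
{
   enum cube_axis axis = major_axis(r);
   float ma = axis == AXIS_X ? r.x : axis == AXIS_Y ? r.y : r.z;
   float scale = 0.5f / absf(ma);
   float dsdx, dtdx, dsdy, dtdy;

   select_minor(ddx, axis, &dsdx, &dtdx);
   select_minor(ddy, axis, &dsdy, &dtdy);

   /* length(dx,dy) approximated as a max over all four terms. */
   return maxf(maxf(absf(dsdx), absf(dtdx)),
               maxf(absf(dsdy), absf(dtdy))) * scale;
}
```

Because the derivatives are taken before selection, pixels of one quad that land on different faces no longer produce the arbitrarily large differences that come from subtracting coordinates belonging to different faces.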
Roland Scheidegger bdfbeb9633 gallivm: minor rho calculation optimization for 1 or 3 coords
Using a different packing for the single coord case should save a shuffle.
Plus some minor style fixes.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-04 01:03:42 +02:00
Roland Scheidegger 067a0ae420 gallivm: use f16c hw support for float->half and half->float conversion
Should be way faster of course on cpus supporting this (includes AMD
Bulldozer and Jaguar cores, Intel Ivy Bridge and up (except budget models)).
Passes piglit fbo-blending-formats GL_ARB_texture_float -auto on Ivy Bridge.

Reviewed-by: Brian Paul <brianp@vmware.com>
2013-04-04 01:03:42 +02:00
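For reference, the half->float direction that F16C performs in hardware (the VCVTPH2PS instruction converts several values at once) can be written in scalar C as below. This is a plain software sketch of the IEEE 754 binary16 to binary32 conversion, not the code path gallivm generates; the function name half_to_float() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Convert one IEEE 754 half-precision value to single precision. */
float half_to_float(uint16_t h)
{
   uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
   uint32_t exp = (h >> 10) & 0x1fu;
   uint32_t mant = h & 0x3ffu;
   uint32_t bits;
   float f;

   if (exp == 0) {
      if (mant == 0) {
         bits = sign;                        /* signed zero */
      } else {
         /* Subnormal half: shift until the implicit bit appears. */
         uint32_t shift = 0;
         while (!(mant & 0x400u)) {
            mant <<= 1;
            shift++;
         }
         mant &= 0x3ffu;
         bits = sign | ((127 - 15 + 1 - shift) << 23) | (mant << 13);
      }
   } else if (exp == 0x1fu) {
      bits = sign | 0x7f800000u | (mant << 13);   /* infinity or NaN */
   } else {
      /* Normal number: rebias the exponent from 15 to 127. */
      bits = sign | ((exp + 127 - 15) << 23) | (mant << 13);
   }

   memcpy(&f, &bits, sizeof(f));
   return f;
}
```

Every half value is exactly representable as a float, so this direction needs no rounding; the float->half direction does, which is part of why a hardware path helps.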
Zack Rusin 302df7cc85 draw/llvmpipe: allow independent so attachments to the vs
When geometry shaders are present, one needs to be able to create
an empty geometry shader with stream output that needs to be
resolved later and attached to the currently bound vertex shader.
Let's add support for it to llvmpipe and draw. draw allows attaching
independent stream output info to any vertex shader and llvmpipe
resolves at draw time which vertex shader the given empty geometry
shader should be linked to.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-03 10:16:25 -07:00
Zack Rusin 246e68735f llvmpipe: reset so buffers when not appending
We need to reset the internal state of the so buffers or we'll
keep appending even though we're not supposed to.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-03 10:16:25 -07:00
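The reset described in the commit above can be pictured as follows. This is a minimal sketch assuming each stream output target tracks an internal write offset and that binding passes a per-target append bitmask; the struct so_target, its fields, and bind_so_targets() are hypothetical names, not llvmpipe's actual structures.

```c
#include <assert.h>

/* Hypothetical stream output target with an internal write position. */
struct so_target {
   unsigned internal_offset;   /* bytes already written */
   unsigned buffer_size;       /* total capacity in bytes */
};

/*
 * Bind targets for stream output.  Bit i of append_bitmask set means
 * "keep appending to target i"; otherwise writing restarts at 0.
 */
void bind_so_targets(struct so_target **targets, unsigned count,
                     unsigned append_bitmask)
{
   unsigned i;

   for (i = 0; i < count; i++) {
      if (!(append_bitmask & (1u << i)))
         targets[i]->internal_offset = 0;
   }
}
```

Without the reset, a target rebound without the append bit would keep its stale offset and new vertices would land after the old data.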
Zack Rusin 7ca65a68e1 draw: remove unused function
we use draw_set_mapped_so_targets nowadays

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-03 10:16:25 -07:00
Zack Rusin b16ae0f792 draw/llvm: use an enum instead of magic numbers
I think this was there before and got accidentally
removed during a merge. Same code as for the GS
context, which is also using an enum instead of
hardcoded numbers.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-03 10:16:25 -07:00
Zack Rusin 49b7d933f8 draw/gs: cleanup some debugging code
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-03 10:16:25 -07:00
Zack Rusin 822c21c776 draw/so: maintain an exact number of written vertices
It's quite helpful during the rendering when we know
exactly the count of the vertices available in the
buffer.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-03 10:16:25 -07:00
Zack Rusin d8543bd752 draw: Implement support for primitive id
We were largely ignoring primitive id.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-03 10:16:25 -07:00
Zack Rusin f6bfb62c50 draw/so: Fix bogus assert
We do support so with multiple primitives.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-03 10:16:25 -07:00
Zack Rusin e6fc635351 draw/gs: Fix memory corruption with multiple primitives
We were flushing with incorrect number of primitives. TGSI exec
can only work with a single primitive at a time. Plus the fetching
with multiple primitives on llvm paths wasn't copying the last
element.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-03 10:16:25 -07:00
Zack Rusin f313b0c850 gallivm: cleanup the gs interface
Instead of void pointers use a base interface.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-03 10:16:25 -07:00
Brian Paul ac114c6824 svga: add new memory-used HUD query
To track the amount of memory used by all pipe_resources (textures
and buffers).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-03 11:02:47 -06:00
Brian Paul a69efa9482 util: add new util_resource_size() function in u_resource.[ch]
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-03 11:02:47 -06:00
Brian Paul a3cccdec90 util: move functions from u_resource.c to u_transfer.c
The functions are prototyped in u_transfer.h and are related to the
other functions in u_transfer.c.

The next patch will re-use the u_resource.c file for new code.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-03 11:02:47 -06:00
Vincent Lejeune 159d934066 r600g/llvm: Do not override llvm provided stack_size 2013-04-03 18:39:49 +02:00
Vincent Lejeune 097a6ecdfe r600g/llvm: Do not change cf_alu inst when adding alus 2013-04-03 18:22:40 +02:00
Marek Olšák ff01e0db0e radeonsi: add more cases for copying unsupported formats to resource_copy_region
Ported from r600g commit:

8891b2f9c9

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>

NOTE: This is a candidate for the 9.1 branch.
2013-04-03 10:58:33 -04:00
Brian Paul 3838edaf5d svga: add HUD queries for number of draw calls, number of fallbacks
The fallbacks count is the number of drawing calls that use a "draw"
module fallback, such as polygon stipple.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-03 09:56:08 -06:00
Brian Paul 49ed1f3cb3 svga: refactor occlusion query code
This is in preparation for adding new query types for the HUD.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-03 09:56:07 -06:00
Brian Paul a9ae7e9c28 gallium/hud: try L8 texture for font if I8 format isn't supported 2013-04-03 09:44:57 -06:00
Brian Paul 0289ebaa0f svga: add case for PIPE_CAP_QUERY_PIPELINE_STATISTICS 2013-04-03 08:19:44 -06:00
Brian Paul 7e28debb6f st/mesa: rewrite comment in st_manager.c 2013-04-03 08:16:36 -06:00
Christoph Bumiller 80eef069f0 nv50,nvc0: remove MS resolve formats hack
Mesa now allows BlitFramebuffer resolve between RGBA and BGRA.
2013-04-03 13:19:15 +02:00
Christoph Bumiller 4de70bf43c nvc0: fix 128 bit compressed storage type selection 2013-04-03 12:54:44 +02:00
Christoph Bumiller 8e1dd58a7e nvc0: place staging textures in GART and map them directly 2013-04-03 12:54:44 +02:00
Christoph Bumiller ba9b0b682f nv50: account for pesky prefetch in size calculation of linear textures 2013-04-03 12:54:44 +02:00
Christoph Bumiller f0a0d59f0f nvc0: honour scaled coordinates setting for linear textures 2013-04-03 12:54:44 +02:00
Christoph Bumiller d801545964 nvc0: fix for 2d engine R source formats writing RRR1 and not R001 2013-04-03 12:54:43 +02:00
Christoph Bumiller 6417d56c19 nv50,nvc0: disable DEPTH_RANGE_NEAR/FAR clipping during blit
We send position.z == 0, DEPTH_RANGE may be some arbitrary range
not including 0 (for example in piglit's hiz tests).
2013-04-03 12:54:43 +02:00
Christoph Bumiller e45c969fe5 st/mesa: fix bitmap,drawpix,drawtex for PIPE_CAP_TGSI_TEXCOORD
NOTE: Changed the semantic index for the drawtex coordinate to
be the texture unit index instead of always 0.
Not sure if this is correct but since the value seems to depend
on the unit it would make sense to use different varying slots.
2013-04-03 12:54:43 +02:00
Christoph Bumiller 2a8145d36b nouveau: accelerate buffer copies in resource_copy_region 2013-04-03 12:54:43 +02:00
Christoph Bumiller 3ed4bbd769 nvc0: demagic some of the NVE4_COMPUTE_UPLOAD methods
It's actually the same as P2MF.
2013-04-03 12:54:43 +02:00
Christoph Bumiller fb0334adb3 nvc0: read PM counters for each warp scheduler separately 2013-04-03 12:54:43 +02:00
Christoph Bumiller 7bac075f25 nvc0: add some metrics to driver specific queries 2013-04-03 12:54:43 +02:00
Christoph Bumiller 198f514aa6 nvc0: add some driver statistics queries 2013-04-03 12:54:43 +02:00
Christoph Bumiller 7628cc247f nvc0: disable compressed storage type 0xdb for now
Single-sample color compression doesn't seem that useful anyway.
2013-04-03 12:54:43 +02:00
Christoph Bumiller ea12fc3f6c nvc0: use correct hw query for PRIMITIVES_GENERATED
It was the same as SO_STATISTICS[1] before.
2013-04-03 12:54:43 +02:00
Christoph Bumiller 6bca4e7085 nvc0: use fence to check state of queries that don't write sequence
This still isn't optimal, since the fence will signal a bit late,
but better than checking on the bo, which may never be ready if it
is shared (which is likely).
2013-04-03 12:54:43 +02:00
Christoph Bumiller 3d2790cead gallium/hud: add support for PIPE_QUERY_PIPELINE_STATISTICS
Also, renamed "pixels-rendered" to "samples-passed" because the
occlusion counter increments even if colour and depth writes are
disabled, or (on some implementations) for killed fragments that
passed the depth test when PS early_fragment_tests is set.
2013-04-03 12:54:43 +02:00
Christoph Bumiller c620aad71c gallium/docs: fix definition of PIPE_QUERY_SO_STATISTICS
Reviewed-by: Marek Olšák <maraeo@gmail.com>
2013-04-03 12:54:43 +02:00
Christoph Bumiller f35e96d973 gallium: add PIPE_CAP_QUERY_PIPELINE_STATISTICS
Reviewed-by: Marek Olšák <maraeo@gmail.com>
2013-04-03 12:54:43 +02:00
Paul Berry 41e4bccc75 i965: Reduce code duplication in handling of depth, stencil, and HiZ.
This patch consolidates duplicate code in the brw_depthbuffer and
gen7_depthbuffer state atoms.  Previously, these state atoms contained
5 chunks of code for emitting the _3DSTATE_DEPTH_BUFFER packet (3 for
Gen4-6 and 2 for Gen7).  Also a lot of logic for determining the
appropriate buffer setup was duplicated between the Gen4-6 and Gen7
functions.

This refactor splits the code into three separate functions:
brw_emit_depthbuffer(), which determines the appropriate buffer setup
in a mostly generation-independent way, brw_emit_depth_stencil_hiz(),
which emits the appropriate state packets for Gen4-6, and
gen7_emit_depth_stencil_hiz(), which emits the appropriate state
packets for Gen7.

Tested using Piglit on Gen5-7 (no regressions).

v2: Re-word some comments.  Fix an assertion that incorrectly
prohibited packed depth/stencil formats on Gen6 (these are allowed
provided that HiZ is disabled).

Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-04-02 15:19:13 -07:00
Paul Berry 2ad0ed6349 Revert "glsl: Replace constant-index vector array accesses with swizzles"
This reverts commit dbf94d105a, which
was working around a bug in the handling of array indexing when
constant folding built-in functions.  Now that the constant folding
bug has been fixed, the workaround is no longer needed.
2013-04-02 12:24:16 -07:00