Merge the following upstream autoconf-archive patches.
ax_prog_flex: change grep syntax to accept e.g. "flex.real" in case a wrapper or symlink is used.
AX_PROG_FLEX: avoid use of grep empty string escape extension (fix for OpenBSD)
AX_PROG_FLEX: Also accept gflex.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jonathan Gray <jsg@openbsd.org>
Rather than building a new one every compile. This should reduce some
of the overhead of compiling shaders.
One consequence of this change is that we lose the MachineInstrs dumps
when dumping the shaders via R600_DEBUG. The LLVM IR and assembly is
still dumped, and if you still want to see the MachineInstr dump, you
can run the dumped LLVM IR through llc.
The "normal" detection (querying clflush size) already made sure it is
non-zero, however another method did not. This lead to crashes if this
value happened to be zero (apparently can happen in virtualized environments
at least).
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=87913
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
The code used PIPE_ALIGN_VAR for the variable used by fxsave, however this
does not work if the stack isn't aligned. Hence use PIPE_ALIGN_STACK function
decoration to fix the segfault which can happen if stack alignment is only
4 bytes.
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=87658.
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
The headers hadn't been regenerated in a long time and had seen a number
of manual modifications. A few changes:
- remove nvc0_2d entirely, use the nv50 header which has the nvc0
values too
- remove 3ddefs, it's identical to the nv50 file
- move macros out into a separate file
Also the upstream rnndb changed the overall chip naming convention; this
was fixed up manually in the generated files until a better solution is
determined.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The headers hadn't been regenerated in a long time, and there were a few
minor divergences. Among other things, rnndb has changed naming to
G80/etc, for now I've not tackled switching that over and manually
replaced the nvidia codenames back to the chip ids. However no other
modifications of the headergen'd headers was done.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Compression seems to be supported for only some formats. Enable it for
those. Previously this was disabled for everything despite the code
looking like it was actually enabled.
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
assert is compiled out in release builds - don't put logic into it. Note
that this particular instance is only used for vp debugging and is
normally compiled out.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
assert's get compiled out in release builds, so they can't be relied
upon to perform logic.
Reported-by: Pierre Moreau <pierre.morrow@free.fr>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Roy Spliet <rspliet@eclipso.eu>
Cc: "10.2 10.3 10.4" <mesa-stable@lists.freedesktop.org>
brw_swizzle_to_scs has been showing up in my CPU profiling, which is
rather silly - it's a tiny amount of code. It really should be inlined,
and can easily be implemented with fewer instructions.
The enum translation is as follows:
SWIZZLE_X, SWIZZLE_Y, SWIZZLE_Z, SWIZZLE_W, SWIZZLE_ZERO, SWIZZLE_ONE
0 1 2 3 4 5
4 5 6 7 0 1
SCS_RED, SCS_GREEN, SCS_BLUE, SCS_ALPHA, SCS_ZERO, SCS_ONE
which is simply (swizzle + 4) & 7.
Haswell needs extra textureGather workarounds to remap GREEN to BLUE,
but Broadwell and later do not.
This patch replicates swizzle_to_scs in gen7_wm_surface_state.c and
gen8_surface_state.c, since the Gen8+ code can be simplified to a mere
two instructions. Both copies can be marked static for easy inlining.
v2: Put the commit message in the code as comments (requested by
Jason Ekstrand). Also fix a typo.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Valve games use GL_SRGB8 textures. Instead of supporting that properly,
we fell back to MESA_FORMAT_R8G8B8A8_SRGB (with an alpha channel), which
meant that we had to use texture swizzling to override the alpha to 1.0
when sampling. This meant shader recompiles on Gen < 7.5 platforms.
By supporting MESA_FORMAT_R8G8B8X8_SRGB, the hardware just returns 1.0
for us, so we can just use SWIZZLE_XYZW, and avoid any recompiles. All
generations of hardware have supported the format for sampling and
filtering; we can easily support rendering by using the R8G8B8A8_SRGB
format and writing garbage to the X channel. (We do this already for
the non-SRGB version of this format.)
This removes all remaining shader recompiles in a time demo of "Counter
Strike: Global Offensive" (32 -> 0) on Sandybridge.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87886
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
The logic in brw_blorp_surface_info::set uses brw_format_for_mesa_format
for source surfaces, and brw->render_target_format[] for destination
surfaces. We should do the same in the sRGB MSAA overrides.
Currently, this isn't a problem, since SRGB MSAA buffers are all RGBA.
The next commit will introduce RGBX SRGB MSAA buffers, at which point
we need to get the RGBX -> RGBA format overrides for rendering right.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
ir_to_mesa does this - apparently we just forgot or something.
Without this, we'll guess the wrong texture swizzle (XYZW for color
instead of XXX1 for depth) when doing precompiles.
This cuts 26 shader recompiles in a time demo of "Counter Strike:
Global Offensive" (58 -> 32) on Sandybridge. Haswell still has 0
recompiles.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87886
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Gen7.5+ platforms that support the "Shader Channel Select" feature leave
key->tex.swizzles[i] as SWIZZLE_NOOP except when GL_DEPTH_TEXTURE_MODE
is GL_ALPHA (which is really uncommon). So, the precompile should leave
them as SWIZZLE_NOOP (aka SWIZZLE_XYZW) as well.
We didn't notice this because prog->ShadowSamplers is not set correctly.
The next patch will fix that problem.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87886
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
According to the documentation, we need to do a CS stall on every fourth
PIPE_CONTROL command to avoid GPU hangs. The kernel does a CS stall
between batches, so we only need to count the PIPE_CONTROLs in our batches.
v2: Get the generation check right (caught by Chris Wilson),
combine the ++ with the check (suggested by Daniel Vetter).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This fixes the new piglit test: arb_uniform_buffer_object/2-buffers-bug
Cc: 10.2 10.3 10.4 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
There are too many state flags to fit in one terminal screen, even with
a very tall terminal. Everything is flagged once, so a value of 1 means
that it hasn't ever happened again, and thus isn't terribly interesting.
Skipping those makes it easier to see the interesting values.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
In order to support calling opt_vector_float() inside a condition, this
patch makes OPT() a statement expression:
https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html
We've used that elsewhere already.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The mad instruction emitter already supported the saturate modifier,
but the ModifierFolding pass never tried folding cvt sat operations
in for NV50.
Signed-off-by: Roy Spliet <rspliet@eclipso.eu>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This is a partial revert of c89306983c.
It split the {start,base}_vertex_location handling into several steps:
1. Set brw->draw.start_vertex_location = prim[i].start
and brw->draw.base_vertex_location = prim[i].basevertex.
(This happened once per _mesa_prim, in the main drawing loop.)
2. Add brw->vb.start_vertex_bias and brw->ib.start_vertex_offset
appropriately. (This happened in brw_prepare_shader_draw_parameters,
which was called just after brw_prepare_vertices, as part of state
upload, and only happened when BRW_NEW_VERTICES was flagged.)
3. Use those values when emitting 3DPRIMITIVE (once per _mesa_prim).
If we drew multiple _mesa_prims, but didn't flag BRW_NEW_VERTICES on
the second (or later) primitives, we would do step #1, but not #2.
The first _mesa_prim would get correct values, but subsequent ones
would only get the first half of the summation.
The reason I originally did this was because I needed the value of
gl_BaseVertexARB to exist in a buffer object prior to uploading
3DSTATE_VERTEX_BUFFERS. I believed I wanted to upload the value
of 3DPRIMITIVE's "Base Vertex Location" field, which was computed
as: (prims[i].indexed ? prims[i].start : prims[i].basevertex) +
brw->vb.start_vertex_bias. The latter value wasn't available until
after brw_prepare_vertices, and the former weren't available in the
state upload code at all. Hence the awkward split.
However, I believe that including brw->vb.start_vertex_bias was a
mistake. It's an extra bias we apply when uploading vertex data into
VBOs, to move [min_index, max_index] to [0, max_index - min_index].
>From the GL_ARB_shader_draw_parameters specification:
"<gl_BaseVertexARB> holds the integer value passed to the <baseVertex>
parameter to the command that resulted in the current shader
invocation. In the case where the command has no <baseVertex>
parameter, the value of <gl_BaseVertexARB> is zero."
I conclude that gl_BaseVertexARB should only include the baseVertex
parameter from glDraw*Elements*, not any internal biases we add for
optimization purposes.
With that in mind, gl_BaseVertexARB only needs prim[i].start or
prim[i].basevertex. We can simply store that, and go back to computing
start_vertex_location and base_vertex_location in brw_emit_prim(), like
we used to. This is much simpler, and should actually fix two bugs.
Fixes missing geometry in Unvanquished.
Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85529
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This makes it show up via ARB_debug_output and is also less code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
v2: Conditionalize it on having done any uploads (Turns out
u_upload_destroy() isn't safe with a NULL arg).
Reviewed-by: Dave Airlie <airlied@redhat.com> (v1)
Fixes the piglits which check that gl_VertexID includes the base vertex
offset:
arb_draw_indirect-vertexid elements
gl-3.2-basevertex-vertexid
Note that this leaves out the original G80, for which this will continue
to fail. It could be fixed by passing a driver constbuf value in, but
that's beyond the scope of this change.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
LAST_LINE_PIXEL has actually been renamed to PIXEL_CENTER_INTEGER in
rnndb; use that method to implement the rasterizer setting, used for
st/nine.
Signed-off-by: Tiziano Bacocco <tizbac2@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>