Commit Graph

66569 Commits

Author SHA1 Message Date
Axel Davy 104b5a8193 st/nine: clean device9ex.
Pass ex specific parameters as arguments to device9 ctor instead
of passing them by filling the structure.

Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
2014-11-26 20:09:10 +00:00
Emil Velikov 9b7037a369 nine: the .pc file should not follow mesa version
The version provided by it should be the same as the one
provided/handled by the module. Add the missing tiny version.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: David Heidelberg <david@ixit.cz>
2014-11-26 20:09:10 +00:00
Emil Velikov c642e87d9f auxiliary/vl: rework the build of the VL code
Rather than shoving all the VL code for non-VL targets, increasing
their size, just split it out and use it when needed. This gives us
the side effect of building vl_winsys_dri.c once, dropping a few
automake warnings, and reducing the size of the dri modules as below

   text    data     bss     dec     hex filename
5850573  187549 1977928 8016050  7a50b2 before/nouveau_dri.so
5508486  187100  391240 6086826  5ce0aa after/nouveau_dri.so

The above data is for a nouveau + swrast + kms_swrast 'megadriver'.

v2: Do not include the vl sources in the auxiliary library.
v3: Rebase. Add nine.

Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2014-11-26 20:09:09 +00:00
Emil Velikov 86a51eb861 auxiliary/vl: split the vl sources list into VL_SOURCES
With follow up commit we'll split vl static lib from the auxiliary one,
and choose the appropriate vl (galliumvl or galliumvl_stub) for the
respective targets to link against.

v2: Rebase.

Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2014-11-26 20:09:09 +00:00
Emil Velikov f093c1c8ec auxiliary/vl: add galliumvl_stub.la
Will be used by the non-VL targets, to stub out the functions called
by the drivers. The entry point to those are within the VL
state-trackers, yet the compiler cannot determine that at link time.
Thus we'll need to stub them out to prevent unresolved symbols in the
dri, egl, gbm and pipe-loader targets.

v2: Rebase.

Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2014-11-26 20:09:09 +00:00
Emil Velikov 2dbaedaf10 automake: rework VL dependency tracking
Set a single VL_{CFLAG,LIBS} for xcb and friends, and let each target
check for it's relevant library alone. Required as with follow up
commits we'll build aux/vl into a separate module, which needs VL_CFLAGS

Cleanup add a couple of explicit LIBDRM_LIBS linking, as aux/vl itself
requires libdrm, despite that LIBDRM_{RADEON,NOUVEAU...} may provide it
as well.

v2: Rebase. Make sure st/xvmc programs work.

Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2014-11-26 20:08:40 +00:00
Emil Velikov 303bc3609a configure: check the package version when auto-detecting the VL targets
Or we might end up where automatically enable the build, only to error
out a couple of lines after that.

Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2014-11-26 20:08:39 +00:00
Siavash Eliasi 8dc8c496e1 mesa: Permanently enable features supported by target CPU at compile time.
This will remove the need for unnecessary runtime checks for CPU features if
already supported by target CPU, resulting in smaller and less branchy code.

V2:
- Removed the SSSE3 related part for the not yet merged patch.
- Avoiding redefinition of macros.

Tested-by: David Heidelberg <david@ixit.cz>
2014-11-26 20:08:38 +00:00
Emil Velikov 752c2e9690 docs: add relnotes template for 10.5.0
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2014-11-26 18:00:17 +00:00
Timothy Arceri b3721cd230 util: update hash type comments
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Eric Anholt <eric@anholt.net>
2014-11-26 20:04:13 +11:00
Matt Turner 531feec9dc i965/vec4: Handle destination writemasks in VEC4_OPCODE_PACK_BYTES.
Since pack_bytes expands to two mov(4) align1 instructions, we can't use
swizzles directly. For an instruction like

   pack_bytes m4.y:UD, vgrf13.xyzw:UD

we can write into the .y component by settings the offset based on the
swizzle.

Also while we're doing this, we can set the dependency control hints
properly, so that a series of pack_bytes writing into separate
components of a register can issue without blocking.
2014-11-25 17:29:02 -08:00
Matt Turner 70fcd56538 i965/vec4: Optimize packSnorm4x8().
Reduces the number of instructions needed to implement packSnorm4x8()
from 13 -> 7.
2014-11-25 17:29:02 -08:00
Matt Turner 3532be7680 i965/vec4: Optimize packUnorm4x8().
Reduces the number of instructions needed to implement packUnorm4x8()
from 11 -> 6.
2014-11-25 17:29:02 -08:00
Matt Turner e14c7c7faf i965/vec4: Add VEC4_OPCODE_PACK_4_BYTES.
Will be used by emit_pack_{s,u}norm_4x8().
2014-11-25 17:29:02 -08:00
Matt Turner 94a30bbd4f i965/vec4: Optimize unpackSnorm4x8().
Reduces the number of instructions needed to implement unpackSnorm4x8()
from 16 -> 6.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-11-25 17:29:02 -08:00
Matt Turner bf686b2785 i965/vec4: Optimize unpackUnorm4x8().
Reduces the number of instructions needed to implement unpackUnorm4x8()
from 11 -> 4.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-11-25 17:29:02 -08:00
Matt Turner cb0ba848d4 i965/vec4: Add vector float immediate infrastructure.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-11-25 17:29:02 -08:00
Matt Turner 5d23721c1d i965/fs: Add vector float immediate infrastructure.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-11-25 17:29:02 -08:00
Matt Turner 276075f864 i965: Disassemble vector float immediates properly.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2014-11-25 17:29:02 -08:00
Matt Turner b2abf033e0 i965: Add unit test for float <-> VF conversions.
Using Eric's original VF -> float conversion code to initialize the
table.
2014-11-25 17:29:02 -08:00
Matt Turner c37d798e78 i965: Add functions to convert float <-> VF.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-11-25 17:29:01 -08:00
Chris Forbes 0008d0e59e i965/Gen6-7: Do not replace texcoords with point coord if not drawing points
Fixes broken rendering in Windows-based QtQuick2 apps run through Wine.
This library sets all texture units' GL_COORD_REPLACE, leaves point
sprite mode enabled, and then draws a triangle fan.

Will need a slightly different fix for Gen4-5, but I don't have my old
machines in a usable state currently.

V2: - Simplify patch -- the real changes are no longer duplicated across
      the Gen6 and Gen7 atoms.
    - Also don't clobber attr overrides -- which matters on Haswell too,
      and fixes the other half of the problem
    - Fix newly-introduced warnings
V3: - Use BRW_NEW_GEOMETRY_PROGRAM and brw->geometry_program rather than
      core flag and state; keep the state flags in order.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84651
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-11-25 22:38:32 +13:00
Kenneth Graunke 60f011af1a glsl: Make lower_constant_arrays_to_uniforms require dereferences.
Ilia noticed that my lowering pass was converting the constant array
used by textureGatherOffsets' offsets parameter to a uniform.  This
broke textureGather for Nouveau, and is generally a horrible plan,
since it violates the GLSL constraint that offsets must be an
immediate constant.

When I wrote this pass, I neglected to consider whole array assignment.
I figured opt_array_splitting would handle constant indexing, so this
pass was really about fixing variable indexing.

textureGatherOffsets is an example of whole array access that we really
don't want to touch.  Whole array copies don't appear to benefit from
this either - they're most likely initializers for temporary arrays
which are going to be mutated anyway.  Since you're copying, you may
as well copy from immediates, not uniforms.

This patch makes the pass look for ir_dereference_arrays of
ir_constants, rather than looking for any ir_constant directly.
This way, it ignores whole array assignment.

No shader-db changes or Piglit regressions on Haswell.  Some Piglit
tests generate different code (fixing textureGatherOffsets on Nouveau).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
2014-11-24 15:30:09 -08:00
Kenneth Graunke f0c91f32c0 i965: Precompile ARB programs.
We already precompile GLSL programs; it seems logical to precompile ARB
programs as well.  We just never hooked it up.

This also makes the programs compile even if no drawing occurs, which is
useful for shader-db.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-11-24 15:30:09 -08:00
Kenneth Graunke b55777f39d i965: Make precompile functions accessible from C.
Previously, the prototypes for brw_vs/gs/fs_precompile were scattered
between brw_vs.h (C), brw_gs.h (C), and brw_fs.h (C++ only).  Also,
brw_fs_precompile had C++ linkage, while the others were C.

This patch moves all the prototypes to a central location (brw_shader.h)
and makes brw_fs_precompile have C linkage.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-11-24 15:30:09 -08:00
Kenneth Graunke 62b425448c i965: Pass gl_program pointers into precompile functions.
We'd like to do precompiling for ARB vertex and fragment programs,
which only have gl_program structures - gl_shader_program is NULL.

This patch makes the various precompile functions take a gl_program
parameter directly, rather than accessing it via gl_shader_program.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-11-24 15:30:09 -08:00
Kenneth Graunke d54925df9c i965: Move brw->precompile checks out a level.
brw_shader_precompile should just do a precompile; it makes more sense
for the caller to decide whether we should do one.  Simpler.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-11-24 15:30:09 -08:00
Roland Scheidegger 880424b8ad llvmpipe: (trivial) remove redundant util_cpu_detect() call in lp_test_main
Already called earlier.
2014-11-25 00:29:29 +01:00
Roland Scheidegger 8148a06b8f llvmpipe: fix lp_test_arit denorm handling
llvmpipe disables denorms on purpose (on x86/sse only), because denorms are
generally neither required nor desired for graphic apis (and in case of d3d10,
they are forbidden).
However, this caused some arithmetic tests using denorms to fail on some
systems, because the reference did not generate the same results anymore.
(It did not fail on all systems - behavior of these math functions is sort
of undefined when called with non-standard floating point mode, hence the
result differing depending on implementation and in particular the sse
capabilities.)
So, for the reference, simply flush all (input/output) denorms manually
to zero in this case.

This fixes https://bugs.freedesktop.org/show_bug.cgi?id=67672.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-11-25 00:29:29 +01:00
Eric Anholt 93d30ff5d6 nouveau: Fix build after STR/BRA opcode dropping.
I missed these while git grepping for users of the dead opcodes.  Sigh,
macros.
2014-11-24 15:22:25 -08:00
Eric Anholt a3688d686f mesa: Drop unused NV_fragment_program opcodes.
The extension itself was deleted 2 years ago.  There are still some
prog_instruction opcodes from NV_fp that exist because they're used by
ir_to_mesa.cpp, though.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Ian Roamnick <ian.d.romanick@intel.com>
2014-11-24 14:56:22 -08:00
Eric Anholt 868f95f1da mesa: Drop unused SFL/STR opcodes.
They're part of NV_vertex_program2, which I'm pretty sure we're never
going to support.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Ian Roamnick <ian.d.romanick@intel.com>
2014-11-24 14:56:22 -08:00
Eric Anholt 365a4a3f9a gallium: Drop the unused CND opcode.
Nothing in the tree generates it.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-11-24 14:56:22 -08:00
Eric Anholt 00f7002c5c gallium: Drop unused BRA opcode.
Never generated, and implemented in only nvfx vertprog.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-11-24 14:56:22 -08:00
Eric Anholt ecfe9e2ad2 gallium: Drop the unused SFL/STR opcodes.
Nothing generated them.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-11-24 14:56:22 -08:00
Eric Anholt dc00b382b5 gallium: Drop the unused RFL opcode.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-11-24 14:56:22 -08:00
Eric Anholt 8c822b1e91 gallium: Drop unused X2D opcode.
Nothing in the tree generates it.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-11-24 14:56:22 -08:00
Eric Anholt ff886c4955 gallium: Drop the unused ARA opcode.
Nothing in the tree generated it.

v2: Only drop ARA, not ARR as well.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com> (v2)
2014-11-24 14:56:22 -08:00
Eric Anholt de2f8d75db gallium: Drop the unused RCC opcode.
Nothing in the tree generated it.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-11-24 14:56:22 -08:00
Eric Anholt d4864cdf15 gallium: Drop the NRM and NRM4 opcodes.
They weren't generated in tree, and as far as I know all hardware had to
lower it to a DP, RSQ, MUL.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-11-24 14:56:22 -08:00
Eric Anholt 7361d5ba63 ilo: Drop the explicit intialization of gaps in TGSI opcodes.
The nice thing about the good way of initializing arrays like this is that
you don't need to initialize everything in order, or even everything at
all.  Taking advantage of that only needs a tiny fixup to deal with the
default NULL value of the pointers.

I haven't dropped the initialization of opcodes that exist and are unsupported.
2014-11-24 14:56:22 -08:00
Eric Anholt 386c3fcb14 r300: Drop the "/* gap */" notes.
This switch statement's code structure isn't dependent on the numbers of
the opcodes at all.
2014-11-24 14:56:22 -08:00
Eric Anholt 2f01cc8417 r600: Drop the "/* gap */" notes.
These are obviously the gaps already, due to the bare numbers with
unsupported implementations.

This makes inserting new gaps less irritating.
2014-11-24 14:56:22 -08:00
Jose Fonseca 925cb75f89 nine: Drop use of TGSI_OPCODE_CND.
This was the only state tracker emitting it, and hardware was just having
to lower it anyway (or failing to lower it at all).

v2: Extracted from a larger patch by Jose (which also dropped DP2A), fixed
    to actually not reference TGSI_OPCODE_CND.  Change by anholt.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: David Heidelberg <david@ixit.cz>
2014-11-24 14:56:22 -08:00
Jose Fonseca 56fd7c6361 nine: Don't reference the dead TGSI_OPCODE_NRM.
The translation is lowering it to not using TGSI_OPCODE_NRM, anyway.

v2: Extracted from a larger patch by Jose that also dropped DP2A usage.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: David Heidelberg <david@ixit.cz>
2014-11-24 14:56:22 -08:00
Eric Anholt 7c0acd8535 nine: Don't use the otherwise-dead SFL opcode in an unreachable path.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: David Heidelberg <david@ixit.cz>
2014-11-24 14:56:21 -08:00
Matt Turner 057e6e5251 i965/gen6/gs: Don't declare a src_reg with struct.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-11-24 14:09:23 -08:00
Matt Turner ff966aff99 i965/disasm: Fix all32h/any32h predicate disassembly.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2014-11-24 14:09:23 -08:00
Matt Turner b754e52532 glsl: Fix tautological comparison.
Caught by clang.

warning: comparison of constant -1 with expression of type
         'ir_texture_opcode' is always false
      [-Wtautological-constant-out-of-range-compare]
      if (op == -1)
          ~~ ^  ~~

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-11-24 14:09:23 -08:00
Matt Turner 024db256d4 util: Prefer atomic intrinsics to inline assembly.
Cuts a little more than 1k of .text size from i915g.

This was previously done in commit 5f66b340 and subsequently reverted in
commit 3661f757 after bug 30514 was filed. I believe the cause of bug
30514 wasn't anything related to cross compiling, but rather that the
toolchain used defaulted to -march=i386, and i386 doesn't have the
CMPXCHG or XADD instructions used to implement the intrinsics.

So we reverted a patch that improved things so that we didn't break
compilation for a platform that never could have worked anyway.
2014-11-24 14:09:23 -08:00