Commit Graph

61967 Commits

Author SHA1 Message Date
Ilia Mirkin 51989817e6 loader: add special logic to distinguish nouveau from nouveau_vieux
There are a lot of different pci ids supported by nouveau, and more are
added all the time. The relevant distinguisher between drivers is the
chipset id.
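
As an illustration of why one chipset comparison scales better than a growing PCI id list, here is a minimal C sketch. The helper name nouveau_driver_for_chipset() is hypothetical, and the 0x30 cutoff (pre-NV30 chips to nouveau_vieux, NV30 and later to gallium nouveau) is an assumption about the split, not code taken from the loader:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not the actual Mesa loader code.
 * ASSUMPTION: chipset ids below 0x30 (the NV04/NV10/NV20 families)
 * belong to the classic nouveau_vieux driver; NV30 and newer belong
 * to the gallium nouveau driver. */
static const char *
nouveau_driver_for_chipset(unsigned chipset)
{
   assert(chipset != 0); /* caller is assumed to pass a queried, valid id */
   return chipset < 0x30 ? "nouveau_vieux" : "nouveau";
}
```

New PCI ids need no table update under this scheme, because only the chipset id is consulted.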

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
2014-03-19 18:17:40 -04:00
Matt Turner c049dd4396 glsl: Allow dot() on scalars, and throw out dotlike().
In all uses of dotlike() we're writing generic code that operates on 1-4
component vectors. That our IR requires ir_binop_dot expressions'
operands to be 2+ component vectors is an implementation detail that's
not important when implementing built-in functions with dot(), which is
defined for scalar floats in GLSL.
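
A rough sketch of the point being made: if a scalar is treated as a 1-component vector, one dot routine covers every size that built-in code needs. The function dot_n() is hypothetical and does not reflect how Mesa's IR represents ir_binop_dot:

```c
#include <assert.h>

/* Sketch: a dot product that treats a scalar as a 1-component vector,
 * so generic built-in code for 1-4 components needs no special case. */
static float
dot_n(const float *a, const float *b, int n)
{
   assert(n >= 1 && n <= 4);
   float sum = 0.0f;
   for (int i = 0; i < n; i++)
      sum += a[i] * b[i];
   return sum;
}
```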

Reviewed-by: Eric Anholt <eric@anholt.net>
2014-03-18 23:20:29 -07:00
Matt Turner 6cbc64c3cb glsl: Optimize pow(x, 2) into x * x.
Cuts two instructions out of SynMark's Gl32VSInstancing benchmark.
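
To make the transformation concrete, the following sketch applies the rewrite to a toy expression tree. The struct expr IR and opt_pow2() are hypothetical stand-ins for Mesa's GLSL IR and its algebraic pass:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy expression IR, not Mesa's ir_expression. */
enum op { OP_CONST, OP_VAR, OP_MUL, OP_POW };

struct expr {
   enum op op;
   float value;            /* used by OP_CONST */
   struct expr *src[2];    /* used by OP_MUL and OP_POW */
};

/* Rewrite pow(x, 2.0) into x * x in place; returns 1 if it fired. */
static int
opt_pow2(struct expr *e)
{
   if (e->op != OP_POW)
      return 0;
   const struct expr *exponent = e->src[1];
   if (exponent->op != OP_CONST || exponent->value != 2.0f)
      return 0;
   e->op = OP_MUL;
   e->src[1] = e->src[0];  /* both operands now share the x subtree */
   return 1;
}
```

This sketch shares the operand subtree instead of cloning it, which assumes the toy IR tolerates shared nodes.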

Reviewed-by: Eric Anholt <eric@anholt.net>
2014-03-18 23:20:29 -07:00
Matt Turner 9a9eaaa79a glsl: Match whitespace changes from previous patch.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-03-18 23:20:29 -07:00
Matt Turner 7988b4804f glsl: Expose pack/unpack built-ins for ARB_gpu_shader5.
ARB_gpu_shader5 and ES 3.0 expose different subsets of
ARB_shading_language_packing.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-03-18 23:20:29 -07:00
Eric Anholt 651b8baa82 i965: Drop some more dead code from the old CACHED_BATCH feature.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-03-18 14:45:09 -07:00
Eric Anholt 512c88f826 i965: Drop special case for edgeflag thanks to Marek's change to core.
As of 780ce576bb, we end up with R8_SSCALED anyway.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-03-18 14:45:09 -07:00
Brian Paul f4435da940 mesa: include stdbool.h in register_allocate.h to fix build
https://bugs.freedesktop.org/show_bug.cgi?id=76331
2014-03-18 13:28:17 -06:00
Ian Romanick f74cf5f80e i965: Enable EWA anisotropic filtering algorithm
Volume 4, part 1 of the Ivybridge PRM says, "Generally, the EWA
approximation algorithm results in higher image quality than the legacy
algorithm."  Using a classic anisotropic filtering "tunnel" demo, it
appears that there is *no* anisotropic filtering on IVB without this bit
set.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-03-18 10:56:38 -07:00
Kenneth Graunke dd2e5d3999 i965: Actually initialize simd16_unsupported and no16_msg.
I meant to include these fixes in v3 of commit
de7ad2c88f, but accidentally pushed a
previous version.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2014-03-18 10:50:48 -07:00
Kenneth Graunke 91f4528da6 i965/upload: Refactor open-coded ALIGN-like computations.
Sadly, we can't use actual ALIGN(), since that only supports
power-of-two values for the alignment parameter.
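
The difference between the two kinds of rounding can be sketched in C. ALIGN_POT mirrors the usual mask-based power-of-two idiom (assumed here to match Mesa's ALIGN()), and align_npot() is a hypothetical general version based on integer division:

```c
#include <assert.h>

/* Mask-based round-up: correct only when `a` is a power of two,
 * because it clears the low bits of the sum. */
#define ALIGN_POT(v, a) (((v) + (a) - 1) & ~((a) - 1))

/* Hypothetical general round-up for any non-zero alignment,
 * e.g. a 12-byte vertex stride. */
static unsigned
align_npot(unsigned v, unsigned a)
{
   assert(a != 0);
   return (v + a - 1) / a * a;
}
```

With an alignment of 12, the mask version returns 16 for an input of 13, while the division version returns the expected 24.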

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-03-18 10:39:04 -07:00
Kenneth Graunke b8b4e280b4 i965: Fix indentation in brw_upload_indices().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-03-18 10:38:48 -07:00
Kenneth Graunke 051edcc144 i965: Consolidate code for setting brw->ib.start_vertex_offset.
This was set identically in three places.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-03-18 10:38:44 -07:00
Kenneth Graunke 7a0fd3ca1d i965: Allocate register sets at screen creation, not context creation.
Register sets depend on the particular hardware generation, but don't
depend on anything in the actual OpenGL context.  Computing them is
fairly expensive, and they take up a large amount of memory.  Putting
them in the screen allows us to compute/allocate them once for all
contexts, saving both time and space.

Improves the performance of a context creation/destruction
microbenchmark by about 3x on my Haswell i7-4750HQ.
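
A minimal sketch of the ownership change, with hypothetical types: the costly, generation-dependent register set is built once during screen creation, and each context only keeps a pointer to it:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical types; not the i965 screen/context structs. */
struct reg_set {
   int gen;
   int num_regs;
};

struct screen {
   struct reg_set *regs;   /* computed once, shared by all contexts */
};

struct context {
   const struct screen *screen;
};

static int reg_set_builds;  /* counts calls to the expensive step */

/* Stand-in for the costly, generation-dependent computation. */
static struct reg_set *
build_reg_set(int gen)
{
   struct reg_set *rs = malloc(sizeof(*rs));
   assert(rs != NULL);
   rs->gen = gen;
   rs->num_regs = 128;      /* placeholder value */
   reg_set_builds++;
   return rs;
}

static void
screen_create(struct screen *s, int gen)
{
   s->regs = build_reg_set(gen);
}

/* Context creation no longer calls build_reg_set(). */
static void
context_create(struct context *c, const struct screen *s)
{
   c->screen = s;
}
```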

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2014-03-18 10:35:53 -07:00
Kenneth Graunke b3e4b769dd i965: Allocate the screen using ralloc rather than calloc.
This will allow us to use the screen as a memory context.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2014-03-18 10:31:12 -07:00
Eric Anholt 41097db91b ra: Convert another bool array to bitsets.
This one saves about 2MB peak allocation in glsl-fs-algebraic-add-add-1,
with no performance difference on timing short shader-db runs (n=9/10,
warmup outlier removed).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-03-18 10:20:28 -07:00
Kenneth Graunke da1cce2d68 ra: Use a bitset for storing which registers belong to a class.
This should use 1/8 the memory.
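
A small C sketch of a bool array replaced by a bitset, with a membership helper modeled on the reg_belongs_to_class() helper from commit 8d856c3937. The macros and struct are local to this sketch rather than Mesa's bitset utilities, and the helper's signature is an assumption:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Minimal bitset; these definitions are local to the sketch. */
typedef unsigned int bitset_word;
#define WORD_BITS (sizeof(bitset_word) * CHAR_BIT)
#define BITSET_WORDS(n) (((n) + WORD_BITS - 1) / WORD_BITS)

struct reg_class {
   bitset_word *regs;   /* one bit per register instead of one bool */
};

static void
class_add_reg(struct reg_class *c, unsigned r)
{
   c->regs[r / WORD_BITS] |= 1u << (r % WORD_BITS);
}

/* Hypothetical signature for the membership test. */
static bool
reg_belongs_to_class(unsigned r, const struct reg_class *c)
{
   return (c->regs[r / WORD_BITS] >> (r % WORD_BITS)) & 1u;
}
```

For 256 registers and a 32-bit word, the bitset takes 32 bytes versus 256 bytes for a bool array with a 1-byte bool, which matches the 1/8 figure.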

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Christoph Brill <egore911@gmail.com>
2014-03-18 10:15:24 -07:00
Kenneth Graunke 8d856c3937 ra: Create a reg_belongs_to_class() helper function.
This is a little easier to read.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Christoph Brill <egore911@gmail.com>
2014-03-18 10:15:23 -07:00
Kenneth Graunke 786a647245 ra: Use bool instead of GLboolean.
This isn't the GL API, so there's no reason to use GLboolean.

Using bool is safer: any non-zero value is treated as "true".  When
converting a value to a GLboolean, all but the low byte is discarded,
which means that values like 256 will be incorrectly rendered as false.
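
The truncation is easy to demonstrate. In this sketch GLboolean is redeclared as unsigned char, as in the GL headers, so no GL headers are needed; an 8-bit char is assumed:

```c
#include <assert.h>
#include <stdbool.h>

/* Same typedef as the GL headers, repeated to keep the sketch standalone. */
typedef unsigned char GLboolean;

/* Conversion to unsigned char keeps only the low byte (256 -> 0). */
static GLboolean as_glboolean(int v) { return (GLboolean)v; }

/* Conversion to bool maps every non-zero value to true. */
static bool as_bool(int v) { return (bool)v; }
```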

Done via the following vim commands:
:%s/GLboolean/bool/g
:%s/GL_TRUE/true/g
:%s/GL_FALSE/false/g
and one line of manual whitespace tidying.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2014-03-18 10:15:18 -07:00
Kenneth Graunke de7ad2c88f i965: Accurately bail on SIMD16 compiles.
Ideally, we'd like to never even attempt the SIMD16 compile if we could
know ahead of time that it won't succeed---it's purely a waste of time.
This is especially important for state-based recompiles, which happen at
draw time.

The fragment shader compiler has a number of checks like:

   if (dispatch_width == 16)
      fail("...some reason...");

This patch introduces a new no16() function which replaces the above
pattern.  In the SIMD8 compile, it sets a "SIMD16 will never work" flag.
Then, brw_wm_fs_emit can check that flag, skip the SIMD16 compile, and
issue a helpful performance warning if INTEL_DEBUG=perf is set.  (In
SIMD16 mode, no16() calls fail(), for safety's sake.)
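
A hedged sketch of the no16() pattern, using a hypothetical struct fs_compile in place of the real fs_visitor. The field names follow the commit message, but the buffer sizes and the choice to keep only the first reason are assumptions:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical compile state, not the i965 fs_visitor. */
struct fs_compile {
   int dispatch_width;        /* 8 or 16 */
   bool failed;
   bool simd16_unsupported;
   char no16_msg[128];
};

static void
fail(struct fs_compile *c, const char *msg)
{
   c->failed = true;
   fprintf(stderr, "compile failed: %s\n", msg);
}

/* Replaces "if (dispatch_width == 16) fail(...)".  In SIMD8 it only
 * records that SIMD16 can never work; in SIMD16 it still fails. */
static void
no16(struct fs_compile *c, const char *fmt, ...)
{
   char buf[128];
   va_list va;
   va_start(va, fmt);
   vsnprintf(buf, sizeof(buf), fmt, va);
   va_end(va);   /* always reached: no early return can skip it */

   if (c->dispatch_width == 16) {
      fail(c, buf);
   } else if (!c->simd16_unsupported) {
      /* This sketch keeps the first reason for a perf warning. */
      c->simd16_unsupported = true;
      strcpy(c->no16_msg, buf);   /* both buffers are 128 bytes */
   }
}
```

After the SIMD8 compile, a caller would check simd16_unsupported and skip SIMD16 entirely, which is what the message describes brw_wm_fs_emit doing.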

The great part is that this is not a heuristic---if the flag is set, we
know with 100% certainty that the SIMD16 compile would fail.  (It might
fail anyway if we run out of registers, but it's always worth trying.)

v2: Fix missing va_end in early-return case (caught by Ilia Mirkin).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> [v1]
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Reviewed-by: Eric Anholt <eric@anholt.net>
2014-03-18 10:11:38 -07:00
Kenneth Graunke b207e88b25 i965/fs: Support pull parameters in SIMD16 mode.
This is just a matter of reusing the pull/push constant information set
up by the SIMD8 compile.

This gains us 78 SIMD16 programs in shader-db.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2014-03-18 10:11:36 -07:00
Kenneth Graunke 229319e0f0 i965/fs: Use a single instance of the pull_constant_loc[] array.
Now that we don't renumber uniform registers, assign_constant_locations
and move_uniform_array_access_to_pull_constants use the same names.
So, they can share a single copy of the pull_constant_loc[] array.

This simplifies the code considerably.  assign_constant_locations()
doesn't need to walk through pull_params[] to rediscover reladdr
demotions; it just has that information in pull_constant_loc[].  We also
only need to rewrite the instruction stream once, instead of twice.

Even better, we now have a single array describing the layout of
all pull parameters, which we can pass to the SIMD16 program.

This actually hurts a few shaders in Serious Sam 3, and one in KWin:
total instructions in shared programs: 1841957 -> 1842035 (0.00%)
instructions in affected programs:     1165 -> 1243 (6.70%)
Comparing dump_instructions() before and after the pull constant
transformations with and without this patch, it appears that there is
a uniform array with variable indexing (reladdr) and constant indexing
(of array element 0).  Previously, we uploaded array element 0 as both
a pull constant (for reladdr) /and/ a push constant.
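
As a quick check of the shader-db numbers above, both deltas are the same 78 instructions, measured against different totals:

```latex
\frac{1243 - 1165}{1165} = \frac{78}{1165} \approx 6.70\%,
\qquad
\frac{1842035 - 1841957}{1841957} = \frac{78}{1841957} \approx 0.0042\%
```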

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2014-03-18 10:11:32 -07:00
Kenneth Graunke 542f2e47f2 i965/fs: Don't renumber UNIFORM registers.
Previously, remove_dead_constants() would renumber the UNIFORM registers
to be sequential starting from zero, and the resulting register number
would be used directly as an index into the params[] array.

This renumbering made it difficult to collect and save information about
pull constant locations, since setup_pull_constants() and
move_uniform_array_access_to_pull_constants() used different names.

This patch generalizes setup_pull_constants() to decide whether each
uniform register should be a pull constant, push constant, or neither
(because it's unused).  Then, it stores mappings from UNIFORM register
numbers to params[] or pull_params[] indices in the push_constant_loc
and pull_constant_loc arrays.  (We already did this for pull constants.)

Then, assign_curb_setup() just needs to consult the push_constant_loc
array to get the real index into the params[] array.

This effectively folds all the remove_dead_constants() functionality
into assign_constant_locations(), while being less irritating to work
with.
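
A simplified sketch of the mapping, assuming inputs that say which uniforms are used and which are accessed with reladdr, plus a made-up push budget max_push. The function name sketch_assign_locations() is hypothetical; the real assign_constant_locations() works on fs_visitor state:

```c
#include <assert.h>
#include <stdbool.h>

/* Each UNIFORM register number u maps to a params[] slot (push),
 * a pull_params[] slot (pull), or -1 in both arrays (unused).
 * The "pull reladdr and anything past max_push" policy is assumed. */
static void
sketch_assign_locations(int num_uniforms,
                        const bool *used, const bool *reladdr,
                        int max_push,
                        int *push_constant_loc, int *pull_constant_loc)
{
   assert(max_push >= 0);
   int num_push = 0, num_pull = 0;
   for (int u = 0; u < num_uniforms; u++) {
      push_constant_loc[u] = -1;
      pull_constant_loc[u] = -1;
      if (!used[u])
         continue;                        /* dead: neither push nor pull */
      if (reladdr[u] || num_push >= max_push)
         pull_constant_loc[u] = num_pull++;
      else
         push_constant_loc[u] = num_push++;
   }
}
```

assign_curb_setup() would then translate a UNIFORM register number u into push_constant_loc[u] before indexing params[].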

v2: Add assert(remapped <= i), requested by Topi.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2014-03-18 10:11:29 -07:00
Kenneth Graunke d9f339eccd i965/fs: Split pull parameter decision making from mechanical demoting.
move_uniform_array_access_to_pull_constants() and setup_pull_constants()
both have two parts:

1. Decide which UNIFORM registers to demote to pull constants, and
   assign locations.
2. Mechanically rewrite the instruction stream to pull the uniform
   value into a temporary VGRF and use that, eliminating the UNIFORM
   file access.

In order to support pull constants in SIMD16 mode, we will need to make
decisions exactly once, but rewrite both instruction streams.
Separating these two tasks will make this easier.

This patch introduces a new helper, demote_pull_constants(), which
takes care of rewriting the instruction stream, in both cases.

For the moment, a single invocation of demote_pull_constants can't
safely handle both reladdr and non-reladdr tasks, since the two callers
still use different names for uniforms due to remove_dead_constants()
remapping of things.  So, we get an ugly boolean parameter saying
which to do.  This will go away.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2014-03-18 10:11:26 -07:00
Kenneth Graunke 2163e0fd5a i965/fs: Record pull constant locations for all array elements.
When demoting a variably indexed uniform array to pull constants, we
only recorded the location for the base of the array (element 0).

Recording locations for all array elements is a trivial amount of code
and will make subsequent refactoring easier.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2014-03-18 10:11:24 -07:00
Kenneth Graunke 7c7627781f i965/fs: Save push constant location information.
Previously, both move_uniform_array_access_to_pull_constants() and
setup_pull_constants() maintained stack-local arrays with this
information.  Storing this information will allow it to be used from
multiple functions, allowing us to split and move code around.

We'll also eventually want to pass pull constant location information
to the SIMD16 compile.  Saving this information will help us do that.

Unfortunately, the two functions *cannot* share the contents of the
array just yet.  remove_dead_constants() renumbers all the UNIFORM
registers to be contiguous starting at zero, so the two functions
talk about uniforms using different names.  We can't even remap them,
since move_uniform_array_access_to_pull_constants() deletes UNIFORM
registers that are only accessed with reladdr, so remove_dead_constants
can't even see them.

This situation will improve in the next few patches.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2014-03-18 10:11:21 -07:00
Kenneth Graunke de77efde91 i965/fs: Delete dead code to fail compiles with SIMD16 pull parameters.
The SIMD8 compile will determine whether pull parameters are necessary.
If so, it will set prog_data->nr_pull_params to a value greater than 0.

brw_wm_fs_emit checks if nr_pull_params > 0 and skips the SIMD16 compile
altogether.  So, this code should never be reached.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2014-03-18 10:11:08 -07:00
Brian Paul 63e7b51912 gallium/docs: update SLT, SGE, SFL, STR opcode docs
To emphasize that the result is floating point 1.0 or 0.0, to match
other opcodes like SLE and SEQ.
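
Per component, the documented semantics reduce to the following C sketch. It is illustrative only, not gallium's TGSI interpreter; the point is that every result is the float 1.0 or 0.0:

```c
#include <assert.h>

/* Set on less than / greater or equal: float 1.0 or 0.0. */
static float op_slt(float a, float b) { return a <  b ? 1.0f : 0.0f; }
static float op_sge(float a, float b) { return a >= b ? 1.0f : 0.0f; }

/* Set on false / set on true: constant float results. */
static float op_sfl(void) { return 0.0f; }
static float op_str(void) { return 1.0f; }
```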

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2014-03-18 08:03:27 -06:00
Charmaine Lee 81f342ce64 glx: Fix incorrect pdp assignment in dri2_bind_context().
pdp should be set to dpyPriv->dri2Display.
Fixes blank frame failure running glretrace ClearView.

Reviewed-by: Brian Paul <brianp@vmware.com>
2014-03-18 08:03:27 -06:00
Maarten Lankhorst 8fe888fafd nvc0: Handle user mapped vertex buffer for edgeflag
Handle mapping edgeflag data similarly to the code around it.
This fixes a crash in piglit test gl-2.0-edgeflag.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
2014-03-18 14:51:06 +01:00
Francisco Jerez d70ad1a4f9 clover: Fix region size error checking in some buffer transfer commands.
Tested-by: Tom Stellard <thomas.stellard@amd.com>
2014-03-18 12:14:46 +01:00
Ilia Mirkin c8309cde30 nv50/ir/gk110: add postfactor support for fmul
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-03-18 05:56:55 -04:00
Ilia Mirkin d8e0d1e882 nv50/ir/gk110: set not modifier on first source of logic op
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-03-18 05:56:55 -04:00
Ilia Mirkin b56e50b8af nv50/ir/gk110: use shl/shr instead of lshf/rshf so that c[] is supported
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-03-18 05:56:55 -04:00
Ilia Mirkin 34bf5e27c6 nv50/ir/gk110: add 64/128-bit fetch/export support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-03-18 05:56:55 -04:00
Ilia Mirkin 3c40be2615 nv50/ir/gk110: fix handling of OP_SUB for floating point ops
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-03-18 05:56:54 -04:00
Ilia Mirkin 72310869f0 nv50/ir/gk110: presin/preex2 take their source at bit 23
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-03-18 05:56:54 -04:00
Ilia Mirkin 48a9ba63f5 nv50/ir/gk110: add implementations of div u32/s32
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-03-18 05:56:54 -04:00
Ilia Mirkin 4bb14aca29 nv50/ir/gk110: implement quadop
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-03-18 05:56:54 -04:00
Ilia Mirkin 67cb8a6996 nv50/ir/gk110: fill in mov from predicate
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-03-18 05:56:54 -04:00
Ilia Mirkin 563083ef57 nv50/ir/gk110: handle derivAll flag, fix useOffsets for non-txf
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-03-18 05:56:54 -04:00
Ilia Mirkin ece734b3c1 nv50/ir/gk110: fix setting texture for txd/txf/txq
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-03-18 05:56:54 -04:00
Ilia Mirkin 08505549ab nv50/ir/gk110: add texcsaa implementation
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-03-18 05:56:54 -04:00
Ilia Mirkin c17f7247ec nv50/ir/gk110: add pfetch support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-03-18 05:56:54 -04:00
Ilia Mirkin 15b1f420d0 nv50/ir/gk110: add emit/restart implementations
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-03-18 05:56:53 -04:00
Ilia Mirkin 1b68009466 nv50/ir/gk110: add missing break in sched emit
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-03-18 05:56:53 -04:00
Ilia Mirkin 76554d2d1f nv50/ir/gk110: implement partial txq support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-03-18 05:56:53 -04:00
Ilia Mirkin cb3dcb1430 nv50/ir/gk110: fill out texture instruction support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-03-18 05:56:53 -04:00
Ilia Mirkin ce75a3e8d3 nv50/ir/gk110: fix control flow opcode emission, add sat flag
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-03-18 05:56:34 -04:00
Chad Versace 468cc866b4 egl/main: Enable Linux platform extensions
Enable EGL_EXT_platform_base and the Linux platform extensions layered
atop it: EGL_EXT_platform_x11, EGL_EXT_platform_wayland,
and EGL_MESA_platform_gbm.

Tested with Piglit's EGL_EXT_platform_base tests under an X11 session.
To run the Wayland and GBM tests, a windowed Weston instance was running
and the kernel had render nodes enabled.

I regression tested my EGL_EXT_platform_base patch set with Piglit on
Ivybridge under X11/EGL, standalone Weston, and GBM with render nodes. No
regressions found.

Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
2014-03-17 15:49:06 -07:00