Commit Graph

61525 Commits

Author SHA1 Message Date
Francisco Jerez d82b39ce38 clover: Allow storing a range into a container of different (but compatible) element type.
Tested-by: Tom Stellard <thomas.stellard@amd.com>
2014-02-21 12:51:22 +01:00
Francisco Jerez 1b9fb2fd91 clover: Define an intrusive smart reference class.
Tested-by: Tom Stellard <thomas.stellard@amd.com>
2014-02-21 12:51:22 +01:00
Francisco Jerez 9ae0bd3829 clover: Some improvements for the intrusive pointer class.
Define some additional convenience operators, clean up the
implementation slightly, and rename it to 'intrusive_ptr' for reasons
that will be obvious in the next commit.

Tested-by: Tom Stellard <thomas.stellard@amd.com>
2014-02-21 12:51:22 +01:00
Francisco Jerez 198cd136b9 clover: Fix up NULL constant pointer arguments.
Tested-by: Tom Stellard <thomas.stellard@amd.com>
2014-02-21 12:29:05 +01:00
Jordan Justen c97763ca2d tgsi_ureg: add property_gs_invocations
Fixes a build break in state_tracker/st_program.c

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75278
Reviewed-by: Dave Airlie <airlied@redhat.com>
2014-02-20 16:41:01 -08:00
Kenneth Graunke 1336ccb7dd i965: Enable Broadwell support.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2014-02-20 15:51:38 -08:00
Kenneth Graunke 808952a095 i965/fs: Implement FS_OPCODE_[UN]PACK_HALF_2x16_SPLIT[_XY] opcodes.
I'd neglected to port these to Broadwell.  Most of this code is copy
and pasted from Gen7, but instead of using F32TO16/F16TO32, we just
use MOV with HF register types.

Fixes fs-packHalf2x16 and fs-unpackHalf2x16 tests (both the ARB
extension and ES 3.0 variants).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2014-02-20 15:50:59 -08:00
Kenneth Graunke 850e372fc7 i965: Drop bogus F32TO16/F16TO32 instructions on Broadwell - use MOV.
Broadwell removed the F32TO16 and F16TO32 instructions.  However, it has
actual support for HF values, so they're actually just MOV.

Fixes vs-packHalf2x16 and vs-unpackHalf2x16 tests (both the ARB
extension and ES 3.0 variants).

v2: Emulate F32TO16's align16 zeroing bug, since Chad's front end code
    relies on it happening.  We can probably refactor this code to be
    better later.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2014-02-20 15:50:57 -08:00
Kenneth Graunke 3663bbe773 i965: Create a hardware context before initializing state module.
brw_init_state() calls brw_upload_initial_gpu_state().  If hardware
contexts are enabled (brw->hw_ctx != NULL), this will upload some
initial invariant state for the GPU.  Without hardware contexts, we
rely on this state being uploaded via atoms that subscribe to the
BRW_NEW_CONTEXT bit.

Commit 46d3c2bf4d accidentally moved
the call to brw_init_state() before creating a hardware context.
This meant brw_upload_initial_gpu_state would always early return.
Except on Gen6+, we stopped uploading the initial GPU state via
state atoms, so it never happened.

Fixes a regression since 46d3c2bf4d.

Cc: "10.0 10.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2014-02-20 15:50:08 -08:00
Kenneth Graunke e3823147a5 i965/fs: Implement scratch read/write support for Broadwell.
To make sure that both the Gen4 and Gen7 style messages work, I
initially disabled the SHADER_OPCODE_GEN7_SCRATCH_READ optimization,
ran Piglit, re-enabled it, and ran Piglit again.  Both worked fine.

Fixes 40 Piglit tests (most of the varying-packing category).

v2: Move num_regs assertion from gen8_fs_generator to
    gen8_set_dp_scratch_message() (suggested by Eric).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2014-02-20 15:50:08 -08:00
Kenneth Graunke 29a6974403 i965: Add Gen8 assembly support for DP Scratch messages.
The new accessors will make it easy to do Gen7-style scratch messages.

v2: Move num_regs assertion from gen8_fs_generator into
    gen8_set_dp_scratch_message() (suggested by Eric).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2014-02-20 15:50:08 -08:00
Kenneth Graunke a5e54c91a3 i965: Store absolute thread count in max_wm_threads on Broadwell.
In the past, 3DSTATE_PS took an absolute number of threads.  Conversely,
on Broadwell you always program 64, and it implicitly scales based on
the GT-level with no special programming.  So, I stored 64 in
brw_device_info::max_wm_threads.

However, I didn't realize that we also use max_wm_threads to compute the
size of the scratch space buffer.  In that case, we really need the
absolute number of threads.

This patch hardcodes 3DSTATE_PS to use the value it expects, and changes
max_wm_threads back to a (completely fake) absolute thread count (once
again copied from Haswell).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2014-02-20 15:50:08 -08:00
Kenneth Graunke dca84b4b5b i965: Use MOV, not OR for setting URB write channel enables on Gen8+.
On Broadwell, g0.5 contains the "Scratch Space Pointer"; using OR
puts some bits of that into "ignored" sections of our message header.

While this doesn't hurt, it's also not terribly /useful/.  Using MOV
is sufficient to set the only interesting bits in this part of the
message header.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2014-02-20 15:50:07 -08:00
Kenneth Graunke e643c7d036 i965: Implement a CS stall workaround on Broadwell.
According to the latest documentation, any PIPE_CONTROL with the
"Command Streamer Stall" bit set must also have another bit set,
with five different options:

   - Render Target Cache Flush
   - Depth Cache Flush
   - Stall at Pixel Scoreboard
   - Post-Sync Operation
   - Depth Stall

I chose "Stall at Pixel Scoreboard" since we've used it effectively
in the past, but the choice is fairly arbitrary.

Implementing this in the PIPE_CONTROL emit helpers ensures that the
workaround will always take effect when it ought to.

Apparently, this workaround may be necessary on older hardware as well;
for now I've only added it to Broadwell as it's absolutely necessary
there.  Subsequent patches could add it to older platforms, provided
someone tests it there.

v2: Only flag "Stall at Pixel Scoreboard" when none of the other bits
    are set (suggested by Ian Romanick).

v3: Prefix the function with "gen8" (requested by Eric).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v2)
Reviewed-by: Eric Anholt <eric@anholt.net>
2014-02-20 15:50:07 -08:00
Jordan Justen 741782b594 i965: support instanced GS on gen7
v3:
 * Properly prevent dual object mode execution when
   the invocation count > 1

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2014-02-20 10:33:09 -08:00
Jordan Justen 008338bc4e i965: support gl_InvocationID for gen7
v2:
 * Make gl_InvocationID a system value

v3:
 * Properly shift from R0.1 into DST.4 by adding
   GS_OPCODE_GET_INSTANCE_ID

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2014-02-20 10:33:09 -08:00
Jordan Justen d099019935 glsl: add gl_InvocationID variable for ARB_gpu_shader5
v2:
 * Make gl_InvocationID a system value

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2014-02-20 10:33:09 -08:00
Jordan Justen 22388e2208 main/shaderapi: GL_GEOMETRY_SHADER_INVOCATIONS GetProgramiv support
v3:
 * Add check for ARB_gpu_shader5

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2014-02-20 10:33:09 -08:00
Jordan Justen 86d6b5546b mesa: initialize gl_geometry_program Invocations field
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2014-02-20 10:33:09 -08:00
Jordan Justen 313402048f glsl/linker: produce gl_shader_program Geom.Invocations
Grab the parsed invocation count, check for consistency
during linking, and finally save the result in
gl_shader_program Geom.Invocations.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2014-02-20 10:33:08 -08:00
Jordan Justen 02dc74fbd7 glsl: parse invocations layout qualifier for ARB_gpu_shader5
_mesa_glsl_parse_state in_qualifier->invocations will store the
invocations count.

v3:
 * Use in_qualifier to allow the primitive to be specied
   separately from the invocations count (merge_qualifiers)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2014-02-20 10:33:08 -08:00
Jordan Justen 738c9c3c54 glsl: Generate error for invalid input layout declarations
Fixes various piglit tests:
spec/glsl-1.50/compiler/incorrect-in-layout-qualifier-*.geom

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2014-02-20 10:33:08 -08:00
Jordan Justen 0c558f9ee6 glsl: convert GS input primitive to use ast_type_qualifier
We introduce a new merge_in_qualifier ast_type_qualifier
which allows specialized handling of merging input layout
qualifiers.

By merging layout qualifiers into state->in_qualifier, we
allow multiple input qualifiers. For example, the primitive
type can be specified specified separately from the
invocations count (ARB_gpu_shader5).

state->gs_input_prim_type is moved into state->in_qualifier->prim_type

state->gs_input_prim_type_specified is still processed separately
so we can determine when the input primitive is specified. This
is important since certain scenerios are not supported until after
the primitive type has been specified in the shader code.

v4:
 * Merge with compute shader input layout qualifiers

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2014-02-20 10:33:08 -08:00
Eric Anholt 5bc0b2f432 i965: Fix extra return value after winsys rb update refactor.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75172
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-02-20 10:15:13 -08:00
Eric Anholt 9245206cbf i965/vs: Use samplers for UBOs in the VS like we do for non-UBO pulls.
Improves performance of a dolphin emulator trace I had laying around by
3.60131% +/- 0.995887% (n=128).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-02-20 10:15:13 -08:00
Eric Anholt 9e3cab8881 i965/fs: Add an optimization pass to remove redundant flags movs.
We generate steaming piles of these for the centroid workaround, and this
quickly cleans them up.

total instructions in shared programs: 1591228 -> 1590047 (-0.07%)
instructions in affected programs:     26111 -> 24930 (-4.52%)
GAINED:                                0
LOST:                                  0

(Improved apps are l4d2, csgo, and dolphin)

Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-02-20 10:15:13 -08:00
Roland Scheidegger b2b2a2c06c gallivm: add smallfloat to float conversion not relying on cpu denorm handling
The previous code relied on cpu denorm support for converting small float
formats (such r11g11b10_float and r16_float) to floats, otherwise denorms
are flushed to zero. We worked around that in llvmpipe blend code by
reenabling denorms, but this did nothing for texture sampling. Now it would
be possible to reenable it there too but I'm not really a fan of messing
with fpu flags (and it seems we can't actually do it reliably with llvm in
any case looking at some bug reports). (Not to mention if you actually have
a lot of denorms in there, you can expect some order-of-magnitude slowdown
with x86 cpus.)
So instead use code which adjusts exponents etc. directly hence not relying
on cpu denorm support for the rescaling mul.
(We still need the fpu flag handling as we can't do float-to-smallfloat
without using cpu denorms at least for now - I actually wanted to keep
both the old and new code and using one or the other depending on from where
it's called but that didn't work out as the parameter would have to be passed
through too many layers than I'd like.)

Reviewed-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Si Chen <sichen@vmware.com>
2014-02-20 18:41:42 +01:00
Leo Liu 0206f0b3d4 st/omx/enc: add multi scaling buffers for performance improvement
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2014-02-20 13:34:16 +01:00
Christian König 754fa3a0d2 st/omx/dec/h264: fix prevFrameNumOffset handling
Signed-off-by: Christian König <christian.koenig@amd.com>
2014-02-20 13:34:06 +01:00
Kenneth Graunke 57405605a8 i965: Actually claim to support MSAA on Broadwell.
We need to advertise 8x, 4x, and 2x multisamples.  Previously, we only
claimed to support 0/1 samples.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2014-02-19 15:43:22 -08:00
Kenneth Graunke 4af8c95783 i965: Update physical width/height munging for 2x IMS MSAA.
I can't find any documentation to explain what ought to be done here, so
I simply guessed based on the pattern I observed in the 4x/8x cases.
It appears to work, but it could be totally wrong.

I was able to find the Sandybridge PRM quote from the comments in the
latest documentation: Shared Functions > 3D Sampler > Multisampled
Surface Behavior.  However, it only mentions 4x MSAA - not even 8x.

After a substantial amount more digging, I was able to find a second
page (incorrectly tagged) which confirmed the formulas in our code for
8x MSAA.  However, that page didn't mention 2x MSAA at all.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2014-02-19 15:43:22 -08:00
Kenneth Graunke 51145a24f7 i965: Enable smooth points when multisampling without point sprites.
According to the "Point Multisample Rasterization" of the OpenGL
specification (3.0 or later), smooth points are supposed to be enabled
implicitly when multisampling, regardless of the GL_POINT_SMOOTH flag.

However, if GL_POINT_SPRITE is enabled, you get square points no matter
what.  Core contexts always enable point sprites, so this effectively
makes smooth points go away, even in the case of multisampling.

Fixes Piglit's EXT_framebuffer_multisample/point-smooth tests.
(Yes, that's right folks, we actually have Piglit tests for this.)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2014-02-19 15:43:22 -08:00
Kenneth Graunke a3d70580b5 i965: Thwack multisample enable bit in 3DSTATE_RASTER.
The meaning and effects of this bit are surprisingly complicated.

See Rasterization > Windower > Multisampling > Multisample ModesState.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2014-02-19 15:43:22 -08:00
Kenneth Graunke 0c5873c9b9 i965: Only use the SIMD16 program for per-sample shading on Broadwell.
This restriction carries forward from earlier platforms.  The code is
ported straight from gen7_wm_state.c.

v2: Actually do it right.
v3: Add missing _NEW_MULTISAMPLE bit (caught by Eric).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2014-02-19 15:42:54 -08:00
Kenneth Graunke 61d7ea4b16 i965: Set "Position XY Offset Select" bits in 3DSTATE_PS on Broadwell.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2014-02-19 15:42:16 -08:00
Kenneth Graunke 01c42b2be6 i965: Add missing sample shading bits to Gen8's 3DSTATE_PS_EXTRA.
v2: Also set the "oMask Present to Render Target" bit, which is required
    for shaders that write oMask.  Otherwise the hardware won't expect
    the extra data.

v3: Add missing _NEW_MULTISAMPLE (caught by Eric).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2014-02-19 15:42:02 -08:00
Kenneth Graunke 77c37ed74b i965/fs: Implement FS_OPCODE_SET_OMASK on Broadwell.
I made a few changes which I think simplify the code a bit compared to
the Gen7 implementation, but which are largely pointless.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2014-02-19 15:39:41 -08:00
Kenneth Graunke 5476da79f8 i965/fs: Implement FS_OPCODE_SET_SAMPLE_ID on Broadwell.
Largely cut and paste from Gen7; it works the same way.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2014-02-19 15:39:41 -08:00
Kenneth Graunke 80c4edfc27 i965: Disable MCS on Broadwell for now.
v2: Add a perf_debug() message to remind us to come back to this.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2014-02-19 15:39:21 -08:00
Kenneth Graunke 4eba0d124d i965: Use gen7_surface_msaa_bits in Broadwell SURFACE_STATE code.
We already set the number of samples, but were missing the MSAA layout
mode.  Reusing gen7_surface_msaa_bits makes it easy to set both.

This also lets us drop the Gen8 surface_num_multisamples function.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2014-02-19 15:35:54 -08:00
Kenneth Graunke 6eeae17c02 i965: Use ffs() for sample counting in gen7_surface_msaa_bits().
The enumerations are just log2(num_samples) shifted by 3, which we can
easily compute via ffs().

This also makes it reusable for Broadwell, which has 2x MSAA.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2014-02-19 15:35:53 -08:00
Kenneth Graunke 2ed5824a5d i965: Simplify Broadwell's 3DSTATE_MULTISAMPLE sample count handling.
These enumerations are simply log2 of the number of multisamples shifted
by a bit, so we can calculate them using ffs() in a lot less code.

Suggested by Eric Anholt.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2014-02-19 15:35:32 -08:00
Ian Romanick 7700c73cf4 glsl: Silence "type qualifiers ignored on function return type" warning
The const in

   const unsigned foo(void);

is meaningless.  Removing it silences this warning:

src/glsl/ast_to_hir.cpp:1802:56: warning: type qualifiers ignored on function return type [-Wignored-qualifiers]

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2014-02-19 15:08:50 -08:00
Ian Romanick 2c85fd5a96 glsl: Only warn for macro names containing __
From page 14 (page 20 of the PDF) of the GLSL 1.10 spec:

    "In addition, all identifiers containing two consecutive underscores
     (__) are reserved as possible future keywords."

The intention is that names containing __ are reserved for internal use
by the implementation, and names prefixed with GL_ are reserved for use
by Khronos.  Names simply containing __ are dangerous to use, but should
be allowed.

Per the Khronos bug mentioned below, a future version of the GLSL
specification will clarify this.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "9.2 10.0 10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Darius Spitznagel <d.spitznagel@goodbytez.de>
Cc: Tapani Pälli <lemody@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71870
Bugzilla: Khronos #11702
2014-02-19 15:08:50 -08:00
Ian Romanick 0bd7892630 glcpp: Only warn for macro names containing __
Section 3.3 (Preprocessor) of the GLSL 1.30 spec (and later) and the
GLSL ES spec (all versions) say:

    "All macro names containing two consecutive underscores ( __ ) are
    reserved for future use as predefined macro names. All macro names
    prefixed with "GL_" ("GL" followed by a single underscore) are also
    reserved."

The intention is that names containing __ are reserved for internal use
by the implementation, and names prefixed with GL_ are reserved for use
by Khronos.  Since every extension adds a name prefixed with GL_ (i.e.,
the name of the extension), that should be an error.  Names simply
containing __ are dangerous to use, but should be allowed.  In similar
cases, the C++ preprocessor specification says, "no diagnostic is
required."

Per the Khronos bug mentioned below, a future version of the GLSL
specification will clarify this.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "9.2 10.0 10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Darius Spitznagel <d.spitznagel@goodbytez.de>
Cc: Tapani Pälli <lemody@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71870
Bugzilla: Khronos #11702
2014-02-19 15:08:50 -08:00
Tom Stellard a4c734297f configure: Use LLVM shared libraries by default
Linking with LLVM static libraries is easily broken by changes to
the llvm-config program or when LLVM adds, removes, or changes library
components.  Keeping up with these changes requires a lot of maintanence
effort to keep the build working on the master and stable branches.

Also, because of issues in the past LLVM static libraries, the release
manager is currently configuring with --with-llvm-shared-libs when
checking the build before release.  Enabling shared libraries by
default would allow the release manager to run ./configure with
no arguments, and be reasonably confident that the build would succeed.

Acked-by: Emil Velikov <emil.l.velikov@gmail.com>
2014-02-19 14:35:49 -05:00
Francisco Jerez 8928d7860a i965/fs: Allocate the param_size array dynamically.
Useful because the total number of uniform components might exceed
MAX_UNIFORMS * 4 in some cases because of the image metadata we'll be
passing as push constants.

Reviewed-by: Paul Berry <stereotype441@gmail.com>
2014-02-19 19:03:56 +01:00
Francisco Jerez eef710fc53 i965/fs: Use a separate variable to keep track of the last uniform index seen.
Like the VEC4 back-end does.  It will make dynamic allocation of the
param_size array easier in a future commit.

Reviewed-by: Paul Berry <stereotype441@gmail.com>
2014-02-19 19:03:56 +01:00
Rob Clark 9186cd39d4 freedreno: tweak ringbuffer sizes/count
Since we are now consuming two ringbuffers at a time, we probably want a
pool larger than 4.. but we don't need each individual ringbuffer to be
so large, so offset the pool size increase by reducing rb size.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-02-19 12:02:57 -05:00
Rob Clark 5993723471 freedreno/a3xx/compiler: scheduling/legalize fixes
It seems the write-after-read hazard that applies to texture fetch
instructions, also applies to sfu instructions.

Also, cat5/cat6 instructions do not have a (ss) bit, so in these
cases we need to insert a dummy nop instruction with (ss) bit set.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-02-19 12:01:26 -05:00