Commit Graph

75929 Commits

Author SHA1 Message Date
Matt Turner 1dc312e295 i965/fs: Implement support for extract_word.
The vec4 backend will lower it.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-01 10:43:57 -08:00
Matt Turner 68f8c5730b nir: Add opcodes to extract bytes or words.
The uint versions zero extend while the int versions sign extend.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-01 10:43:57 -08:00
Matt Turner 8709dc0713 glsl: Remove 2x16 half-precision pack/unpack opcodes.
i965/fs was the only consumer, and we're now doing the lowering in NIR.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-01 10:43:57 -08:00
Matt Turner 1a53a4fc7a i965/fs: Switch from GLSL IR to NIR for un/packHalf2x16 scalarizing.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-01 10:43:57 -08:00
Matt Turner 9ce901058f nir: Add lowering of nir_op_unpack_half_2x16.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-01 10:43:57 -08:00
Matt Turner e4278a847e i965: Make separate nir_options for scalar/vector stages.
We'll want to have different lowering options set for scalar/vector
stages.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-01 10:43:57 -08:00
Matt Turner 252d497d4c i965: Move brw_compiler_create() to new brw_compiler.c.
A future patch will want to use designated initalizers, which aren't
available in C++, but this is C.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-01 10:43:57 -08:00
Matt Turner 140a886c41 nir: Make argument order of unop_convert match binop_convert.
Strangely the return and parameter types were reversed.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-01 10:43:57 -08:00
Marta Lofstedt 77a60ab5dc mesa: enable enums for OES_geometry_shader
Enable GL_OES_geometry_shader enums for OpenGL ES 3.1.

V4: EXTRA tokens updated according to comments from Ilia Mirkin.

V5: Account for check_extra does not evaluate "or" lazy. Fix issues
with EXTRA_EXT_FB_NO_ATTACH_CS.

Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-01 09:30:50 +01:00
François Tigeot a48afb92ff gallium: Add DragonFly support
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-01-31 11:56:09 +00:00
Ilia Mirkin 7f19e29305 nv50/ir: get rid of memory stores with nop values
This happens especially with exports and varying packing, where the last
bits aren't always filled in. We end up trying to do quad-wide stores,
which ends up being a lot of register moves that carefully preserve the
nop value. Instead don't do the stores.

total instructions in shared programs : 6131375 -> 6125267 (-0.10%)
total gprs used in shared programs    : 910139 -> 895501 (-1.61%)
total local used in shared programs   : 15328 -> 15328 (0.00%)

                local        gpr       inst
    helped           0        7442        4693
      hurt           0          90        2687

Most of the helped/hurt instruction changes are by one or two ops
because can no longer do quad-wide stores in all cases.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-30 17:18:41 -05:00
Ilia Mirkin 3ca941d60e nv50/ir: fix false global CSE on instructions with multiple defs
If an instruction has multiple defs, we have to do a lot more checks to
make sure that we can move it forward. Among other things, various code
likes to do

    a, b = tex()
    if () c = a
    else c = b

which means that a single phi node will have results pointing at the
same instruction. We obviously can't propagate the tex in this case, but
properly accounting for this situation is tricky. Just don't try for
instructions with multiple defs.

This fixes about 20 shaders in shader-db, including the dolphin efb2ram
shader.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2016-01-30 17:18:41 -05:00
Ilia Mirkin 3ca2001b53 nv50,nvc0: fix buffer clearing to respect engine alignment requirements
It appears that the nvidia render engine is quite picky when it comes to
linear surfaces. It doesn't like non-256-byte aligned offsets, and
apparently doesn't even do non-256-byte strides.

This makes arb_clear_buffer_object-unaligned pass on both nv50 and nvc0.

As a side-effect this also allows RGB32 clears to work via GPU data
upload instead of synchronizing the buffer to the CPU (nvc0 only).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> # tested on GF108, GT215
Tested-by: Nick Sarnie <commendsarnex@gmail.com> # GK208
Cc: mesa-stable@lists.freedesktop.org
2016-01-30 16:01:41 -05:00
Rob Clark f15447e7c9 freedreno/ir3: ignore clip-vertex varying
Since we emulate clip-planes, the clip-vertex is used within the VS
itself (thanks to nir_lower_clip).  So just ignore it as a VS output.
Fixes a boatload of piglit tests that were asserting on unknown
varying slot.

(Also unrelated spelling/typo fix.)

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-30 12:29:21 -05:00
Rob Clark f20cf22b54 freedreno/ir3: don't ignore local vars
With glsl_to_nir we end up with local variables, instead of global, for
arrays.

Note that we'll eventually have to do something more clever, I think,
when we support multiple functions, but that will probably take some
work in a few places.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-30 12:27:57 -05:00
Rob Clark 8039a2a6b3 freedreno/ir3: handle tex instrs w/ const offset
Something we start to see with glsl_to_nir.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-30 12:27:27 -05:00
Rob Clark f212d7dc50 freedreno/ir3: support load_front_face intrinsic
With tgsi_to_nir we get this as a normal input with VARYING_SLOT_FACE.
But glsl_to_nir plus nir_lower_system_values this becomes an intrinsic.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-30 12:11:54 -05:00
Rob Clark 9e05e8cb75 freedreno: limit string marker to max packet size
Experimentally derived max size.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-30 12:10:13 -05:00
Ilia Mirkin 438d421f8b nvc0: avoid crashing when there are holes in vertex array bindings
When using the "shared" vertex array configuration strategy, we bind
each of the buffers as a separate array. However there can be holes in
such vertex buffer lists, so just emit a disable for those.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2016-01-29 22:10:42 -05:00
Ilia Mirkin 899b1b98a4 nvc0: enable atomic counters and ssbo
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-29 22:10:42 -05:00
Ilia Mirkin 48cf392c0e nv50/ir: handle new TGSI MEMBAR opcode
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-29 21:22:48 -05:00
Ilia Mirkin df043f0764 nvc0/ir: fix atomic compare-and-swap arguments
Teach the emitter that the two registers are sequential, and drop the
second arg entirely, in favor of a double-wide first argument.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-29 21:22:48 -05:00
Ilia Mirkin 7b9a77b905 nv50/ir: add support for indirect buffer loading
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-29 21:22:48 -05:00
Ilia Mirkin 2c4eeb0b5c nv50/ir: add SUQ op by reading the info from driver constbuf
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-29 21:22:47 -05:00
Ilia Mirkin c3083c7082 nv50/ir: add support for BUFFER accesses
This largely leaves the existing image logic alone. When image support
is added this will have to be harmonized somehow.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-29 21:22:47 -05:00
Ilia Mirkin abe427ebd2 nvc0: handle shader buffer memory barrier
Issue a MEM_BARRIER. No idea if this is sufficient. As there are no
tests for this, it'll have to do for now.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-29 21:22:38 -05:00
Ilia Mirkin fe01be4ad5 nvc0: add state management for shader buffers
(address, length) pairs are uploaded to the driver constbuf as well to
make these values available to the shaders.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-29 21:06:07 -05:00
Ilia Mirkin b4688c4615 nvc0: double per-shader stage driver constants area
We need to store a lot more info now with per-buffer address/size.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-29 21:06:06 -05:00
Ilia Mirkin ae725d5746 trace: add support for set_shader_buffers
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
v1 -> v2: add arg_begin/arg_end around buffer array
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-01-29 21:05:47 -05:00
Ilia Mirkin fea25db925 st/mesa: enable ARB_shader_storage_buffer_object when supported
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-29 21:05:47 -05:00
Ilia Mirkin 6fb8fac853 st/mesa: add shader buffer barrier bit
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-29 21:05:47 -05:00
Ilia Mirkin 792bab24ac st/mesa: add support for memory barrier intrinsics
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v2)

v1 -> v2: use TGSI_MEMBAR defines
2016-01-29 21:05:47 -05:00
Ilia Mirkin c0e1c54a4f st/mesa: use RESQ to find buffer size
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-29 21:05:47 -05:00
Ilia Mirkin 6880036694 st/mesa: add support for SSBO binding and GLSL intrinsics
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

v1 -> v2: some 80 char reformatting
2016-01-29 21:05:46 -05:00
Ilia Mirkin 9d6f9ccf6b st/mesa: add atomic counter support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-29 21:05:46 -05:00
Ilia Mirkin 0fddb677e6 mesa: add PROGRAM_IMMEDIATE, PROGRAM_BUFFER
This makes PROGRAM_IMMEDIATE a first-class gl_register_file type, and
adds PROGRAM_BUFFER to the list. These are used purely inside
glsl_to_tgsi conversion.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-29 21:05:35 -05:00
Ilia Mirkin 35f8488668 glsl: keep track of ssbo variable being accessed, add access params
Currently any access params (coherent/volatile/restrict) are being lost
when lowering to the ssbo load/store intrinsics. Keep track of the
variable being used, and bake its access params in as the last arg of
the load/store intrinsics.

If the variable is accessed via an instance block, then 'variable'
points to the instance block variable and not the field inside the
instance block that we are accessing. In order to check access
parameters for the field itself we need to detect this case and keep
track of the corresponding field struct so we can extract the specific
field access information from there instead.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
v1 -> v2: add tracking of struct field
v2 -> v3: minor adjustments based on Iago's feedback
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-01-29 21:05:08 -05:00
Ilia Mirkin 2b089c7ffe glsl: always initialize image_* fields, copy them on interface init
Interfaces can have image properties set in case they are buffer
interfaces. Make sure not to lose this information.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-29 21:04:56 -05:00
Ilia Mirkin 2ccc42fd2c tgsi: add MEMBAR opcode to handle memoryBarrier* GLSL intrinsics
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
v1 -> v2: add defines for the various bits
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-01-29 21:04:36 -05:00
Michel Dänzer 30fcf241e1 winsys/amdgpu: Process RADEON_FLAG_* independently from RADEON_DOMAIN_*
In particular, AMDGPU_GEM_CREATE_CPU_GTT_USWC can affect even BOs created
in VRAM if they get evicted to GTT. In general there's no need to
restrict any of the flags to any particular domains.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2016-01-29 16:06:06 +09:00
Michel Dänzer 62f837e2ea winsys/amdgpu: Handle RADEON_FLAG_NO_CPU_ACCESS
Failing to do this was resulting in the kernel driver unnecessarily
leaving open the possibility of CPU access to tiled BOs.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93862

(This change shouldn't be backported to stable branches, because
released versions of xf86-video-amdgpu unnecessarily try to map the
front buffer)

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2016-01-29 16:06:06 +09:00
Karol Herbst 29d09f8747 nv50/ir: optimize mad/fma with third argument 0 to mul
Very modest effect, but it's clearly the right thing to do.

total instructions in shared programs : 6131491 -> 6131398 (-0.00%)
total gprs used in shared programs    : 910157 -> 910131 (-0.00%)
total local used in shared programs   : 15328 -> 15328 (0.00%)

                local        gpr       inst      bytes
    helped           0          55          85          85
      hurt           0          26          20          20

Signed-off-by: Karol Herbst <nouveau@karolherbst.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-28 15:59:41 -05:00
Karol Herbst 3aa681449e nv50/ir: run DCE backwards
Reduces calls up to 50%

Signed-off-by: Karol Herbst <nouveau@karolherbst.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-28 15:34:29 -05:00
Karol Herbst 978ae28ca2 nv50/ir: optimize shl(shr(a, c), c) to and(a, ~((1 << c) - 1))
Following shader-db results on GK110:

total instructions in shared programs : 6141510 -> 6131491 (-0.16%)
total gprs used in shared programs    : 910187 -> 910157 (-0.00%)
total local used in shared programs   : 15328 -> 15328 (0.00%)

                local        gpr       inst      bytes
    helped           0          18         821         821
      hurt           0           0           0           0

Signed-off-by: Karol Herbst <nouveau@karolherbst.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-28 15:34:22 -05:00
Ilia Mirkin 089f605439 glsl: disallow implicit conversions in ESSL shaders
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-01-28 11:31:19 -05:00
Axel Davy dda7a84986 radeonsi: Add option for SI scheduler
Add a debug option to select the LLVM SI Machine Scheduler.
R600_DEBUG=sisched

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-28 17:22:44 +01:00
Samuel Iglesias Gonsálvez f9c43dd22f glsl: double-precision values don't support interpolation
ARB_gpu_shader_fp64 spec says:

  "This extension does not support interpolation of double-precision
  values; doubles used as fragment shader inputs must be qualified as
  "flat"."

Fixes the regressions added by commit 781d278:

arb_gpu_shader_fp64-double-gettransformfeedbackvarying
arb_gpu_shader_fp64-tf-interleaved
arb_gpu_shader_fp64-tf-interleaved-aligned
arb_gpu_shader_fp64-tf-separate

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93878
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2016-01-28 11:35:03 +01:00
Eric Anholt 3fba517bdd vc4: Throttle outstanding rendering after submission.
Just make sure that after we've submitted, we get to at least 5
(global) submits ago before we go on to do more.  Prevents up to
seconds of lag with window movement in X with xcompmgr -c.  There may
be useful tuning to do in the future, but for now this gets us
usability.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
2016-01-27 20:05:37 -08:00
Eric Anholt 2a449ce7c9 vc4: Don't record the seqno of a failed job submit.
On an error return, the returned seqno will probably be unset, so we'd
lose track of what we've submitted so far for waiting on in the
future.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
2016-01-27 20:05:37 -08:00
Ben Widawsky 0e06f76a84 i965/skl: Utilize new 5th bit for gateway messages
Modify comment as spotted by Matt, and Chris Forbes

Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-27 17:12:56 -08:00