Commit Graph

75443 Commits

Author SHA1 Message Date
Julien Isorce 777d1453f1 build: enable st/va with nouveau driver
vainfo fails in vaDriverInit because "dd_create_screen"
does not reach strcmp(driver_name, "nouveau") code.
Indeed when compiling the va target.c, the macro GALLIUM_NOUVEAU
is not defined.
This patch define the macro the same it is done for dri and
vdpau targets.

Tested with:
./autogen.sh --enable-glx --enable-gles2 --enable-egl --enable-vdpau --enable-glx-tls=yes --enable-va
--with-gallium-drivers=swrast,nouveau --with-dri-drivers=swrast,nouveau --with-egl-platforms=x11

LIBVA_DRIVER_NAME=gallium vainfo
Output:
vainfo: Driver version: mesa gallium vaapi
vainfo: Supported profile and entrypoints
VAProfileMPEG2Simple                  :	VAEntrypointVLD
      VAProfileMPEG2Main              :	VAEntrypointVLD
      VAProfileMPEG4Simple            :	VAEntrypointVLD
      VAProfileMPEG4AdvancedSimple    :	VAEntrypointVLD
      VAProfileVC1Simple              :	VAEntrypointVLD
      VAProfileVC1Main                :	VAEntrypointVLD
      VAProfileVC1Advanced            :	VAEntrypointVLD
      VAProfileH264Baseline           :	VAEntrypointVLD
      VAProfileH264Main               :	VAEntrypointVLD
      VAProfileH264High               :	VAEntrypointVLD
      VAProfileNone                   :	VAEntrypointVideoProc

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-05 12:07:53 -05:00
Julien Isorce abb30b9c8b nvc0: add support for st/va
- split nvc0_decoder_bsp in begin/next/end
- preserve content buffer when calling nvc0_decoder_bsp_next
- implement pipe_video_codec::begin_frame/end_frame

https://bugs.freedesktop.org/show_bug.cgi?id=89969

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-05 12:07:53 -05:00
Julien Isorce 7ba27f60f7 nouveau: split nouveau_vp3_bsp in begin/next/end
It allows to call nouveau_vp3_bsp_next multiple times
between one begin/end.

It is required to support st/va.

https://bugs.freedesktop.org/show_bug.cgi?id=89969

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
[imirkin: create strparm_bsp function, simplified w0 calculation]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-05 12:07:53 -05:00
Julien Isorce 851e7e12aa st/va: count number of slices
The counter was not set but used by the nouveau driver.
It is required otherwise visual output is garbage.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian Koenig <christian.koenig@amd.com>
2016-01-05 15:02:47 +00:00
Ilia Mirkin 14f21f53d5 i965/wm: use binding size for ubo/ssbo when automatic size is unset
This fixes the same tests that commit 8cf2e892f was attempting to fix:

ES31-CTS.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-std430-vec-bindrangeOffset
ES31-CTS.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-std430-vec-bindrangeSize

as confirmed by Samuel.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-01-05 01:30:09 -05:00
Ilia Mirkin a1d664a0b7 Revert "i965/wm: use proper API buffer size for the surfaces."
This reverts commit 8cf2e892fc. It's
entirely bogus to attempt to store anything about the binding in the
buffer object itself, which might be bound any number of times.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-01-05 01:29:49 -05:00
Nicolai Hähnle 2123bfcc9c st/mesa: make KHR_debug output independent of context creation flags (v2)
Instead, keep track of GL_DEBUG_OUTPUT and (un)install the pipe_debug_callback
accordingly. Hardware drivers can still use the absence of the callback to
skip more expensive operations in the normal case, and users can no longer be
surprised by the need to set the debug flag at context creation time.

v2:
- re-add the proper initialization of debug contexts (Ilia Mirkin)
- silence a potential warning (Ilia Mirkin)

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-04 18:40:49 -05:00
Ilia Mirkin b16c9be4a5 nvc0: scale up inter_bo size so that it's 16M for a 4K video
Experimentally, 4M causes corruption and slowness, try to ramp it up
with size instead.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2016-01-04 11:32:45 -05:00
Ilia Mirkin b5f2f7073f nv50,nvc0: fix crash when increasing bsp bo size for h264
H264 doesn't have a bitplane bo. We just need a device reference, so use
the one from the client.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2016-01-04 11:32:45 -05:00
Samuel Iglesias Gonsálvez 8cf2e892fc i965/wm: use proper API buffer size for the surfaces.
Commit 5bb5eeea fixes a bug indicating that the surfaces should have the
API buffer size. Hovewer it picked the wrong value.

This patch adds a new variable, which takes into account
glBindBufferRange() values. This patch fixes the following CTS
regressions:

ES31-CTS.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-std430-vec-bindrangeOffset
ES31-CTS.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-std430-vec-bindrangeSize

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2016-01-04 07:52:24 +01:00
Marek Olšák 86fa48426c radeonsi: remove unused parameter from si_shader_binary_read_config
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-03 22:41:16 +01:00
Marek Olšák b6d95248f0 radeonsi: move si_shader_binary_upload out of si_shader_binary_read
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-03 22:41:16 +01:00
Marek Olšák 7fa6bb47e3 gallium/radeon: dump LLVM module outside of radeon_llvm_compile
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-03 22:41:16 +01:00
Marek Olšák fb98acb5a1 gallium/radeon: always add +DumpCode to the LLVM target machine for LLVM <= 3.5
It's the same behavior that we use for later LLVM.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-03 22:41:16 +01:00
Marek Olšák cd7f252b11 gallium/radeon: r600_can_dump_shader should get TGSI processor type directly
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-03 22:41:16 +01:00
Marek Olšák fd7000bd78 radeonsi: pass TGSI processor type to si_shader_binary_read for dumping
the parameter will be used later

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-03 22:41:16 +01:00
Marek Olšák 3ce0a2fd7f radeonsi: pass TGSI processor type to si_compile_llvm for dumping
the parameter will be used later

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-03 22:41:16 +01:00
Marek Olšák dd79034ca6 radeonsi: rename shader parameter definitions and variables for more clarity
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-03 22:41:16 +01:00
Ilia Mirkin 34217018c4 nvc0/ir: add support for PK2H/UP2H
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-03 16:20:52 -05:00
Ilia Mirkin 20dee333f3 st/mesa: use PK2H/UP2H when supported
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-01-03 16:20:47 -05:00
Ilia Mirkin e9f43d6333 gallium: add PIPE_CAP_TGSI_PACK_HALF_FLOAT to indicate UP2H/PK2H support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-01-03 16:20:41 -05:00
Ilia Mirkin 459e4532af tgsi: update PK2H/UP2H channel behavior info
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-01-03 16:20:27 -05:00
Ilia Mirkin 6eb74b87b8 gallium: document PK2H/UP2H
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-01-03 16:19:57 -05:00
Samuel Pitoiset 0ab2c21b93 st/mesa: fix parameter names for tesseval/tessctrl prototypes
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-03 22:01:18 +01:00
Ilia Mirkin bf34748b39 nouveau: fix double-const qualifier
Reported by Tom^ on IRC. The original intent was to mark the pointer
constant as well as the data being pointed to, so move the *.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-03 11:32:15 -05:00
Rob Clark 3684e899ea freedreno/ir3: use NIR_PASS helper macros
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-03 09:11:27 -05:00
Rob Clark 317628dbb3 nir: extract out helper macros for running passes
Note these are a bit uglier, due to avoidance of GNU C extensions.  But
drivers which do not need to be built with compilers that don't support
the extension can wrap these macros with their own.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-01-03 09:11:27 -05:00
Rob Clark 23bd6affb2 freedreno/ir3: we require block_index metadata
Found during NIR_TEST_CLONE=1 piglit run.  We were using block->index
but forgetting to require it.  Causing things to not work with a cloned
shader which didn't preserve block_index.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-03 09:11:27 -05:00
Rob Clark 74135f804a freedreno/ir3: refactor NIR IR handling
Immediately convert into NIR and do an initial key-agnostic lowering/
optimization pass.  This should let us share most of the per-variant
transformations between each variant, and hopefully minimize the draw-
time variant creation part of the compilation process.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-03 09:11:27 -05:00
Rob Clark ab4efb19dc freedreno/ir3: drop unnecessary unreachable() case
It will still hit a compile_assert() in emit_tex, which has the
advantage of dumping out the offending shader.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-03 09:11:27 -05:00
Samuel Pitoiset 6a49fcfb1f gallium/tests: fix build with clang compiler
Nested functions are supported as an extension in GNU C, but Clang
don't support them.

This fixes compilation errors when (manually) building compute.c,
or by setting --enable-gallium-tests to the configure script.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75165
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-03 12:18:00 +01:00
Samuel Pitoiset 53dddab78c nv50,nvc0: optimize coherent buffer checking at draw time
Instead of iterating over all the buffer resources looking for coherent
buffers, we keep track of a context-wide count. This will save some
iterations (and CPU cycles) in 99.99% case because usually coherent
buffers are not so used.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-03 12:17:05 +01:00
Kenneth Graunke 28dea26626 i965: Make TCS precompile use the TES primitive mode when available.
If there's a linked TES program, we should just use the actual
primitive mode.  If not, just guess triangles (as we did before).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-02 18:46:16 -08:00
Kenneth Graunke 4a1c8a3037 i965: Push most TES inputs in SIMD8 mode.
Using the push model for inputs is much more efficient than pulling
inputs - the hardware can simply copy a large chunk into URB registers
at thread creation time, rather than having the thread send messages to
request data from the L3 cache.  Unfortunately, it's possible to have
more TES inputs than fit in registers, so we have to fall back to the
pull model in some cases.

However, it turns out that most tessellation evaluation shaders are
fairly simple, and don't use many inputs.  An arbitrary cut-off of
32 vec4 slots (16 registers) is more than sufficient to ensure that
100% of TES inputs are pushed for Shadow of Mordor, Unigine Heaven,
GPUTest/TessMark, and SynMark.

Note that unlike most SIMD8 stages, this actually reads packed vec4
data, since that is what our vec4 TCS programs write.

Improves performance in GPUTest's tessmark_x64 microbenchmark
by 93.4426% +/- 5.35541% (n = 25) on my Lenovo X250 at 1024x768.

Improves performance in Synmark's Gl40TerrainFlyTess microbenchmark
by 22.74% +/- 0.309394% (n = 5).

Improves performance in Shadow of Mordor at low settings with
tessellation enabled at 1280x720 by 2.12197% +/- 0.478553% (n = 4).

shader-db statistics for files containing tessellation shaders:

total instructions in shared programs: 184358 -> 181181 (-1.72%)
instructions in affected programs: 27971 -> 24794 (-11.36%)
helped: 226

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-01-02 18:46:16 -08:00
Kenneth Graunke b022150d70 i965: Use LOAD_PAYLOAD for SIMD8 TES input loads, not MOV.
We need a MOV to replicate g0.0<0,1,0> to all 8 channels.  Since the
message payload is a single register, MOV seemed more sensible than
LOAD_PAYLOAD.  However, MOV cannot be CSE'd, while LOAD_PAYLOAD can.

All input loads can use the same header - we don't need to re-expand
g0 every time.  CSE accomplishes this, saving instructions.

shader-db statistics for files containing tessellation shaders:

total instructions in shared programs: 186923 -> 184358 (-1.37%)
instructions in affected programs: 30536 -> 27971 (-8.40%)
helped: 226
HURT: 0

total cycles in shared programs: 1009850 -> 1005356 (-0.45%)
cycles in affected programs: 168206 -> 163712 (-2.67%)
helped: 226
HURT: 0

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-01-02 18:46:16 -08:00
Kenneth Graunke 53a9b6223f i965: Move 3-src subnr swizzle handling into the vec4 backend.
While most align16 instructions only support a SubRegNum of 0 or 4
(using swizzling to control the other channels), 3-src instructions
actually support arbitrary SubRegNums.  When the RepCtrl bit is set,
we believe it ignores the swizzle and uses the equivalent of a <0,1,0>
region from the subnr.

In the past, we adopted a vec4-centric approach of specifying subnr of
0 or 4 and a swizzle, then having brw_eu_emit.c convert that to a proper
SubRegNum.  This isn't a great fit for the scalar backend, where we
don't set swizzles at all, and happily set subnrs in the range [0, 7].

This patch changes brw_eu_emit.c to use subnr and swizzle directly,
relying on the higher levels to set them sensibly.

This should fix problems where scalar sources get copy propagated into
3-src instructions in the FS backend.  I've only observed this with
TES push model inputs, but I suppose it could happen in other cases.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-01-02 18:46:16 -08:00
Eric Anholt 64253fdb2e vc4: Fix build from upload changes. 2016-01-02 17:33:19 -08:00
Nicolai Hähnle 8f384d07a8 gallium/radeon: send LLVM diagnostics as debug messages
Diagnostics sent during code generation and the every error message reported
by LLVMTargetMachineEmitToMemoryBuffer are disjoint reporting mechanisms. We
take care of both and also send an explicit message indicating failure at the
end, so that log parsers can more easily tell the boundary between shader
compiles.

Removed an fprintf that could never be triggered.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-02 16:47:24 -05:00
Nicolai Hähnle 255ccd1e99 gallium/radeon: pass pipe_debug_callback into radeon_llvm_compile (v2)
This will allow us to send shader debug info via the context's debug callback.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-02 16:47:24 -05:00
Nicolai Hähnle f8cd11403a radeonsi: send shader info as debug messages in addition to stderr output
The output via stderr is very helpful for ad-hoc debugging tasks, so that remains
unchanged, but having the information available via debug messages as well
will allow the use of parallel shader-db runs.

Shader stats are always provided (if the context is a debug context, that is),
but you still have to enable the appropriate R600_DEBUG flags to get
disassembly (since it is rather spammy and is only generated by LLVM when we
explicitly ask for it).

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-02 16:47:24 -05:00
Nicolai Hähnle 4bb1c8dfec radeonsi: pass pipe_debug_callback down into si_shader_binary_read (v2)
This will allow us to send shader debug info.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-02 16:47:23 -05:00
Nicolai Hähnle b6847062dd gallium/radeon: implement set_debug_callback
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-02 16:47:23 -05:00
Marek Olšák ecb2da1559 u_upload_mgr: allow specifying PIPE_USAGE_* for the upload buffer
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-02 15:15:45 +01:00
Marek Olšák 37d0aea772 u_upload_mgr: remove alignment parameter from u_upload_create
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-02 15:15:45 +01:00
Marek Olšák 1bb79c3a7b u_upload_mgr: pass alignment to u_upload_buffer manually
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-02 15:15:44 +01:00
Marek Olšák e0f932846c u_upload_mgr: pass alignment to u_upload_data manually
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-02 15:15:44 +01:00
Marek Olšák 020009f7cc u_upload_mgr: pass alignment to u_upload_alloc manually
The fixed alignment of u_upload_mgr will go away.
This is the first step.

The motivation is that one u_upload_mgr can have multiple users,
each allocating from the same buffer, but requiring a different alignment.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-02 15:15:44 +01:00
Marek Olšák ffc4716e97 u_upload_mgr: rework the application of alignment
The function only aligned the size, but not the offset.
The offset was aligned only when the previous suballocation was aligned.
That yielded the correct offset alignment if the alignment was constant
for all suballocations.

Instead, directly align the offset, but allow an unaligned size.
There is no change in behavior, because the alignment is constant
at the moment.

This a prerequisite for allowing a variable alignment for suballocations.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-02 15:15:44 +01:00
Marek Olšák 36c93a6fae st/mesa: fix GLSL uniform updates for glBitmap & glDrawPixels (v2)
Spotted by luck. The GLSL uniform storage is only associated once
in LinkShader and can't be reallocated afterwards, because that would
break the association.

v2: don't remove st_upload_constants calls, clarify why they're needed

Cc: 11.0 11.1 <mesa-stable@lists.freedesktop.org>
2016-01-02 15:15:44 +01:00
Marek Olšák 294ed5cd13 program: add _mesa_reserve_parameter_storage
The next commit will use this.

Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: 11.0 11.1 <mesa-stable@lists.freedesktop.org>
2016-01-02 15:15:44 +01:00