Commit Graph

83642 Commits

Author SHA1 Message Date
Samuel Pitoiset 3ac373df6e gm107/ir: add a legalize SSA pass for PFETCH
PFETCH, actually ISBERD on GM107+ ISA only accepts a GPR for src0.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-27 23:18:58 +02:00
Samuel Pitoiset 653af07119 nvc0: fix up TCP header on GM107+
The number of outputs patch (limited to 255) has moved in the TCP
header, but blob seems to also set the old position. Also, the high
8-bits are now located inbetween the min/max parallel output read
address at position 20.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-27 23:18:41 +02:00
Mathias Fröhlich 2060f19b4f vbo: Fix handling of POS/GENERIC0 attributes.
In case of split primitives we need to restore
the original setting of the vtx.attrsz array to make
immediate mode attribute array tracking work.

v2: Use bool instead of boolean.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96950
2016-07-27 06:43:03 +02:00
Marek Olšák c98c732158 radeon/llvm: Use alloca instructions for larger arrays [revert a revert]
This reverts commit f84e9d749f.

Bioshock Infinite no longer hangs.
2016-07-26 23:31:56 +02:00
Marek Olšák 8636a718b5 r600g: add support for B5G6R5 PBO uploads via texture buffers (v2)
v2: set endian swap to 16

untested
2016-07-26 23:21:45 +02:00
Marek Olšák 1e5f00f9d5 radeonsi: pre-generate shader logs for ddebug
This cuts down the overhead of si_dump_shader when ddebug is capturing
shader logs, which is done for every draw call unconditionally (that's
quite a lot of work for a draw call).

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Marek Olšák 18475aab6d radeonsi: add empty lines after shader stats
to separate individual shaders dumped consecutively.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Marek Olšák dd66f9d3e7 radeonsi: move the shader key dumping to si_shader_dump
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Marek Olšák b47727a83a ddebug: implement pipelined hang detection mode
For good performance while being able to generate decent hang reports.
The report doesn't contain the parsed IB and the buffer list, but it
isolates the draw call and dumps shaders while not having to flush
the context.

This is for GPU hangs that are harder to reproduce and require interactive
playing for minutes or even hours.

dd_pipe.h explains some implementation details. Initializing, copying
(recording) and clearing states is most of the code.

The performance should be at least 50% of the normal performance depending
on the circumstances. (i.e. 50% is expected to be the worst case scenario,
not the best case) The majority of time is spent in
dump_debug_state(PIPE_DUMP_CURRENT_SHADERS) and that's after all
the optimizations in later patches. There is no obvious way to optimize
that further.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Marek Olšák 0795a3d54f ddebug: don't save pointers to call parameters
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Marek Olšák e4079677a7 ddebug: move dd_call into dd_pipe.h
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Marek Olšák d50f9e9b04 ddebug: separate draw call dumping logic
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Marek Olšák 95c3025a41 ddebug: move all states into a separate structure
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Marek Olšák f7720948cc ddebug: write contents of dmesg into hang reports
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Marek Olšák 1f85f17998 ddebug: implement create_batch_query
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Marek Olšák 6b9924ccb6 ddebug: don't use abort()
We don't want a core dump.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Marek Olšák 26ef8158ac ddebug: make dd_get_file_stream accept the screen only
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Marek Olšák 27fa933a71 ddebug: clean up ddebug_screen_create
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Marek Olšák 6bf81de339 gallium: rework flags for pipe_context::dump_debug_state
The pipelined hang detection mode will not want to dump everything.
(and it's also time consuming) It will only dump shaders after a draw call
and then dump the status registers separately if a hang is detected.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Rob Herring 9ace2c1355 vc4: add hash table look-up for exported dmabufs
It is necessary to reuse existing BOs when dmabufs are imported. There
are 2 cases that need to be handled. dmabufs can be created/exported and
imported by the same process and can be imported multiple times.
Copying other drivers, add a hash table to track exported BOs so the
BOs get reused.

v2: Whitespace fixup (by anholt)

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-07-26 13:47:50 -07:00
Eric Anholt ce8504d196 vc4: Disable early Z with computed depth.
We don't tell the hardware whether we're computing depth, so we need
to manage early Z state manually.  Fixes piglit early-z.
2016-07-26 13:47:50 -07:00
Eric Anholt 4d0b2c7aaa ttn: Update shader->info as we generate code.
We could use the nir_shader_gather_info() pass to update it after the
fact, but this is what glsl_to_nir and prog_to_nir do.

Reviewed-by: Rob Clark <robclark@freedesktop.org>
2016-07-26 13:47:50 -07:00
Vedran Miletić 7b9a0f4e38 mesa: standardize naming Mesa3D, MESA -> Mesa
Signed-off-by: Vedran Miletić <vedran@miletic.net>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-07-26 13:28:01 -07:00
Kenneth Graunke 95c48391ee mesa: Make MESA_SHADER_CAPTURE_PATH skip shaders with Name == -1.
Shaders with shProg->Name == ~0 (aka 4294967295) are internal meta
shaders that we don't really want to capture.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-26 13:27:09 -07:00
Matt Turner 20553e4a2d mesa: Use AC_HEADER_MAJOR to include correct header for major().
Gentoo has been smoke testing an upcoming change to glibc.

Bugzilla: https://bugs.gentoo.org/show_bug.cgi?id=580392
2016-07-26 12:12:41 -07:00
Matt Turner 815135166c glsl: Remove references to tail_pred. 2016-07-26 12:12:27 -07:00
Matt Turner 5ed3299822 glx: Avoid aliasing violations.
Compilers are perfectly capable of generating efficient code for calls
like these to memcpy().

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-07-26 12:12:27 -07:00
Matt Turner 2a1d2874f1 mesa: Avoid aliasing violation in uniform_query.cpp.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-07-26 12:12:27 -07:00
Matt Turner f5ac1d366e mesa: Avoid aliasing violation in FXT1.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-07-26 12:12:27 -07:00
Matt Turner a1e9b72102 swrast: Avoid aliasing violation.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-07-26 12:12:27 -07:00
Matt Turner 149309a424 glsl: Avoid aliasing violations.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-07-26 12:12:27 -07:00
Matt Turner d1f6f65697 glsl: Separate overlapping sentinel nodes in exec_list.
I do appreciate the cleverness, but unfortunately it prevents a lot more
cleverness in the form of additional compiler optimizations brought on
by -fstrict-aliasing.

No difference in OglBatch7 (n=20).

Co-authored-by: Davin McCall <davmac@davmac.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-07-26 12:12:27 -07:00
Jason Ekstrand 5d76690f17 i965/miptree: Stop multiplying cube depth by 6 in HiZ calculations
intel_mipmap_tree::logical_depth0 is now in number of 2D slices so we no
longer need to be multiplying by 6.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-07-26 07:58:44 -07:00
Jason Ekstrand 833e389bc0 i965/miptree/isl: Stop multiplying depth by 6 for cubes
Now that the logical_depth0 field is in number of 2D slices, we don't need
to be multiplying by 6 when creating the surface.  It wasn't hurting
anything primarily because we get the actual length from the view which was
already handling it correctly.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-07-26 07:58:44 -07:00
Jason Ekstrand d16dc8e963 i965/blorp/gen8: Stop multiplying depth by 6 for cubes
intel_mipmap_tree::logical_depth0 is now in 2-D slices so there is no need
for us to multiply by 6 when we go to fill out a blorp surface state.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-07-26 07:58:44 -07:00
Samuel Pitoiset 126bd15940 nvc0: use nvc0_m2mf_push_linear() to reduce code duplication
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-26 00:50:34 +02:00
Samuel Pitoiset c5236f0ecc nvc0: use nve4_p2mf_push_linear() to reduce code duplication
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-26 00:40:37 +02:00
Andreas Boll 0420666ac0 build: Remove unused AX_CHECK_COMPILE_FLAG macro
Unused since 1a6ae84041

Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-07-25 15:14:12 +02:00
Nils Wallménius a354c389f5 main: memcpy larger chunks in _mesa_propagate_uniforms_to_driver_storage
When possible, do the memcpy on larger blocks. This reduces cycles
spent in _mesa_propagate_uniforms_to_driver_storage from
1.51 % to 0.62% according to perf during the Unigine Heaven benchmark.
It did not affect the framerate of the benchmark. The system used for
testing was an i5 6600K with a Radeon R9 380.

Piglit hangs randomly on this system both with and without the patch
so i could not make a comparison.

v2: fixed whitespace

Signed-off-by: Nils Wallménius <nils.wallmenius@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-25 13:51:16 +02:00
Boyuan Zhang dd208ea006 st/va: enable h264 VAAPI encode
Enable H.264 VAAPI encoding through config. Currently only H.264 baseline is supported. Encode entrypoint is not accepted by driver.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
2016-07-25 13:39:54 +02:00
Boyuan Zhang 71da1354d7 st/va: add function to handle misc param type frame rate
Frame rate can be passed to driver either through VAEncSequenceParameterBufferType or VAEncMiscParameterTypeFrameRate. Previous code only implement the former one, which is used by Gstreamer-Vaapi. Now adding implementation for VAEncMiscParameterTypeFrameRate. Also adding default frame rate as 30 just in case application never provides frame rate information to driver.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
2016-07-25 13:39:53 +02:00
Boyuan Zhang 10dec2de2d st/va: add enviromental variable to disable interlace
Add environmental variable to disable interlace mode. At VAAPI decoding stage, driver can not distinguish b/w pure decoding case and transcoding case. And since interlace encoding is not supported, we have to disable interlace for transcoding case. The temporary solution is to use enviromental variable to disable interlace mode.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
2016-07-25 13:39:53 +02:00
Boyuan Zhang b0ceb4cc48 st/va: add preset values for VAAPI encode
Add some hardcoded values hardware needs mainly for rate control purpose. With previously hardcoded values for OMX, the rate control result is not correct. This change fixed the rate control result by setting correct values for Vaapi.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
2016-07-25 13:39:52 +02:00
Boyuan Zhang 85d807f2e0 st/va: add functions for VAAPI encode
Add necessary functions/changes for VAAPI encoding to buffer and picture. These changes will allow driver to handle all Vaapi encode related operations. This patch doesn't change the Vaapi decode behaviour.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
2016-07-25 13:39:52 +02:00
Boyuan Zhang 10c1cc47a6 st/va: get rate control method from configattrib v2
Rate control method is passed from app to driver through config attrib list.
That is why we need to store this rate control method to config. And later
on, we will pass this value to context->desc.h264enc.rate_ctrl.rate_ctrl_method.

v2 (chk): fix broken build and commit message

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
2016-07-25 13:39:51 +02:00
Boyuan Zhang 34f4634843 st/va: add conversion for yv12 to nv12in putimage v2
For putimage call, if image format is yv12 (or IYUV with U V field swap) and
surface format is nv12, then we need to convert yv12 to nv12 and then copy
the converted data from image to surface. We can't use the existing logic
where surface is destroyed and re-created with yv12 format.

v2 (chk): fix some compiler warnings and commit message

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
2016-07-25 13:39:51 +02:00
Boyuan Zhang 23b4ab1738 vl/util: add copy func for yv12image to nv12surface v2
Add function to copy from yv12 image to nv12 surface for VAAPI putimage call.
We need this function in VaPutImage call where copying from yv12 image to nv12
surface for encoding. Existing function can't be used because it only work for
copying from yv12 surface to nv12 image in Vaapi.

v2: cleanup variable types and commit message

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
2016-07-25 13:39:18 +02:00
Boyuan Zhang 5bcaa1b9e9 st/va: add encode entrypoint v2
VAAPI passes PIPE_VIDEO_ENTRYPOINT_ENCODE as entry point for encoding case. We
will save this encode entry point in config. config_id was used as profile
previously. Now, config has both profile and entrypoint field, and config_id is
used to get the config object. Later on, we pass this entrypoint to
context->templat.entrypoint instead of always hardcoded to
PIPE_VIDEO_ENTRYPOINT_BITSTREAM for decoding case previously. Encode entrypoint
is not accepted by driver until we enable Vaapi encode in later patch.

v2 (chk): fix commit message to match 80 chars, use switch instead of ifs,
	  fix memory leaks in the error path, implement vlVaQueryConfigEntrypoints
	  as well, drop VAEntrypointEncPicture (only used for JPEG).

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
2016-07-25 13:30:42 +02:00
Samuel Pitoiset e7b2ce5fd8 nvc0: upload sample locations on GM20x
This fixes a bunch of multisample piglit tests on GM206, like
bin/arb_texture_multisample-texelfetch 2 -auto -fbo

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-24 22:46:26 +02:00
Rob Clark 2f57e57881 freedreno/a4xx: time-elapsed query should be active for clears
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-24 09:33:05 -04:00