Commit Graph

77039 Commits

Author SHA1 Message Date
Nicolai Hähnle f6dc4f5558 st/mesa: set image access flags in st_bind_images
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-14 17:24:43 -05:00
Nicolai Hähnle 71a1b54b33 gallium: add access field to pipe_image_view
This allows drivers to make smarter decisions e.g. about whether the image
has to be decompressed.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-14 17:24:40 -05:00
Nicolai Hähnle 8c497b8fb5 st/glsl_to_tgsi: set FS_EARLY_DEPTH_STENCIL when required
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-14 17:24:37 -05:00
Nicolai Hähnle e526f930aa tgsi: add TGSI_PROPERTY_FS_EARLY_DEPTH_STENCIL
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-14 17:24:33 -05:00
Nicolai Hähnle 1c0cee8764 st/glsl_to_tgsi: set memory access type on image intrinsics
This is required to preserve the image variable's coherent/restrict/volatile
qualifiers in TGSI.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-14 17:24:30 -05:00
Nicolai Hähnle dfcf420412 st/glsl_to_tgsi: provide Texture and Format information for image ops
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-14 17:24:26 -05:00
Nicolai Hähnle 3243b6fc97 tgsi: add Texture and Format to tgsi_instruction_memory
Frontends should have this information readily available, and it simplifies
image LOAD/STORE/ATOM* handling especially with indirect image access.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-14 17:24:02 -05:00
Nicolai Hähnle 9b68bdf6f8 get: reconcile aliasing enums for MaxCombinedShaderOutputResources
The enums MAX_COMBINED_IMAGE_UNITS_AND_FRAGMENT_OUTPUTS and
MAX_COMBINED_SHADER_OUTPUT_RESOURCES are equal and should therefore only
appear once.

Noticed while implementing ARB_shader_image_load_store without previously
implementing SSBO.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-03-14 17:19:14 -05:00
Francisco Jerez b054605722 i965/fs: Restrict inequality that can only hold equal in saturate propagation.
Should have no functional change.  The IP value of an instruction that
reads src_var cannot possibly be after the end of the live interval of
the variable it's reading from, by the definition of live interval.
Might save future readers a momentary WTF while trying to understand
this code.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-14 14:58:19 -07:00
Francisco Jerez 7d7990cf65 i965/vec4: Consider removal of no-op MOVs as progress during register coalesce.
Bug found by the liveness analysis validation pass that will be
introduced in a later commit.  The no-op MOV check in
opt_register_coalesce() was removing instructions which makes the
cached liveness analysis calculation inconsistent with the shader IR.
We were failing to set progress to true in that case though, which
means that invalidate_live_intervals() wouldn't necessarily be called
at the end of the function.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-14 14:58:11 -07:00
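Note on 7d7990cf65: the failure mode described above (an optimization pass removes instructions but never reports progress, so the cached analysis is not invalidated) can be illustrated with a minimal sketch. The types and the function below are hypothetical stand-ins, not the i965 backend's actual data structures; only the progress/invalidate pattern is taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified instruction and shader types. */
struct inst_sketch {
    int dst;
    int src;
    bool removed;
};

struct shader_sketch {
    struct inst_sketch *insts;
    size_t count;
    bool live_intervals_valid;   /* stands in for the cached liveness analysis */
};

/* Remove no-op MOVs (dst == src) and report whether the IR changed. */
static bool coalesce_sketch(struct shader_sketch *s)
{
    bool progress = false;

    assert(s != NULL);
    for (size_t i = 0; i < s->count; i++) {
        struct inst_sketch *inst = &s->insts[i];

        if (!inst->removed && inst->dst == inst->src) {
            inst->removed = true;
            progress = true;     /* the step the commit message says was missing */
        }
    }

    /* Any change to the IR makes the cached analysis stale. */
    if (progress)
        s->live_intervals_valid = false;

    return progress;
}
```

Without the `progress = true` line the function would still remove the MOV but leave `live_intervals_valid` set, which is the inconsistency the validation pass is said to have caught.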
Francisco Jerez 93be4158ae i965/fs: Add missing analysis invalidation in fixup_3src_null_dest().
Bug found by the liveness analysis validation pass that will be
introduced in a later commit.  fixup_3src_null_dest() was allocating
registers which makes the cached liveness analysis calculation
incomplete, so it must be invalidated.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-14 14:57:58 -07:00
Francisco Jerez 6691c03fd3 i965/fs: Add missing analysis invalidation in opt_sampler_eot().
Bug found by the liveness analysis validation pass that will be
introduced in a later commit.  opt_sampler_eot() was allocating
registers and inserting and removing instructions, which makes the
cached liveness analysis calculation inconsistent with the shader IR,
so it must be invalidated.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-14 14:56:02 -07:00
Hans de Goede 4d02e91e49 clover: Fix pipe_grid_info.indirect not being initialized.
After pipe_grid_info.indirect was introduced, clover was not modified
to set it, causing it to pass uninitialized memory for it to launch_grid.

This commit fixes this by zeroing the entire pipe_grid_info struct when
declaring it, to avoid similar problems popping up in the future.

Cc: "11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
[ Francisco Jerez: Trivial codestyle fix. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-14 14:12:42 -07:00
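Note on 4d02e91e49: a minimal sketch of the "zero the whole struct at declaration" idea, written in C for illustration (clover itself is C++, where the equivalent would be value-initialization with `= {}`). The struct below is a hypothetical stand-in; apart from `indirect`, its field names are assumptions and not the real pipe_grid_info layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for pipe_grid_info. */
struct grid_info_sketch {
    unsigned block[3];
    unsigned grid[3];
    const void *indirect;        /* the kind of late-added field that was left unset */
    unsigned indirect_offset;
};

static struct grid_info_sketch make_grid_info(unsigned gx, unsigned gy, unsigned gz)
{
    /* { 0 } zero-initializes every member, including ones added later. */
    struct grid_info_sketch info = { 0 };

    info.grid[0] = gx;
    info.grid[1] = gy;
    info.grid[2] = gz;
    return info;
}
```

Because the initializer covers the whole struct, a field added to it in the future starts out as zero or NULL even if this function is never updated to set it.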
Sarah Sharp af06190760 mesa: docs: Intel i965 hardware limits.
This should help the next person working on hardware enabling figure out
where in the Intel PRMs to find the magic platform hardware values.

Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
2016-03-14 14:00:29 -07:00
Sarah Sharp 0f5bfc7f01 mesa: docs: i965: Use correct doxygen groupings syntax
When reading the source code, it's useful to indicate that a group of
fields in a struct are related in some way. There were several places
where people tried to group related structure members with the @{
syntax, without realizing they also needed to add the \name syntax in
order to generate correct doxygen html.

There are several files with groupings that look like this:

struct foo {
    /**
     * Related fields description
     * @{
     */
    int bar;
    char baz;
    /** @} */
    long qux;
};

However, the doxygen syntax for grouping is:

struct foo {
    /**
     * \name Related fields description
     * @{
     */
    int bar;
    char baz;
    /** @} */
    long qux;
};

https://www.stack.nl/~dimitri/doxygen/manual/grouping.html

Without the group name definition, the fields don't get properly
grouped. Instead, the group description is applied to the first field.

Fix the Intel hardware information structure, brw_device_info to
properly group the GPU hardware limitations and hardware quirks fields.

Once you've run `cd doxygen; make clean; make all`,
updated documentation can be found at

mesa/doxygen/i965/structbrw__device__info.html

Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
2016-03-14 14:00:29 -07:00
Bruce Cherniak e9d68cc3da gallium/swr: Resource management
Better tracking of resource state and synchronization.
A follow-on commit will clean up resource functions into a new
swr_resource.cpp file.

Reviewed-By: George Kyriazis <george.kyriazis@intel.com>
2016-03-14 14:07:48 -05:00
Marek Olšák 7a2333e4ef configure.ac: require libdrm 2.4.66 for drmGetDevice
since 737b6ed13e
src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c no longer compiles:
error: unknown type name ‘drmDevicePtr’
2016-03-14 16:42:41 +01:00
Francisco Jerez 63250d8178 i965: Remove useless IR self-destruct backend_shader method.
From the point it's constructed the CFG contains the only existing
copy of the program IR, and it never becomes invalid.  Calling
backend_shader::invalidate_cfg would have destroyed the program
structure irrecoverably -- We weren't calling it at all for a good
reason.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-13 18:07:53 -07:00
Pierre Moreau 8c7acd87af nv50,nvc0: Set only NEW_CP_GLOBALS upon binding
Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-03-13 22:34:50 +01:00
Rob Clark e73ac84b93 freedreno/ir3: lower extract_byte/word
The following commits broke things by starting to feed us unhandled
extract_u16/extract_u8 opcodes:

commit 905ff86198
Author:     Matt Turner <mattst88@gmail.com>
AuthorDate: Wed Feb 3 14:28:31 2016 -0800
Commit:     Matt Turner <mattst88@gmail.com>
CommitDate: Fri Mar 4 11:52:34 2016 -0800

    nir: Recognize open-coded extract_u16.

commit 76289fbfa8
Author:     Matt Turner <mattst88@gmail.com>
AuthorDate: Thu Jan 21 09:09:48 2016 -0800
Commit:     Matt Turner <mattst88@gmail.com>
CommitDate: Fri Mar 4 11:52:34 2016 -0800

    nir: Recognize open-coded extract_u8.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-03-13 14:10:57 -04:00
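Note on e73ac84b93: the commit message does not show what the lowered form looks like. A plausible sketch, assuming the usual meaning of extract_u8/extract_u16 (select the n-th byte or 16-bit word of a 32-bit value and zero-extend it), is a shift followed by a mask. The helper names below are illustrative and are not NIR or ir3 functions.

```c
#include <assert.h>
#include <stdint.h>

/* Byte `index` (0 = least significant) of `value`, zero-extended. */
static uint32_t extract_u8_sketch(uint32_t value, unsigned index)
{
    assert(index < 4);
    return (value >> (index * 8)) & 0xffu;
}

/* 16-bit word `index` (0 = least significant) of `value`, zero-extended. */
static uint32_t extract_u16_sketch(uint32_t value, unsigned index)
{
    assert(index < 2);
    return (value >> (index * 16)) & 0xffffu;
}
```

For example, `extract_u8_sketch(0x11223344, 3)` selects the top byte, 0x11, and `extract_u16_sketch(0x11223344, 0)` selects the low word, 0x3344.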
Ilia Mirkin c1e4a6bfbf nv50,nvc0: handle SQRT lowering inside the driver
First off, st/mesa lowers DSQRT incorrectly (it uses CMP to attempt to
find out whether the input is less than 0). Secondly the current
approach (x * rsq(x)) behaves poorly for x = inf - a NaN is produced
instead of inf.

Instead we switch to the less accurate rcp(rsq(x)) method - this behaves
nicely for all valid inputs. We still don't do this for DSQRT since the
RSQ/RCP ops are *really* inaccurate, and don't even have Newton-Raphson
steps right now. Eventually we should have a separate library function
for DSQRT that does it more precisely (and perhaps move this lowering to
the post-opt phase).

This fixes a number of dEQP precision tests that were expecting better
behavior for infinite inputs.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-03-13 13:17:24 -04:00
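Note on c1e4a6bfbf: the infinity behaviour of the two lowerings can be checked with ordinary IEEE float arithmetic. This is a sketch only: `rsq` and `rcp` below are hypothetical software models of the hardware ops (assuming rsq(+inf) = 0 and rsq(0) = +inf), and the Newton loop is just a libm-free way to get a square root for finite inputs, not something the driver does.

```c
#include <assert.h>
#include <math.h>   /* INFINITY macro only; no libm functions are called */

/* Hypothetical model of the RSQ op: 1/sqrt(x) for x >= 0. */
static float rsq(float x)
{
    assert(x >= 0.0f);
    if (x == 0.0f)
        return INFINITY;
    if (x == INFINITY)
        return 0.0f;

    /* Newton iteration for sqrt(x), starting from a guess >= sqrt(x). */
    float y = x > 1.0f ? x : 1.0f;
    for (int i = 0; i < 100; i++)
        y = 0.5f * (y + x / y);
    return 1.0f / y;
}

/* Hypothetical model of the RCP op. */
static float rcp(float x)
{
    return 1.0f / x;
}

/* Previous lowering: sqrt(x) = x * rsq(x). */
static float sqrt_mul_rsq(float x)
{
    return x * rsq(x);
}

/* Lowering the commit switches to: sqrt(x) = rcp(rsq(x)). */
static float sqrt_rcp_rsq(float x)
{
    return rcp(rsq(x));
}
```

For x = inf the first form evaluates inf * 0, which is NaN, while the second evaluates 1 / 0, which is +inf under IEEE 754 division semantics.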
Ilia Mirkin b3e7fb5234 nv50/ir: avoid folding mul + add if the mul has a dnz
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-03-13 13:17:24 -04:00
Ilia Mirkin a651bc027d nvc0: fix blit triangle size to fully cover FB's > 8192x8192
The idea is that a single triangle will cover the whole area being
drawn, allowing the blit shader to do its work. However the max fb size
is 16384x16384, which means that the triangle we draw needs to be twice
that in order to cover the whole area fully. Increase the size of the
triangle to 32768x32768.

This fixes a number of dEQP tests that were failing because a blit was
involved which would miss some of the resulting texture.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
2016-03-13 13:17:24 -04:00
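Note on a651bc027d: the factor of two follows from simple geometry. The sketch below assumes the blit triangle is a right triangle with its right angle at the framebuffer origin and both legs of length `size` along the axes; the actual vertex layout in the driver is not shown in the commit message, so treat this as an illustration of the arithmetic only.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * A right triangle with vertices (0,0), (size,0), (0,size) contains a point
 * (x, y) with x, y >= 0 exactly when x + y <= size. A width x height
 * framebuffer anchored at the origin is fully covered when its far corner
 * (width, height) satisfies that condition.
 */
static bool triangle_covers_fb(unsigned size, unsigned width, unsigned height)
{
    assert(size <= 1u << 30 && width <= 1u << 30 && height <= 1u << 30);
    return width + height <= size;
}
```

Under this assumption a 16384 triangle stops covering the framebuffer once it exceeds 8192x8192, and a 16384x16384 framebuffer needs a 32768 triangle, matching the sizes quoted in the commit message.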
Rob Clark 01b071d530 freedreno: OUT_RELOC vs OUT_RELOCW fixes
Make sure we use OUT_RELOCW() in cases where the buffer is written to.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-03-13 12:23:41 -04:00
Rob Clark f68c6951b8 freedreno/a4xx: hw binning
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-03-13 12:23:41 -04:00
Rob Clark b3fe196e21 freedreno/a4xx: use generated headers for draw initiator
No need to open-code this.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-03-13 12:23:41 -04:00
Rob Clark 2224ba5976 freedreno/a4xx: remove RB_RENDER_CONTROL patching
Bitfields were shuffled around for the better on a4xx, so we don't need
any patching on this one.  It appears to be something we set entirely in
the gmem code so no conflict between tiling and render state like we had
in a3xx.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-03-13 12:23:41 -04:00
Rob Clark 8824a765a2 freedreno: update generated headers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-03-13 12:23:41 -04:00
Rob Clark 476551a21f freedreno/a3xx: move where we deal w/ binning FS
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-03-13 12:23:41 -04:00
Rob Clark dd9135c452 freedreno/a4xx: move where we deal w/ binning FS
Move where we pick dummy FS for binning pass, so the whole driver sees
the same dummy/no-op FS stage.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-03-13 12:23:41 -04:00
Rob Clark 09b3447344 freedreno/a3xx: constify the shader variants
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-03-13 12:23:41 -04:00
Rob Clark 5b955f09f7 freedreno/a4xx: constify the shader variants
Most of the driver just needs read-only access, so constify.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-03-13 12:23:40 -04:00
Rob Clark d9395e4ed8 freedreno/a3xx: remove duplicate mark of end of binning cmds
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-03-13 12:23:40 -04:00
Nicolai Hähnle 28d2a7e67c radeonsi: avoid crash when a sampler state is bound for a buffer texture
Sampler states don't really make sense with buffer textures, but they
can be set anyway, so we need to be defensive here. This bug was lurking
for a while and was finally noticed due to PBO uploads setting sampler
states.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94284
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Laurent Carlier <lordheavym@gmail.com>
Tested-by: Shawn Starr <shawn.starr@rogers.com>
2016-03-13 09:37:23 -05:00
Matt Turner 61b10b4eb7 i965: Use foreach_in_list_reverse_safe() macro.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-03-12 19:23:50 -08:00
Jason Ekstrand 98d58e7320 nir/clone: Add support for cloning a single function_impl
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-12 15:48:36 -08:00
Jason Ekstrand 036b209484 nir/validate: Better function validation
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-12 15:48:36 -08:00
Jason Ekstrand f86f3c90aa nir/print: Better function argument printing
Since we aren't going to put the function parameters or the return variable
in the list of locals, it won't get a proper declaration.  This changes
nir_print to print the type along with each parameter or return variable.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-12 15:48:36 -08:00
Jason Ekstrand 13969565f9 nir/print: Factor variable name lookup into a helper
Otherwise, we have a problem when we go to print functions with arguments
because their names get added to the hash table during declaration which
happens after we print the prototype.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-12 15:48:36 -08:00
Jason Ekstrand e4bebe8a02 nir: Create function parameters in function_impl_create
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-12 15:48:36 -08:00
Jason Ekstrand 066d3c115e nir: Add a helper for creating a "bare" nir_function_impl
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-12 15:48:36 -08:00
Jason Ekstrand 2ef4754a20 nir: Add a new "param" variable mode for parameters and return variables
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-12 15:48:36 -08:00
Jason Ekstrand 41ae553fda nir/glsl: Remove dead function parameter handling code
NIR has never been used on IR where we haven't already done function
inlining so this code has been dead from the beginning.  Let's just get rid
of it for now.  We can always put it back in if we decide to use NIR for
function inlining at some point in the future.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-12 15:48:36 -08:00
Boyuan Zhang 6cf120ec77 st/va: add HEVC main 10 profile
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2016-03-11 22:33:56 -05:00
Boyuan Zhang 06c862d67d radeon/video: enable HEVC main 10 decode
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2016-03-11 22:33:56 -05:00
Boyuan Zhang 8be9efcce7 radeon/uvd: handle HEVC main 10 decode
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2016-03-11 22:33:56 -05:00
Ben Widawsky d1ab544bb8 i965/chv: Display proper branding
"Braswell" is a Cherryview based *thing*. It unfortunately requires extra
information to determine its marketing name. Unlike all previous products, and
hopefully all future ones, there is no unique 1:1 mapping of PCI device ID to
brand string.

I put up a fight about adding any complexity to our GL renderer string code for
a very long time. However, a wise man made a comment to me that I couldn't argue
with: if a user installs Windows on their hardware, the brand string should be
the same as what we display in Linux. The Windows driver apparently does this
check, so we should too.

Note that I did manage to find a good use for this info anyway in the compute
shader thread counts.

v2: memcpy instead of strncpy, and some minor changes (Matt)

Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-11 11:17:28 -08:00
Ben Widawsky 5e6a43a001 i965/chv: Update lower min for CS threads
We have better information now, and 28 was not a valid thing to support. 6 EUs
per subslice with 7 threads per EU is the minimum supported config.

Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-11 11:17:28 -08:00
Ben Widawsky 3dc3dbc8d8 i965/chv: Check that compute threads are above threshold
The way we are organizing this code, the statically configured max_cs_threads
should always be the minimum value we actually support (i.e. are aware of). As a
result, we can fall back to that if we get invalid numbers from the kernel (i.e.
when the query succeeds, but the result is lower than expected).

I was originally planning to use an assert, but there is no reason to be so
mean.

Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-11 11:17:28 -08:00
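Note on 5e6a43a001 and 3dc3dbc8d8: taken together, the two messages describe a static minimum (6 EUs per subslice times 7 threads per EU) and a fallback to that minimum when the kernel reports something lower. A minimal sketch of that logic follows; the function and constant names are hypothetical and do not come from the i965 driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Values quoted in the commit message for the minimum supported config. */
enum {
    CHV_MIN_EUS_PER_SUBSLICE = 6,
    CHV_THREADS_PER_EU = 7
};

/* Statically configured minimum: 6 * 7 = 42 threads. */
static unsigned chv_min_cs_threads(void)
{
    return CHV_MIN_EUS_PER_SUBSLICE * CHV_THREADS_PER_EU;
}

/*
 * Use the kernel-provided count only when the query succeeded and the
 * result is at least the static minimum; otherwise fall back to the minimum.
 */
static unsigned pick_max_cs_threads(unsigned static_min, bool query_ok,
                                    unsigned kernel_threads)
{
    assert(static_min > 0);
    if (!query_ok || kernel_threads < static_min)
        return static_min;
    return kernel_threads;
}
```

With these assumptions a kernel answer of 28 would be rejected in favour of 42, while an answer of 56 would be used as is.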
Ben Widawsky 9dd20b715a i965/chv: Use kernel provided info for max_cs_threads
With the previous patches, the code can find out the actual number of available
compute threads. It is enabled only for Cherryview since that is the only
platform I know for a fact has shipped devices which can benefit from this.  It
seems like other platforms /might/ benefit from this because of fused
configurations which /might/ have shipped. Fallback code is still there.

v2: Some minor adjustments from Matt

Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-11 11:17:28 -08:00