Commit Graph

89520 Commits

Author SHA1 Message Date
Jason Ekstrand b3135c3cf3 anv: Advertise shaderInt64 on Broadwell and above
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-03-03 13:59:29 -08:00
Jason Ekstrand bc456749bd nir/int64: Properly handle imod/irem
The previous implementation was fine for GLSL which doesn't really have
a signed modulus/remainder.  They just leave the behavior undefined
whenever either source is negative.  However, in SPIR-V, there is a
defined behavior for negative arguments.  This commit beefs up the pass
so that it handles both correctly.  Tested using a hacked up version of
the Vulkan CTS test to get 64-bit support.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-03-03 13:59:27 -08:00
Jason Ekstrand 9745bef308 nir/builder: Add an int64 immediate helper
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-03-03 13:59:24 -08:00
Kenneth Graunke 46cd549c2b genxml: Fill out Gen4 and G45 XML.
This is a work in progress - some things may still need fixing.
But it should be in pretty decent shape.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-03 10:23:17 -08:00
Marek Olšák 7f1446a8a1 ac: normalize build helper names
s/emit/build/

Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-03 17:30:07 +01:00
Marek Olšák 8bde7fb3fc ac: replace SI.vs.load.input with amdgcn.buffer.load.format
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-03 17:30:07 +01:00
Marek Olšák 94811dc66c radeonsi: move SI.vs.load.input building into amd/common
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-03 17:30:07 +01:00
Marek Olšák 52660484c1 radeonsi: detect and mark loads/stores from read-only/write-only memory 2017-03-03 17:29:56 +01:00
Marek Olšák 97e21cfa25 ac: replace llvm.SI.tbuffer.store with llvm.amdgcn.buffer.store if ADD_TID=0
ADD_TID doesn't work. Needs more investigation.

v2: remove leftover dead code

Reviewed-by: Dave Airlie <airlied@redhat.com> (v1)
2017-03-03 15:29:30 +01:00
Marek Olšák 684339827c radeonsi: use the writeonly LLVM attribute 2017-03-03 15:29:30 +01:00
Marek Olšák 8cfdbba6c7 ac: remove offen parameter from ac_build_buffer_store_dword
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-03 15:29:30 +01:00
Marek Olšák 1bc88c02c0 radeonsi: enable TC L2 for tessellation offchip stores
Vulkan does the same thing.
2017-03-03 15:29:30 +01:00
Marek Olšák 27439dfdae radeonsi: merge and simplify tbuffer_store functions
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-03 15:29:30 +01:00
Marek Olšák b46e412c2e radeonsi: set noalias on input shader pointers 2017-03-03 15:29:30 +01:00
Marek Olšák d4324ddb89 radeonsi: replace AMDGPU.bfe.* with amdgcn.*bfe
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-03 15:29:30 +01:00
Marek Olšák 9c09592086 radeonsi: move kill intrinsic building into amd/common
just a cleanup

Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-03 15:29:30 +01:00
Marek Olšák e729dc7c46 radeonsi: set readnone on reads from read-only memory 2017-03-03 15:29:30 +01:00
Marek Olšák 25c7969a5a radeonsi: replace SI.buffer.load.dword with amdgcn.buffer.load 2017-03-03 15:29:30 +01:00
Marek Olšák 653ac0b389 radeonsi: replace SI.packf16 with amdgcn.cvt.pkrtz 2017-03-03 15:29:30 +01:00
Marek Olšák 4b2e5b9389 ac: replace old image intrinsics with new ones
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-03 15:29:30 +01:00
Marek Olšák c6a3911e5d radeonsi: remove last use of llvm.SI.resinfo
and move one function up to reuse the code.
2017-03-03 15:29:30 +01:00
Marek Olšák ad18d7f040 radeonsi: move image intrinsic building to amd/common
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-03 15:29:30 +01:00
Marek Olšák 2b3ebe307c ac: replace SI.export with amdgcn.exp.*
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-03 15:29:30 +01:00
Marek Olšák 369f4a8726 radeonsi: move llvm.SI.export building to amd/common
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-03 15:29:30 +01:00
Marek Olšák 9af03318aa ac: unify build_type_name_for_intr functions
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-03 15:29:30 +01:00
Marek Olšák f8c823b103 radeonsi: set unorm=1 for TGSI_TEXTURE_SHADOWRECT as well
It was harmless, because we also set unorm in the sampler state.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-03 15:29:30 +01:00
Marek Olšák b5744310d4 gallivm, ac: add writeonly and inaccessiblememonly attributes
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-03 15:29:30 +01:00
Marek Olšák 455c79b24f tgsi/scan: record load/store/atomic image usage
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-03 15:29:30 +01:00
Eric Anholt 3958c01762 glapi: Fix a comment typo
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-03 20:29:12 +11:00
Alejandro Piñeiro a54f0ad6d3 mesa/main: *TextureSubImage* generates INVALID_OPERATION on wrong target
Equivalent *TexSubImage* methods generates INVALID_ENUM.

From OpenGL 4.5 spec, section 8.6 Alternate Texture Image
Specification Commands:

   "An INVALID_ENUM error is generated by *TexSubImage* if target does
    not match the command, as shown in table 8.15."

And:

   "An INVALID_OPERATION error is generated by *TextureSubImage* if
    the effective target of texture does not match the command, as
    shown in table 8.15."

Fixes:
GL45-CTS.direct_state_access.textures_copy_errors

v2: slightly change commit summary (Samuel)

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-03-03 08:14:53 +01:00
Ben Widawsky d844d8e4d5 i965: Add Kaby Lake brandstrings
While here, use the spacing defined in Ark.
https://ark.intel.com/products/codename/82879/Kaby-Lake

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2017-03-02 21:00:02 -08:00
Grazvydas Ignotas 4dc42ae792 tgsi/ureg: return correct token count in ureg_get_tokens
Valgrind reports that the shader cache writes uninitialized data to disk.
Turns out ureg_get_tokens() is returning the count of allocated tokens
instead of how many are actually used, so the cache writes out unused
space at the end. Use the real count instead.

This change should not cause regressions elsewhere because the only
ureg_get_tokens() user that cares about token count is the shader cache.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-03 12:11:55 +11:00
Timothy Arceri 6084855528 radeonsi: add support for an on-disk shader cache
V2:
- when loading from disk cache also binary insert into memory cache.
- check that the binary loaded from disk is the correct size. If not
  delete the cache item and skip loading from cache.

V3:
- remove unrequired variable

Reviewed-by: Grigori Goronzy <greg@chown.ath.cx>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-03 12:09:08 +11:00
Timothy Arceri 85a9b1b562 util/disk_cache: compress individual cache entries
This reduces the cache size for Deus Ex from ~160M to ~30M for
radeonsi (these numbers differ from Grigori's results below
probably due to different graphics quality settings).

I'm also seeing the following improvements in minimum fps in the
Shadow of Mordor benchmark on an i5-6400 CPU@2.70GHz, with a HDD:

no-cache:                    ~10fps
with-cache-no-compression:   ~15fps
with-cache-and-compression:  ~20fps

Note: The with cache results are from the second run after closing
and opening the game to avoid the in-memory cache.

Since we mainly care about decompression I went with
Z_BEST_COMPRESSION as suggested on irc by Steinar H. Gunderson
who has benchmarked decompression speeds.

Grigori Goronzy provided the following stats for Deus Ex: Mankind
Divided start-up times on a Athlon X4 860k with a SSD:

No Cache                                 215 sec

Cold Cache zlib BEST_COMPRESSION         285 sec
Warm Cache zlib BEST_COMPRESSION         33 sec

Cold Cache zlib BEST_SPEED               264 sec
Warm Cache zlib BEST_SPEED               33 sec

Cold Cache no compression                266 sec
Warm Cache no compression                34 sec

The total cache size for that game is 48 MiB with BEST_COMPRESSION,
56 MiB with BEST_SPEED and 170 MiB with no compression.

These numbers suggest that it may be ok to go with Z_BEST_SPEED
but we should gather some actual decompression times before doing
so. Other options might be to do the compression in a separate
thread, this might allow us to use a higher compression algorithim
such as LZMA.

Reviewed-by: Grigori Goronzy <greg@chown.ath.cx>
Acked-by: Marek Olšák <marek.olsak@amd.com>
2017-03-03 12:09:08 +11:00
Timothy Arceri 5afde61752 util/disk_cache: add support for detecting corrupt cache entries
V2: fix pointer increments for writing/reading crc

Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Grigori Goronzy <greg@chown.ath.cx>
2017-03-03 12:09:08 +11:00
Samuel Pitoiset 9fc86d4f53 glsl: fix subroutine mismatch between declarations/definitions
Previously, when q.subroutine was set to 1, a new subroutine
declaration was added to the AST, while 0 meant a subroutine
definition has been detected by the parser.

Thus, setting the q.subroutine flag in both situations is
obviously wrong because a new type identifier is added instead
of trying to match the declaration. To fix it up, introduce
ast_type_qualifier::is_subroutine_decl() to differentiate
declarations and definitions easily.

This fixes a regression with:
arb_shader_subroutine/compiler/direct-call.vert

Cc: Mark Janes <mark.a.janes@intel.com>
Fixes: be8aa76afd ("glsl: remove unecessary flags.q.subroutine_def")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100026
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-03 00:57:57 +01:00
Matt Turner 10f2c86aa3 genxml: Depend on Makefile.am for generated sources.
Depending on the generated Makefile means that all generated sources are
recreated after ./configure.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-02 15:49:00 -08:00
Matt Turner 7d1195c1e4 clover: Work around build failure with AltiVec.
Bugzilla: https://bugs.gentoo.org/show_bug.cgi?id=587210
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68504
Acked-by: Francisco Jerez <currojerez@riseup.net>
2017-03-02 15:49:00 -08:00
Nanley Chery d7d64f1091 anv/image: Allow HiZ on input attachment-capable depth/stencil images
While an input attachment may only take on one of those two layouts,
other depth/stencil attachments that use the same image may have
HiZ-enabled layouts. Improves the average frame rate on a release
candidate of a proprietary Vulkan benchmark by 9.94% over 3 runs on my
SKL GT4.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:55 -08:00
Nanley Chery 76b8cc2a1c anv/cmd_buffer: Centralize automatic layout transitions
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:55 -08:00
Nanley Chery 0a72b5f3cb anv/cmd_buffer: Add attachment transitioning functions
This is needed to transition input attachments.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:55 -08:00
Nanley Chery 9950774f8b anv/blorp: Encapsulate subpass id querying
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:55 -08:00
Nanley Chery c78a959bcf anv/cmd_buffer: Enable render pass awareness
v2: Update cmd_state_reset (Jason Ekstrand)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:55 -08:00
Nanley Chery c0223d052b anv/pass: Store subpass attachment reference list
We'll loop through this array when performing automatic layout
transitions.

v2: Adjust formatting of an assignment (Jason Ekstrand)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:55 -08:00
Nanley Chery 8f6a17c8e7 anv/pass: Fix size of anv_render_pass:subpass_attachments
Don't allocate space for resolve attachments if the subpass has none.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:55 -08:00
Nanley Chery 608d17b80e anv: Store the user's VkAttachmentReference
We will be using the image layout. Store the full struct directly from
the user.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:55 -08:00
Nanley Chery 6326f0f4be anv/cmd_buffer: Remove extra resolve for certain depth buffers
Due to recent commits, the sampler now bypasses the auxiliary HiZ buffer
when reading from a depth image subresource that is in the general
layout. Remove this unneeded resolve.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:55 -08:00
Nanley Chery ea744912b3 anv/cmd_buffer: Conditionally choose the sampled image surface state
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:55 -08:00
Nanley Chery 5408d3fd05 anv/descriptor_set: Store aux usage of sampled image descriptors
v2: Rebase onto latest changes
v3: Account for NULL image_view in aux_usage assignment

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:55 -08:00
Nanley Chery efc2222323 anv/image: Create an additional surface state for sampling
This will be used to sample a depth input attachment without having to
pass through the HiZ buffer.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:54 -08:00