Commit Graph

104655 Commits

Author SHA1 Message Date
Timothy Arceri 5db981952a nir: add loop unroll support for wrapper loops
This adds support for unrolling the classic

    do {
        // ...
    } while (false)

that is used to wrap multi-line macros. GLSL IR also wraps switch
statements in a loop like this.

shader-db results IVB:

total loops in shared programs: 2515 -> 2512 (-0.12%)
loops in affected programs: 33 -> 30 (-9.09%)
helped: 3
HURT: 0

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-08-29 16:02:05 +10:00
Timothy Arceri 0f450b57a1 nir/opt_loop_unroll: Remove unneeded phis if we make progress
Now that SSA values can be derefs and they have special rules, we have
to be a bit more careful about our LCSSA phis.  In particular, we need
to clean up in case LCSSA ended up creating a phi node for a deref.
This avoids validation issues with some CTS tests with the following
patch, but its possible this we could also see the same problem with
the existing unrolling passes.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-08-29 16:02:05 +10:00
Timothy Arceri 5a6b04d94b nir: add complex_loop bool to loop info
In order to be sure loop_terminator_list is an accurate
representation of all the jumps in the loop we need to be sure we
didn't encounter any other complex behaviour such as continues,
nested breaks, etc during analysis.

This will be used in the following patch.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-08-29 16:02:05 +10:00
Timothy Arceri fef6325e58 nir: always attempt to find loop terminators
This will help later patches with unrolling loops that end with a
break i.e. loops the always exit on their first interation.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-08-29 16:02:05 +10:00
Marek Olšák 1e40f69483 ac/surface: fix CMASK fast clear for NPOT textures with mipmapping on SI/CI/VI
This fixes VM faults and corruption.

Cc: 18.1 18.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-08-28 19:51:51 -04:00
Ian Romanick c836326a29 i965/vec4: Emit BRW_AOP_INC or BRW_AOP_DEC for atomicAdd of +1 or -1
No shader-db changes on any Intel platform.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-28 15:35:50 -07:00
Ian Romanick c856403868 i965/fs: Emit BRW_AOP_INC or BRW_AOP_DEC for imageAtomicAdd of +1 or -1
v2: Refactor selection of atomic opcode to a separate function.
Suggested by Jason.

No changes on any other Intel platforms.

Skylake
total instructions in shared programs: 14304261 -> 14304241 (<.01%)
instructions in affected programs: 1625 -> 1605 (-1.23%)
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 5.00 x̃: 5
helped stats (rel) min: 1.01% max: 14.29% x̄: 5.86% x̃: 4.07%
95% mean confidence interval for instructions value: -10.66 0.66
95% mean confidence interval for instructions %-change: -15.91% 4.19%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 527531226 -> 527531194 (<.01%)
cycles in affected programs: 92204 -> 92172 (-0.03%)
helped: 2
HURT: 0

Haswell and Broadwell had similar results. (Broadwell shown)
total instructions in shared programs: 14615730 -> 14615710 (<.01%)
instructions in affected programs: 1838 -> 1818 (-1.09%)
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 5.00 x̃: 5
helped stats (rel) min: 0.89% max: 13.04% x̄: 5.37% x̃: 3.78%
95% mean confidence interval for instructions value: -10.66 0.66
95% mean confidence interval for instructions %-change: -14.59% 3.85%
Inconclusive result (value mean confidence interval includes 0).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-08-28 15:35:46 -07:00
Ian Romanick b6e247cf0e i965/fs: Refactor image atomics to be a bit more like other atomics
This greatly simplifies the next patch.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-28 15:35:46 -07:00
Ian Romanick fabe3ead57 i965/fs: Emit BRW_AOP_INC or BRW_AOP_DEC for atomicAdd of +1 or -1
Funny story... a single shader was hurt for instructions, spills, fills.
That same shader was also the most helped for cycles.  #GPUsAreWeird

No changes on any other Intel platform.

v2: Refactor selection of atomic opcode to a separate function.
Suggested by Jason.

Haswell, Broadwell, and Skylake had similar results. (Skylake shown)
total instructions in shared programs: 14304116 -> 14304261 (<.01%)
instructions in affected programs: 12776 -> 12921 (1.13%)
helped: 19
HURT: 1
helped stats (abs) min: 1 max: 16 x̄: 2.32 x̃: 1
helped stats (rel) min: 0.05% max: 7.27% x̄: 0.92% x̃: 0.55%
HURT stats (abs)   min: 189 max: 189 x̄: 189.00 x̃: 189
HURT stats (rel)   min: 4.87% max: 4.87% x̄: 4.87% x̃: 4.87%
95% mean confidence interval for instructions value: -12.83 27.33
95% mean confidence interval for instructions %-change: -1.57% 0.31%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 527552861 -> 527531226 (<.01%)
cycles in affected programs: 1459195 -> 1437560 (-1.48%)
helped: 16
HURT: 2
helped stats (abs) min: 2 max: 21328 x̄: 1353.69 x̃: 6
helped stats (rel) min: 0.01% max: 5.29% x̄: 0.36% x̃: 0.03%
HURT stats (abs)   min: 12 max: 12 x̄: 12.00 x̃: 12
HURT stats (rel)   min: 0.03% max: 0.03% x̄: 0.03% x̃: 0.03%
95% mean confidence interval for cycles value: -3699.81 1295.92
95% mean confidence interval for cycles %-change: -0.94% 0.30%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 8025 -> 8033 (0.10%)
spills in affected programs: 208 -> 216 (3.85%)
helped: 1
HURT: 1

total fills in shared programs: 10989 -> 11040 (0.46%)
fills in affected programs: 444 -> 495 (11.49%)
helped: 1
HURT: 1

Ivy Bridge
total instructions in shared programs: 11709181 -> 11709153 (<.01%)
instructions in affected programs: 3505 -> 3477 (-0.80%)
helped: 3
HURT: 0
helped stats (abs) min: 1 max: 23 x̄: 9.33 x̃: 4
helped stats (rel) min: 0.11% max: 1.16% x̄: 0.63% x̃: 0.61%

total cycles in shared programs: 254741126 -> 254738801 (<.01%)
cycles in affected programs: 919067 -> 916742 (-0.25%)
helped: 3
HURT: 0
helped stats (abs) min: 21 max: 2144 x̄: 775.00 x̃: 160
helped stats (rel) min: 0.03% max: 0.90% x̄: 0.32% x̃: 0.03%

total spills in shared programs: 4536 -> 4533 (-0.07%)
spills in affected programs: 40 -> 37 (-7.50%)
helped: 1
HURT: 0

total fills in shared programs: 4819 -> 4813 (-0.12%)
fills in affected programs: 94 -> 88 (-6.38%)
helped: 1
HURT: 0

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> [v1]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-08-28 15:35:38 -07:00
Ian Romanick 41399f4bc7 intel/compiler: Silence unused parameter warnings in brw_eu.h
All of the other brw_*_desc functions take a devinfo parameter, and all
of the others at least have an assert that uses it.  Keep the parameter,
but mark it as unused.

Silences 37 warnings like:

In file included from src/intel/common/gen_disasm.c:27:0:
src/intel/compiler/brw_eu.h: In function ‘brw_pixel_interp_desc’:
src/intel/compiler/brw_eu.h:377:53: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
 brw_pixel_interp_desc(const struct gen_device_info *devinfo,
                                                     ^~~~~~~

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-28 15:35:38 -07:00
Sagar Ghuge 56574f4df3 i965: enable AMD_depth_clamp_separate
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-08-28 12:57:27 -07:00
Sagar Ghuge e6adea0dc0 i965: add functional changes for AMD_depth_clamp_separate
Gen >= 9 have ability to control clamping of depth values separately at
near and far plane.

z_w is clamped to the range [min(n,f), 0] if clamping at near plane is
enabled, [0, max(n,f)] if clamping at far plane is enabled and [min(n,f)
max(n,f)] if clamping at both plane is enabled.

v2: 1) Use better coding style (Ian Romanick)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-08-28 12:57:27 -07:00
Sagar Ghuge 2765749e0f mesa: add EXTRA_EXT for AMD_depth_clamp_separate
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-08-28 12:57:27 -07:00
Sagar Ghuge 2770446740 mesa: add support for GL_AMD_depth_clamp_separate tokens
_mesa_set_enable() and _mesa_IsEnabled() extended to accept new two
tokens GL_DEPTH_CLAMP_NEAR_AMD and GL_DEPTH_CLAMP_FAR_AMD.

v2: Remove unnecessary parentheses (Marek Olsak)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-08-28 12:57:27 -07:00
Sagar Ghuge 5650d39978 mesa: Add support for AMD_depth_clamp_separate
Enable _mesa_PushAttrib() and _mesa_PopAttrib() to handle
GL_DEPTH_CLAMP_NEAR_AMD and GL_DEPTH_CLAMP_FAR_AMD tokens.

Remove DepthClamp, because DepthClampNear + DepthClampFar replaces it,
as suggested by Marek Olsak.

Driver that enables AMD_depth_clamp_separate will only ever look at
DepthClampNear and DepthClampFar, as suggested by Ian Romanick.

v2: 1) Remove unnecessary parentheses (Marek Olsak)
    2) if AMD_depth_clamp_separate is unsupported, TEST_AND_UPDATE
       GL_DEPTH_CLAMP only (Marek Olsak)
    3) Clamp against near and far plane separately (Marek Olsak)
    4) Clip point separately for near and far Z clipping plane (Marek
       Olsak)

v3: Clamp raster position zw to the range [min(n,f), 0] for near plane
    and [0, max(n,f)] for far plane (Marek Olsak)

v4: Use MIN2 and MAX2 instead of CLAMP (Marek Olsak)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-08-28 12:57:27 -07:00
Sagar Ghuge 379949b967 mesa: Add types for AMD_depth_clamp_separate.
Add some basic types and storage for the AMD_depth_clamp_separate
extension.

v2: 1) Drop unnecessary definition (Marek Olsak)
    2) Expose extension in compatibility profile (Marek Olsak)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-08-28 12:57:27 -07:00
Sagar Ghuge f663fb5487 glapi: define AMD_depth_clamp_separate
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-08-28 12:57:27 -07:00
Jason Ekstrand c92a463d23 anv: Claim to support depthBounds for ID games
Cc: "18.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-08-28 13:05:54 -05:00
Jason Ekstrand 8c048af589 anv: Copy the appliation info into the instance
Cc: "18.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-08-28 13:05:54 -05:00
Jason Ekstrand 4ffb575da5 vulkan/alloc: Add a vk_strdup helper
Cc: "18.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-08-28 13:05:54 -05:00
Dylan Baker 7c00db9527 meson: Actually load translation files
Currently we run the script but don't actually load any files, even in a
tarball where they exist.

Fixes: 3218056e0e
       ("meson: Build i965 and dri stack")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2018-08-28 08:51:05 -07:00
Caio Marcelo de Oliveira Filho f172a77dd8 nir: Remove outdated comment
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-08-28 08:11:03 -07:00
Kevin Rogovin 03ecec9ed2 i965: Add INTEL_fragment_shader_ordering support.
Adds suppport for INTEL_fragment_shader_ordering. We achieve
the fragment ordering by using the same instruction as for
beginInvocationInterlockARB() which is by issuing a memory
fence via sendc.

Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
2018-08-28 17:15:10 +03:00
Kevin Rogovin 119435c877 mesa: Add GL/GLSL plumbing for INTEL_fragment_shader_ordering
This extension provides new GLSL built-in function
beginFragmentShaderOrderingIntel() that guarantees
(taking wording of GL_INTEL_fragment_shader_ordering
extension) that any memory transactions issued by
shader invocations from previous primitives mapped to
same xy window coordinates (and same sample when
per-sample shading is active), complete and are visible
to the shader invocation that called
beginFragmentShaderOrderingINTEL().

One advantage of INTEL_fragment_shader_ordering over
ARB_fragment_shader_interlock is that it provides a
function that operates as a memory barrie (instead
of a defining a critcial section) that can be called
under arbitary control flow from any function (in
contrast the begin/end of ARB_fragment_shader_interlock
may only be called once, from main(), under no control
flow.

Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
2018-08-28 17:15:10 +03:00
Andrii Simiklit 1b0df8a460 i965/gen6/xfb: handle case where transform feedback is not active
When the SVBI Payload Enable is false I guess the register R1.4
which contains the Maximum Streamed Vertex Buffer Index is filled by zero
and GS stops to write transform feedback when the transform feedback
is not active.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107579
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-08-28 15:32:45 +02:00
Rhys Perry 743e11c10b docs: add forgotten features to 18.2.0 release notes
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewied-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 18.2: <mesa-stable@lists.freedesktop.org>
2018-08-28 13:50:51 +01:00
Erik Faye-Lund a4e60ccb56 virgl: add debug-switch to output TGSI
This is quite useful for debugging shader-transpiling issues in
virglrenderer.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
2018-08-28 14:13:43 +02:00
Erik Faye-Lund 4ab06cc56e virgl: introduce $VIRGL_DEBUG=verbose
This adds an environment-varaible that can be used for driver-specific
flags, as well as a flag for it to enable verbose output.

While we're at it, quiet some overly chatty debug-output by default.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
2018-08-28 14:13:43 +02:00
Erik Faye-Lund 1b2444dffc virgl: replace fprintf-call with debug_printf
This is the only direct call-site for fprintf in virgl; all other
call-sites call debug_printf instead. So let's follow in style here.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
2018-08-28 14:13:43 +02:00
Erik Faye-Lund 2ebfa90abe virgl: delete commented out fprintf-call
This is just debug-cruft left over. Let's just get rid of it.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
2018-08-28 14:13:43 +02:00
Guido Günther 9de34b4dde meson: Don't enable any vulkan drivers on arm, aarch64
There's no Vulkan support for arm atm.

Signed-off-by: Guido Günther <guido.gunther@puri.sm>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2018-08-27 11:32:04 -07:00
Guido Günther 05e2fc6860 meson: Be a bit more helpful when arch or OS is unknown
V2: Add one missing @0@

Signed-off-by: Guido Günther <guido.gunther@puri.sm>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2018-08-27 11:31:52 -07:00
Sagar Ghuge a1e3305f75 intel/eu: print bytes instead of 32 bit hex value
INTEL_DEBUG=hex prints 32 bit hex value and due to endianness of CPU
byte order is reversed. In order to disassemble binary files, print
each byte instead of 32 bit hex value.

v2: Print blank spaces in order to vertically align output of compacted
    instructions hex value with uncompacted instructions hex value.
    (Matt Turner)

v3: Fix line wrap at correct length

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-08-27 11:07:39 -07:00
Lionel Landwerlin 440a988bd1 intel: decoder: handle 0 sized structs
Gen7.5 has a BLEND_STATE of size 0 which includes a variable length
group. We did not deal with that very well, leading to an endless
loop.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107544
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-08-27 18:33:18 +01:00
Rhys Perry e56e600bd3 nv50/ir,nvc0: use constant buffers for compute when possible on Kepler+
Gives a +7.79% increase in FPS with Hitman on lowest quality settings on
my GTX 1060.

total instructions in shared programs : 5787979 -> 5748677 (-0.68%)
total gprs used in shared programs    : 669901 -> 669373 (-0.08%)
total shared used in shared programs  : 548832 -> 548832 (0.00%)
total local used in shared programs   : 21068 -> 21064 (-0.02%)

                local     shared        gpr       inst      bytes
    helped           1           0         152         274         274
      hurt           0           0           0           0           0

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
2018-08-27 14:23:42 +01:00
Rhys Perry d27c791891 nv50/ir: optimize multiplication by 16-bit immediates into two xmads
Rather than the usual three that would be created.

total instructions in shared programs : 5796385 -> 5786560 (-0.17%)
total gprs used in shared programs    : 670103 -> 669968 (-0.02%)
total shared used in shared programs  : 548832 -> 548832 (0.00%)
total local used in shared programs   : 21164 -> 21068 (-0.45%)

                local     shared        gpr       inst      bytes
    helped           1           0          64        1040        1040
      hurt           0           0          27           0           0

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
2018-08-27 13:57:11 +01:00
Rhys Perry 400a4eb964 nv50/ir: optimize near power-of-twos into shladd
total instructions in shared programs : 5819319 -> 5796385 (-0.39%)
total gprs used in shared programs    : 670571 -> 670103 (-0.07%)
total shared used in shared programs  : 548832 -> 548832 (0.00%)
total local used in shared programs   : 21164 -> 21164 (0.00%)

                local     shared        gpr       inst      bytes
    helped           0           0         318        1758        1758
      hurt           0           0          63           0           0

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
2018-08-27 13:57:01 +01:00
Rhys Perry 2f52925f5c nv50/ir: move a * b -> a << log2(b) code into createMul()
With this commit, OP_MAD is handled on nv50 too. This commit is also
useful for later commits.

Also, instead of creating a shladd, it relies on LateAlgebraicOpt to
create one. This simplifies the code and helps shader-db slightly overall.

total instructions in shared programs : 5820882 -> 5819319 (-0.03%)
total gprs used in shared programs    : 670595 -> 670571 (-0.00%)
total shared used in shared programs  : 548832 -> 548832 (0.00%)
total local used in shared programs   : 21164 -> 21164 (0.00%)

                local     shared        gpr       inst      bytes
    helped           0           0          18         230         230
      hurt           0           0           8         263         263

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
2018-08-27 13:56:47 +01:00
Rhys Perry b60bc7a4ab nv50/ir: optimize imul/imad to xmads
This hits the shader-db numbers a good bit, though a few xmads is way
faster than an imul or imad and the cost is mitigated by the next commit,
which optimizes many multiplications by immediates into shorter and less
register heavy instructions than the xmads.

total instructions in shared programs : 5768871 -> 5820882 (0.90%)
total gprs used in shared programs    : 669919 -> 670595 (0.10%)
total shared used in shared programs  : 548832 -> 548832 (0.00%)
total local used in shared programs   : 21068 -> 21164 (0.46%)

                local     shared        gpr       inst      bytes
    helped           0           0          38           0           0
      hurt           1           0         365        3076        3076

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
2018-08-27 13:56:44 +01:00
Rhys Perry bcbcdf8448 gm107/ir: add support for OP_XMAD on GM107+
v4: make the immediate field 16 bits
v5: don't ever emit h1 flags for immediates

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
2018-08-27 13:56:41 +01:00
Rhys Perry 5d6952d2de nv50/ir: add preliminary support for OP_XMAD
v4: remove uint16_t(...)
v4: don't allow immediates outside [0,65535] in insnCanLoad()

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
2018-08-27 13:56:36 +01:00
vadym.shovkoplias 4a8444d5bc glsl/linker: Allow unused in blocks which are not declated on previous stage
>From Section 4.3.4 (Inputs) of the GLSL 1.50 spec:

    "Only the input variables that are actually read need to be written
     by the previous stage; it is allowed to have superfluous
     declarations of input variables."

Fixes:
    * interstage-multiple-shader-objects.shader_test

v2:
  Update comment in ir.h since the usage of "used" field
  has been extended.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101247
Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-08-27 12:13:53 +02:00
Jason Ekstrand 07a227f543 nir: Pull block_ends_in_jump into nir.h
We had two different implementations in different files.  May as well
have one and put it in nir.h.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-08-27 02:15:38 -05:00
Samuel Iglesias Gonsálvez 59a8e0dbf8 anv: Add support for protected memory properties on anv_GetPhysicalDeviceProperties2()
VkPhysicalDeviceProtectedMemoryProperties structure is new on Vulkan 1.1.

Fixes Vulkan CTS CL#2849.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-08-27 09:07:52 +02:00
Jason Ekstrand aad501f15e intel/tools: Add 0x in front of a couple of hex values
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-08-25 18:47:08 -05:00
Jason Ekstrand 76b0e4d8c9 anv: Fill holes in the VF VUE to zero
This fixes a GPU hang in DOOM 2016 running under wine.

Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104809
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-08-25 18:47:08 -05:00
Kai Wasserbäch b2313ef4a8 intel: tools: Fix aubinator_error's fprintf call (format-security)
The recent commit 4616639b49 introduced
the new function aubinator_error() which is a trivial wrapper around
fprintf() to STDERR. The call to fprintf() however is passed the message
msg directly:
  fprintf(stderr, msg);

This is a format-security violation and leads to an FTBFS with
-Werror=format-security (GCC 8):
  ../../../src/intel/tools/aubinator.c: In function 'aubinator_error':
  ../../../src/intel/tools/aubinator.c:74:4: error: format not a string literal and no format arguments [-Werror=format-security]
      fprintf(stderr, msg);
      ^~~~~~~

This patch fixes this trivially by introducing a catch-all "%s" format
argument.

Fixes: 4616639b49 ("intel: tools: split aub parsing from aubinator")
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-08-25 16:52:12 +01:00
Jason Ekstrand 70de31d0c1 intel/batch_decoder: Print blend states properly
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-08-25 07:50:45 -05:00
Jason Ekstrand cbd4bc1346 intel/batch_decoder: Fix dynamic state printing
Instead of printing addresses like everyone else, we were accidentally
printing the offset from state base address.  Also, state_map is a void
pointer so we were incrementing in bytes instead of dwords and every
state other than the first was wrong.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-08-25 07:50:43 -05:00
Jason Ekstrand d1971be6ea intel/decoder: Print ISL formats for vertex elements
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-08-25 07:50:40 -05:00