Commit Graph

92605 Commits

Author SHA1 Message Date
Francisco Jerez e4124f9bc1 glapi: Update XML for last revision of EXT_shader_framebuffer_fetch.
Desktop GL is now supported, and there is an additional entry-point
for EXT_shader_framebuffer_fetch_non_coherent.

Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
2018-02-24 15:28:36 -08:00
Francisco Jerez 6a8ec78c2a mesa: Rename MESA_shader_framebuffer_fetch gl_extensions bits to EXT.
The changes I had originally planned for the MESA_shader_framebuffer_fetch
extension have been merged into the EXT spec, there's no point in keeping
MESA_shader_framebuffer_fetch extension enables.

Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
2018-02-24 15:28:36 -08:00
Francisco Jerez d0bef79f12 mesa: Rename dd_function_table::BlendBarrier to match latest EXT spec.
This GL entry point was renamed to glFramebufferFetchBarrier() in the
EXT extension on request from Khronos members.  Update the Mesa
codebase to match the latest spec.

Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
2018-02-24 15:28:36 -08:00
Francisco Jerez 27c829da28 i965: Fix KHR_blend_equation_advanced with some render targets.
This reverts two bogus and seemingly useless changes from the commits
referenced below, which broke KHR_blend_equation_advanced (and
EXT_shader_framebuffer_fetch_non_coherent which wasn't exposed yet)
for any kind of render target surface that would cause the
get_isl_surf() call in brw_emit_surface_state() to do anything useful
(notice how the result of get_isl_surf() is completely ignored by the
caller right now), as was the case while using those extensions with
1D array or 3D framebuffers in particular.

Fixes: f5859b45b1 "i965/miptree: Switch remaining surfaces to isl"
Fixes: bf24c3539e "i965/miptree: Clean-up unused"
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
2018-02-24 15:28:36 -08:00
Marek Olšák fb410ae392 radeonsi: remove si_descriptors parameter from emit_shader_pointer functions
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-02-24 23:08:29 +01:00
Marek Olšák 63ea0a00a3 radeonsi: preload the tess offchip ring in TES
so that it's not done multiple times in branches

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-02-24 23:08:29 +01:00
Marek Olšák 2d03c4cac8 radeonsi: move tess ring address into TCS_OUT_LAYOUT, removes 2 TCS user SGPRs
TCS_OUT_LAYOUT has 13 unused bits. That's enough for a 32-bit address
aligned to 512KB. Hey, it's a 13-bit pointer!

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-02-24 23:08:29 +01:00
Marek Olšák 190e064e63 radeonsi: move 2nd-shader descriptor pointers into s[0:1]
If 32-bit pointers are supported, both pointers can be moved into s[0:1]
and then ESGS has exactly the same user data SGPR declarations as VS.

If 32-bit pointers are not supported, only one pointer can be moved into
s[0:1]. In that case, the 2nd pointer is moved before TCS constants,
so that the location is the same in HS and GS.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-02-24 23:08:29 +01:00
Marek Olšák 1d1df76d2b radeonsi: change si_descriptors::shader_userdata_offset type to short
We will want to use SH registers outside of user data SGPRs, like the GFX9
special SGPRs.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-02-24 23:08:28 +01:00
Marek Olšák fca7dee9c6 radeonsi: put both tessellation rings into 1 buffer
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-02-24 23:08:28 +01:00
Marek Olšák d2963d8b5f radeonsi: move tessellation ring info into si_screen
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-02-24 23:08:28 +01:00
Marek Olšák 41895c26d3 radeonsi: move TCS_OUT_LAYOUT.PatchVerticesIn to lower bits
For a later patch.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-02-24 23:08:28 +01:00
Karol Herbst f0b39779a0 nvir: dont optimize mad with subops to shladd
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-02-24 18:48:13 +01:00
James Legg afd8fd0656 radv: Really use correct HTILE expanded words.
When transitioning to an htile compressed depth format, Set the full
depth range, so later rasterization can pass HiZ. Previously, for depth
only formats, the depth range was set to 0 to 0. This caused unwanted
HiZ rejections with a VK_FORMAT_D16_UNORM depth buffer
(VK_FORMAT_D32_SFLOAT was not affected somehow).

These values are derived from PAL [0], since I can't find the
specification describing the htile values.

[0] 5cba4ecbda/src/core/hw/gfxip/gfx9/gfx9MaskRam.cpp (L1500)

CC: Dave Airlie <airlied@redhat.com>
CC: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Fixes: 5158603182 "radv: Use correct HTILE expanded words."
2018-02-24 02:16:22 +01:00
Mauro Rossi 8eed942136 radv/extensions: fix c_vk_version for patch == None
Similar to cb0d1ba156 ("anv/extensions: Fix VkVersion::c_vk_version for patch == None")
fixes the following building errors:

out/target/product/x86_64/obj_x86/STATIC_LIBRARIES/libmesa_radv_common_intermediates/radv_entrypoints.c:1161:48:
error: use of undeclared identifier 'None'; did you mean 'long'?
      return instance && VK_MAKE_VERSION(1, 0, None) <= core_version;
                                               ^~~~
                                               long
external/mesa/include/vulkan/vulkan.h:34:43: note: expanded from macro 'VK_MAKE_VERSION'
    (((major) << 22) | ((minor) << 12) | (patch))
                                          ^
...
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.

Fixes: e72ad05c1d ("radv: Return NULL for entrypoints when not supported.")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-24 00:31:31 +01:00
Eric Anholt b4b4ada761 broadcom/vc5: Fix layout of 3D textures.
Cube maps are entire miptrees repeated, while 3D textures have each level
have all of its layers next to each other.  Fixes tex3d and
tex-miplevel-selection GL2:texture() 3D.
2018-02-23 15:07:26 -08:00
Eric Anholt 97dc077303 broadcom/vc5: Ignore unused usage flags in is_format_supported.
Like for vc4, the new DISPLAY_TARGET flag ended up causing no formats to
match.  Just drop the whole retval == usage thing and return early when we
hit a known unsupported case.

Fixes: f7604d8af5 ("st/dri: only expose config formats that are display targets")
2018-02-23 15:07:18 -08:00
Eric Anholt 880573e737 gbm: Fix the alpha masks in the GBM format table.
Once GBM started looking at the values of the alpha masks, ARGB/ABGR
wouldn't match any more because we had both A and R in the low bits.

Fixes: 2ed344645d ("gbm/dri: Add RGBA masks to GBM format table")
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Daniel Stone <daniels@collabora.com>
2018-02-23 15:03:36 -08:00
Mathias Fröhlich b54bf0e3e3 mesa: Update vertex processing mode on _mesa_UseProgram.
The change is a bug fix for 92d76a169:
  mesa: Provide an alternative to get_vp_mode()
that actually got exposed through 4562a7b0:
  vbo: Make use of _DrawVAO from the dlist code.

Fixes: KHR-GLES31.core.shader_image_load_store.advanced-sso-simple
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105229
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
2018-02-23 21:08:35 +01:00
Marek Olšák d169438d8e mesa: rename has_core_gs -> has_gs in get_programiv
This is also true for GLES.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
2018-02-23 20:50:23 +01:00
Marek Olšák 1881f41b6c mesa: replace some API_OPENGL_CORE checks with _mesa_is_desktop_gl
This is more accurate with respect to the compatibility profile.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
2018-02-23 20:50:22 +01:00
Marek Olšák 1defc973db mesa: add some of missing compatibility support for ARB_bindless_texture
The extension is exposed in the compatibility profile.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
2018-02-23 20:50:20 +01:00
Marek Olšák b8e2e9e1a1 mesa: expose ARB_enhanced_layouts in the compatibility profile
GLSL 1.40 is required.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
2018-02-23 20:50:19 +01:00
Marek Olšák a0c8b49284 mesa: enable OpenGL 3.1 with ARB_compatibility
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
2018-02-23 20:50:17 +01:00
Marek Olšák 605a7f6db5 mesa: implement ARB_compatibility
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
2018-02-23 20:50:15 +01:00
Emil Velikov 14a2c87c41 swr: remove dead LLVM code paths
LLVM requirement was bumped to 4.0.0 with earlier commit.
Hence any code tailored for older versions is now unreachable.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-By: George Kyriazis <george.kyriazis@intel.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
2018-02-23 19:17:31 +00:00
Eric Anholt 5980a41c0f broadcom/vc4: Remove the retval==usage check in is_format_supported().
This got us into trouble recently, so just remove it entirely.
2018-02-23 08:42:13 -08:00
Eric Anholt bc3d16e633 broadcom/vc4: Add support for YUV textures using unaccelerated blits.
Previously we would assertion fail about having no hardware format.  This
is enough to get kmscube -M nv12-2img working.
2018-02-23 08:42:13 -08:00
Eric Anholt c824a045ea broadcom/vc4: Fix double-unrefcounting of prsc->next with shadows.
When we set up the shadow resource we were copying the original resource
as the template, including its prsc->next field.  When we shadowed the
first YUV plane's resource for linear-to-tiled conversion, we would end up
unbalancing the refcount on the shadow resource's destruction.
2018-02-23 08:42:13 -08:00
Eric Anholt 6deb158ec1 broadcom/vc4: Add pipe_reference debugging for vc4_bos.
Trying to track down the YUV EGLImage use-after-free, it helps to see what
the mystery objects are that are being refcounted.
2018-02-23 08:42:13 -08:00
Eric Anholt 34ea1aca92 broadcom/vc4: Remove dead vc4_bo_set_reference().
It would be broken if NULL was passed to it anyway, since it wouldn't
participate in screen->bo_handles management.
2018-02-23 08:42:13 -08:00
Eric Anholt a49738290c broadcom/vc4: Use pipe_resource_reference in sampler views.
Improves u_debug_refcount output.
2018-02-23 08:42:13 -08:00
Eric Anholt 0c1dd9dee0 broadcom/vc4: Allow importing linear BOs with arbitrary offset/stride.
This is part of supporting YUV textures -- MMAL will be handing us a
single GEM BO with the planes at offsets within it, and MMAL-decided
stride.
2018-02-23 08:42:13 -08:00
Eric Anholt 978b884afc broadcom/vc4: Ignore PIPE_BIND_DISPLAY_TARGET in is_format_supported().
We were failing the retval == usage check at the end.

Fixes: f7604d8af5 ("st/dri: only expose config formats that are display targets")
2018-02-23 08:42:13 -08:00
Lucas Stach 8df11f3fad etnaviv: fix in-place resolve tile count
TS tiles map to a fixed amount of bytes in the color/depth surface,
so the blocksize of the format needs to be taken into account when
calculating the number of tiles to fill.

The simplest fix is to just use the layer stride, which is the surface
size in bytes.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2018-02-23 15:34:39 +01:00
Lucas Stach add23b59c9 etnaviv: switch magic single buffer state to "3"
Some of the 16bit formats misrender with missing tiles with the current
"2" state. As all the previously working formats also work with the "3"
state, just always use that one.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2018-02-23 15:34:39 +01:00
Lucas Stach 8befc11186 etnaviv: add debug switch to disable single buffer feature
This feature has caused some trouble already. Add a debug switch to
allow users to quickly check if a specific issue is caused by this
feature.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2018-02-23 15:34:31 +01:00
Dylan Baker 5c460337fd meson: Fix GL and EGL pkg-config files with glvnd
Currently meson will generate a pkg-config that links to EGL_mesa (or
GLX_mesa), but this isn't correct, it should always link to EGL or GL.
Probably the "right" solution is to have glvnd itself provide the pkg
config files for GL and EGL, but that also means that glvnd needs to
provide many of the header files, which makes it a more involved job.

Fixes: a47c525f32 ("meson: build glx")
Fixes: 035ec7a2bb ("meson: Add support for EGL glvnd")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
2018-02-23 13:30:28 +00:00
Frank Binns 6160bf97db egl/dri2: fix segfault when display initialisation fails
dri2_display_destroy() is called when platform specific display
initialisation fails. However, this would typically lead to a
segfault due to the dri2_egl_display vbtl not having been set up.

Fixes: 2db9548296 ("loader_dri3/glx/egl: Optionally use a blit
context for blitting operations")
Signed-off-by: Frank Binns <francisbinns@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2018-02-23 11:13:22 +00:00
Juan A. Suarez Romero e1623b303c mesa: add missing RGB9_E5 format in _mesa_base_fbo_format
RGB9_E5 should be accepted by RenderbufferStorage if the
EXT_texture_shared_exponent is exposed. It is left to the
implementations to return GL_FRAMEBUFFER_UNSUPPORTED_EXT
when checking the framebuffer completeness if they do not
support rendering in this format.

Discussed in:
https://github.com/KhronosGroup/OpenGL-API/issues/32

This fixes KHR-GL45.internalformat.renderbuffer.rgb9_e5

v2: Added more info to the commit message (Antia)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Antia Puentes <apuentes@igalia.com>
2018-02-23 10:12:06 +01:00
Christian Gmeiner e72062b66d etnaviv: npot_tex_any_wrap needs one bit only
Reduces size of struct etna_specs from 100 to 94 bytes.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
2018-02-23 09:38:16 +01:00
Mathias Fröhlich 4562a7b0e8 vbo: Make use of _DrawVAO from the dlist code.
Finally use an internal VAO to execute display list draws. Avoid
duplicate state validation for display list draws. Remove client arrays
previously used exclusively for display lists.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
2018-02-23 05:34:14 +01:00
Mathias Fröhlich 2f35140846 mesa: Use atomics for shared VAO reference counts.
VAOs will be used in the next change as immutable object across multiple
contexts. Only reference counting may write concurrently on the VAO. So,
make the reference count thread safe for those and only those VAO objects.

v3: Use bool/true/false for gl_vertex_array_object::SharedAndImmutable.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
2018-02-23 05:34:11 +01:00
Mathias Fröhlich 8a3a4b6fae vbo: Make use of _DrawVAO from immediate mode draw
Finally use an internal VAO to execute immediate mode draws. Avoid
duplicate state validation for immediate mode draws. Remove client arrays
previously used exclusively for immediate mode draws.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
2018-02-23 05:34:07 +01:00
Mathias Fröhlich c757e416ce vbo: Implement tool functions for vbo specific VAO setup.
Correct VBO_MATERIAL_SHIFT value.
The functions will be used next in this series.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
2018-02-23 05:34:04 +01:00
Mathias Fröhlich ef8028017d mesa: Add flush_vertices to _mesa_bind_vertex_buffer.
We will need the flush_vertices argument later in this series.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
2018-02-23 05:34:01 +01:00
Mathias Fröhlich 354b76ad20 mesa: Make _mesa_vertex_attrib_binding public.
Change vertex_attrib_binding() to _mesa_vertex_attrib_binding(), add a
flush_vertices argument, and make it publicly available.
The function will be needed later in the series.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
2018-02-23 05:33:58 +01:00
Mathias Fröhlich 4331969ac4 mesa: Add flush_vertices to _mesa_{enable,disable}_vertex_array_attrib.
We will need the flush_vertices argument later in this series.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
2018-02-23 05:33:55 +01:00
Mathias Fröhlich 195bb990ed vbo: Use _DrawVAO for array type draw commands.
Switch over to use the _DrawVAO for all the array type draws.
The _DrawVAO needs to be set before we enter _mesa_update_state, so move
setting the draw method in front of the first call to _mesa_update_state
which is in turn called from the *validate*Draw* calls. Using the
gl_vertex_array_object::_Enabled bitmask, gl_vertex_program_state::_VPMode
and gl_vertex_array_object::_AttributeMapMode we can already set
varying_vp_inputs before we call _mesa_update_state the first time.
Thus remove duplicate state validation.

v2: Update comments.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
2018-02-23 05:33:50 +01:00
Mathias Fröhlich 6002ab564b vbo: Implement method to track the inputs array.
Provided the _DrawVAO and the derived state that is maintained if we have
the _DrawVAO set, implement a method to incrementally update the array of
gl_vertex_array input pointers.

v2: Add some more comments.
    Rename _vbo_array_init to _vbo_init_inputs.
    Rename vbo_context::arrays to vbo_context::draw_arrays.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
2018-02-23 05:33:46 +01:00
Mathias Fröhlich 08c7474189 mesa: Introduce a yet unused _DrawVAO.
During the patch series this VAO gets populated with either the currently
bound VAO or an internal VAO that will be used for immediate mode and
dlist rendering.

v2: More comments about the _DrawVAO, filter and enabled mask.
    Rename _DrawVAOEnabled to _DrawVAOEnabledAttribs.
v3: Fix and move comment.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
2018-02-23 05:33:43 +01:00
Mathias Fröhlich ce3d2421a0 vbo: Remove get_vp_mode() and enum vp_mode.
Is now unused.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
2018-02-23 05:33:40 +01:00
Mathias Fröhlich 60c3ca1b23 vbo: Use _VPMode instead of get_vp_mode().
At those places where we used get_vp_mode() use
gl_vertex_program_state::_VPMode instead.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
2018-02-23 05:33:36 +01:00
Mathias Fröhlich 92d76a1691 mesa: Provide an alternative to get_vp_mode()
To get equivalent information than get_vp_mode(), track the vertex
processing mode in a per context variable at
gl_vertex_program_state::_VPMode.
This aims to replace get_vp_mode() as seen in the vbo module.
But instead of the get_vp_mode() implementation which only gives correct
answers past calling _mesa_update_state() this context variable is
immediately tracked when the vertex processing state is modified. The
correctness of this value is asserted on state validation.

With this in place we should be able to untangle the dependency with
varying_vp_inputs and state invalidation.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
2018-02-23 05:33:30 +01:00
Ilia Mirkin d73f1f2ad8 nv50,nvc0: fix integer MS resolves using 2d engine
We don't want filtering for integer textures, same as depth/stencil.

Fixes: KHR-GL45.direct_state_access.renderbuffers_storage_multisample
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Karol Herbst <kherbst@redhat.com>
2018-02-22 20:47:48 -05:00
Ilia Mirkin 33ce3569c5 nvc0: fix writing query results into buffer
We need to mark the range as valid, and validate the resource using a
helper to ensure that the buffer status is marked properly.

Fixes some CTS pipeline stats query tests, and
KHR-GL45.direct_state_access.queries_functional

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Karol Herbst <kherbst@redhat.com>
2018-02-22 20:47:48 -05:00
Ilia Mirkin f6e4f95668 nv50,nvc0: fix clear buffer acceleration
Two things were off:
 - valid range was not updated, which could affect waiting for future
   maps
 - fencing was done manually instead of using the *_resource_validate
   helper, which resulted in a missed dirty buffer flag being set

Fixes: KHR-GL45.direct_state_access.buffers_clear
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Karol Herbst <kherbst@redhat.com>
2018-02-22 20:47:48 -05:00
Lionel Landwerlin bd9672695b i965: perf: ensure reading config IDs from sysfs isn't interrupted
Fixes: 458468c136 "i965: Expose OA counters via INTEL_performance_query"
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2018-02-23 01:44:07 +00:00
Bas Nieuwenhuizen 032870beda radv: Fix autotools build.
Somewhere along the way the Makefile changes got lost ...

Fixes: 4db78f3a6b "radv: Put supported extensions in a struct."
Acked-by: Dave Airlie <airlied@redhat.com>
2018-02-23 01:54:12 +01:00
Bas Nieuwenhuizen e72ad05c1d radv: Return NULL for entrypoints when not supported.
This implements strict checking for the entrypoint ProcAddr
functions.

 - InstanceProcAddr with instance = NULL, only returns the 3 allowed
   entrypoints.
 - DeviceProcAddr does not return any instance entrypoints.
 - InstanceProcAddr does not return non-supported or disabled
   instance entrypoints.
 - DeviceProcAddr does not return non-supported or disabled device
   entrypoints.
 - InstanceProcAddr still returns non-supported device entrypoints.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-02-23 00:39:02 +01:00
Bas Nieuwenhuizen 414f5e0e14 radv: Reword radv_entrypoints_gen.py
With a big inspiration from anv as always ...

Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-02-23 00:39:02 +01:00
Bas Nieuwenhuizen 076f7cfc6b radv: Track enabled extensions.
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-02-23 00:39:02 +01:00
Bas Nieuwenhuizen 4db78f3a6b radv: Put supported extensions in a struct.
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-02-23 00:39:02 +01:00
Samuel Pitoiset d6b7539206 ac/nir: remove emission of nir_op_fpow
fpow is now lowered at NIR level.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-22 20:44:46 +01:00
Samuel Pitoiset 7aa008d1d7 radv: enable lowering of fpow to fexp2 and flog2
There is no fpow in hardware, so it's always lowered somewhere,
but it appears that lowering at NIR level is better. Figured while
comparing compute shaders between RadeonSI and RADV.

Polaris10:
Totals from affected shaders:
SGPRS: 18936 -> 18904 (-0.17 %)
VGPRS: 12240 -> 12220 (-0.16 %)
Spilled SGPRs: 2809 -> 2809 (0.00 %)
Code Size: 718116 -> 719848 (0.24 %) bytes
Max Waves: 1409 -> 1410 (0.07 %)

Vega10:
Totals from affected shaders:
SGPRS: 18392 -> 18392 (0.00 %)
VGPRS: 12008 -> 11920 (-0.73 %)
Spilled SGPRs: 3001 -> 2981 (-0.67 %)
Code Size: 777444 -> 778788 (0.17 %) bytes
Max Waves: 1503 -> 1504 (0.07 %)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-22 20:40:47 +01:00
Samuel Pitoiset 63fb30c674 nir: lower fexp2(fmul(flog2(a), 2)) to fmul(a, a)
Similar for the 4 case.

Suggested by Bas.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-22 20:40:45 +01:00
Samuel Pitoiset b18997876f nir: add is_used_once for fmul(fexp2(a), fexp2(b)) to fexp2(fadd(a, b))
Otherwise the code size increases because the original fexp2()
instructions can't be deleted.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-22 20:40:43 +01:00
Samuel Pitoiset a01e9996b5 ac/nir: set GLC=1 for load/store of coherent/volatile images
This disables persistence accross wavefronts.

F1 2017 and Wolfenstein 2 appear to use some coherent images
but this patch doesn't seem to change anything.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-22 20:39:55 +01:00
Samuel Pitoiset 3c40be126f spirv: apply memory qualifiers to images
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-22 20:39:53 +01:00
Chuck Atkins 540e49e105 glx: Properly handle cases where screen creation fails
This fixes a segfault exposed by a29d63ecf7 which occurs when swr is
used on an unsupported architecture.

v2: re-work to place logic in xmesa_init_display

Signed-off-by: Chuck Atkins <chuck.atkins@kitware.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Cc: George Kyriazis <george.kyriazis@intel.com>
Cc: Bruce Cherniak <bruce.cherniak@intel.com>
2018-02-22 10:20:32 -05:00
Iago Toral Quiroga 7668b594e6 anv/blorp: multisample resolve all attachment layers
We were only resolving the first.

v2:
  - Do not require that the number of layers on dst and src are an
    exact match, it is okay if the dst has more layers so long as
    it has at least the same that we are going to resolve.
  - Do not always resolve array_len layers, we should resolve
    only from base_array_layer to array_len.

v3:
  - v2 was assuming that array_len represented the total number of
    layers in the image, but it represents the number of layers
    starting at the base array ayer.

v4:
 - The number of layers to resolve should be taken from the
   framebuffer (Nanley).

Fixes new CTS tests for multisampled layered rendering:
dEQP-VK.renderpass.multisample_resolve.layers_*

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2018-02-22 08:23:39 +01:00
Jason Ekstrand 2dce4ac6ac intel/isl: Improve the documentation on get_default_aux_state
Reviewed-by: Chad Versace <chadversary@chromium.org>
2018-02-21 18:18:16 -08:00
Jason Ekstrand 24952160fd i965: Use finish_external instead of make_shareable in setTexBuffer2
The setTexBuffer2 hook from GLX is used to implement glxBindTexImageEXT
which has tighter restrictions than just "it's shared".  In particular,
it says that any rendering to the image while it is bound causes the
contents to become undefined.

The GLX_EXT_texture_from_pixmap extension provides us with an acquire
and release in the form of glXBindTexImageEXT and glXReleaseTexImageEXT.
The extension spec says,

    "Rendering to the drawable while it is bound to a texture will leave
    the contents of the texture in an undefined state.  However, no
    synchronization between rendering and texturing is done by GLX.  It
    is the application's responsibility to implement any synchronization
    required."

From the EGL 1.4 spec for eglBindTexImage:

    "After eglBindTexImage is called, the specified surface is no longer
    available for reading or writing.  Any read operation, such as
    glReadPixels or eglCopyBuffers, which reads values from any of the
    surface’s color buffers or ancillary buffers will produce
    indeterminate results.  In addition, draw operations that are done
    to the surface before its color buffer is released from the texture
    produce indeterminate results

In other words, between the bind and release calls, we effectively own
those pixels and can assume, so long as we don't crash, that no one else
is reading from/writing to the surface.  The GLX and EGL implementations
call the setTexBuffer2 and releaseTexBuffer function pointers that the
driver can hook.

In theory, this means that, between BindTexImage and ReleaseTexImage, we
own the pixels and it should be safe to track aux usage so we
can avoid redundant resolves so long as we start off with the right
assumption at the start of the bind/release pair.

In practice, however, X11 has slightly different expectations.  It's
expected that the server may be drawing to the image at the same time as
the compositor is texturing from it.  In that case, the worst expected
outcome should be tearing or partial rendering and not random corruption
like we see when rendering races with scanout with CCS.  Fortunately,
the GEM rules about texture/render dependencies save us here.  If X11
submits work to write to a pixmap after the compositor has submitted
work to texture from it, GEM inserts a dependency between the compositor
and X11.  If X11 is using a high-priority context, this will cause the
compositor to get a temporarily boosted priority while the batch from
X11 is waiting on it.  This means that we will never have an actual race
between X11 and the compositor so no corruption can happen.

Unfortunately, however, this means that X11 will likely be rendering to it
between the compositor's BindTexImage and ReleaseTexImage calls.  If we
want to avoid strange issues, we need to be a bit careful about
resolves because we can't really transition it away from the "default"
aux usage.  The only case where this would practically be a problem is
with image_load_store where we have to do a full resolve in order to use
the image via the data port.  Even there it would only be a problem if
batches were split such that X11's rendering happens between the resolve
and the use of it as a storage image.  However, the chances of this
happening are very slim so we just emit a warning and hope for the best.

This commit adds a new helper intel_miptree_finish_external which resets
all aux state to whatever ISL says is the right worst-case "default" for
the given modifier.  It feels a little awkward to call it "finish"
because it's actually an acquire from the perspective of the driver, but
it matches the semantics of the other prepare/finish functions.  This
new helper gets called in intelSetTexBuffer2 instead of make_shareable.
We also add an intelReleaseTexBuffer (we passed NULL to releaseTexBuffer
before) and call intel_miptree_prepare_external in it.  This probably
does nothing most of the time but it means that the prepare/finish calls
are properly matched.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2018-02-21 18:18:16 -08:00
Jason Ekstrand 00926a2730 i965/tex_image: Reference the renderbuffer miptree in setTexBuffer2
The old code made a new miptree that referenced the same BO as the
renderbuffer and just trusted in the memory aliasing to work.  There are
only two ways in which the new miptree is liable to differ from the one
in the renderbuffer and neither of them matter:

 1) It may have a different target.  The only targets that we can ever
    see in intelSetTexBuffer2 are GL_TEXTURE_2D and GL_TEXTURE_RECTANGLE
    and the difference between the two doesn't matter as far as the
    miptree is concerned; genX(update_sampler_state) only looks at the
    gl_texture_object and not the miptree when determining whether or
    not to use normalized coordinates.

 2) It may have a very slightly different format.  Again, this doesn't
    matter because we've supported texture views for quite some time so
    we always look at the gl_texture_object format instead of the
    miptree format for hardware setup anyway.

On the other hand, because we were recreating the miptree, we were using
intel_miptree_create_for_bo which doesn't understand modifiers.  We
really want this function to work without doing a resolve so long as you
have modifiers so we need to fix that.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2018-02-21 18:18:16 -08:00
Jason Ekstrand 41d45eb21e i965/tex_image: Pull the tex format from the renderbuffer in intelSetTexBuffer2
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2018-02-21 18:18:16 -08:00
Jason Ekstrand 344b57b10b i965/miptree: Loosen the format check in miptree_match_image
This function is used to determine when we need to re-allocate a
miptree.  Since we do nothing different in miptree allocation for
sRGB vs. linear, loosening this should be safe and may lead to less
copying and reallocating in some odd cases.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2018-02-21 18:18:16 -08:00
Jason Ekstrand 5b1b710e6f i965/state: Ignore intel_obj->_Format for depth/stencil and ETC2
We're about to start letting the intel_obj->_Format be the "real"
texture format.  For depth/stencil textures, this may be a combined
depth stencil format.  For ETC2 on gen7 and earlier, this will be the
actual ETC2 format.  This makes a bit more GL sense but means we have to
be careful in state upload.

Reviewed-by: Chad Versace <chadversary@chromium.org>
2018-02-21 18:18:16 -08:00
Kenneth Graunke 183ce5e629 glsl: Parse 'layout' as a token with advanced blending or bindless
Both KHR_blend_equation_advanced and ARB_bindless_texture provide
layout qualifiers, and are exposed in compatibility contexts.  We
need to parse the layout qualifier as a token in order for those
to work, but forgot to extend this check.

ARB_shader_image_load_store would need a similar treatment, but we
don't expose that in legacy OpenGL contexts.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105161
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2018-02-21 17:50:57 -08:00
Daniel Stone c7e22483fe vulkan/wsi/x11: Consistently update and return swapchain status
Use a helper function for updating the swapchain status. This will be
used later to handle VK_SUBOPTIMAL_KHR, where we need to make a
non-error status stick to the swapchain until recreation.  Instead of
direct comparisons to VK_SUCCESS to check for error, test for negative
numbers meaning an error status, and positive numbers indicating
non-error statuses.

v2 (Jason Ekstrand):
 - Use a pattern of "return x11_swapchain_result(chain, VK_WHATEVER)"
 - Handle wsi_queue_pull returning VK_TIMEOUT
 - Call x11_swapchain_result in x11_present_to_x11

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-02-21 22:37:10 +00:00
Jason Ekstrand 6937c61324 vulkan/wsi/x11: Set OUT_OF_DATE if wait_for_special_event fails
This most likely means we lost our connection to the X server so
OUT_OF_DATE is reasonable.  This was also the one case where we pushed a
UINT32_MAX into the queue without setting an error condition.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Daniel Stone <daniels@collabora.com>
2018-02-21 22:37:10 +00:00
Daniel Stone bfa22266cd vulkan/wsi/wayland: Add support for zwp_dmabuf
zwp_linux_dmabuf_v1 lets us use multi-planar images and buffer
modifiers.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-02-21 22:37:10 +00:00
Jason Ekstrand c757fd2852 anv/image: Add support for modifiers for WSI
This adds support for the modifiers portion of the WSI "extension".

Reviewed-by: Daniel Stone <daniels@collabora.com>
2018-02-21 22:37:10 +00:00
Jason Ekstrand adca1e4a92 anv/image: Separate modifiers from legacy scanout
For a bit there, we had a bug in i965 where it ignored the tiling of the
modifier and used the one from the BO instead.  At one point, we though
this was best fixed by setting a tiling from Vulkan.  However, we've
decided that i965 was just doing the wrong thing and have fixed it as of
5048572352.

The old assumptions also affected the solution we used for legacy
scanout in Vulkan.  Instead of treating it specially, we just treated it
like a modifier like we do in GL.  This commit goes back to making it
it's own thing so that it's clear in the driver when we're using
modifiers and when we're using legacy paths.

v2 (Jason Ekstrand):
 - Rename legacy_scanout to needs_set_tiling

Reviewed-by: Daniel Stone <daniels@collabora.com>
2018-02-21 22:37:10 +00:00
Jason Ekstrand f5433e4d6c vulkan/wsi: Add modifiers support to wsi_create_native_image
This involves extending our fake extension a bit to allow for additional
querying and passing of modifier information.  The added bits are
intended to look a lot like the draft of VK_EXT_image_drm_format_modifier.
Once the extension gets finalized, we'll simply transition all of the
structs used in wsi_common to the real extension structs.

Reviewed-by: Daniel Stone <daniels@collabora.com>
2018-02-21 22:37:10 +00:00
Daniel Stone 55b27e1e5f vulkan/wsi: Add drm_modifier member to wsi_image
Not yet used anywhere.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-02-21 22:37:10 +00:00
Daniel Stone 61c3feb38d vulkan/wsi: Add multiple planes to wsi_image
Not currently used.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-02-21 22:37:10 +00:00
Timothy Arceri cdeac00267 nir: remove old assert
This was originally intended to make sure the remap location
was not -1. However the code has changed alot since then,
the location is now never set to -1 and we also handle
components meaning this old assert has been doing comparisions
with the pointer to the array of component data.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105183
2018-02-22 09:31:00 +11:00
Timothy Arceri 86098696fc radeonsi/nir: collect more accurate output_usagemask
Fixes assert in the glsl-1.50-gs-max-output-components piglit test.

Note that the double handling will only work for doubles that
don't take up multiple slots i.e. double and dvec2. However
dual slot double handling is an existing bug which is made no
worse by this patch.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-02-22 09:31:00 +11:00
Timothy Arceri 79dc94828a radeonsi/nir: disable GLSL IR loop unrolling
Delaying unrolling and allowing NIR to do it instead has been shown
to result in better code in drivers such as i965. shader-db results
appear to show the same is true for radeonsi.

The other advantage is that using NIR unrolling improves compile
times significantly.

Totals from affected shaders:
SGPRS: 9624 -> 10016 (4.07 %)
VGPRS: 6800 -> 6464 (-4.94 %)
Spilled SGPRs: 0 -> 2 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 359176 -> 332264 (-7.49 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 1355 -> 1432 (5.68 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-02-22 09:31:00 +11:00
Timothy Arceri e6269ffc2e radeonsi/nir: fix tess varying loads for doubles
Fixes the following piglit tests:

tests/spec/arb_tessellation_shader/execution/double-array-vs-tcs-tes.shader_test
tests/spec/arb_tessellation_shader/execution/double-vs-tcs-tes.shader_test

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-02-22 09:31:00 +11:00
Timothy Arceri 6d338d757f ac/radeonsi: pass type to load_tess_varyings()
We need this to be able to load 64bit varyings.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-02-22 09:31:00 +11:00
Daniel Stone eef890b7b1 x11/dri3: Store raw present completion mode
The DRI3 drawable info struct currently stores a boolean for whether the
last completed operation was a flip or not. As we need to track the full
completion mode for handling suboptimal returns, change the 'flipping'
field to the raw present completion mode from the server.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2018-02-21 21:57:38 +00:00
Daniel Stone a6f1952814 x11/dri3: Don't open-code ARRAY_SIZE
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2018-02-21 21:57:38 +00:00
Jason Ekstrand 52056206e1 anv: Don't assert that stencil HiZ clears are single-slice
It's true for depth HiZ clears because we only have HiZ on single-slice
images right now.  However, for stencil-only clears there is no such
restriction.

Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2018-02-21 13:54:11 -08:00
Jason Ekstrand 7dd0f73fe1 anv: Only copy clear dwords if we're rendering to the first slice
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2018-02-21 12:47:17 -08:00
Marek Olšák b494ed168c radeonsi: don't flush when si_eliminate_fast_color_clear is no-op 2018-02-21 20:03:11 +01:00
Marek Olšák 5f55f4c59f radeonsi: make texture_discard_cmask/eliminate functions non-static 2018-02-21 20:03:11 +01:00
James Zhu 81dd4a7637 radeonsi: enable uvd encode for HEVC main
Enable UVD encode for HEVC main profile

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
2018-02-21 13:53:38 -05:00
James Zhu b38b208ff8 radeonsi:create uvd hevc enc entry
Add UVD hevc encode pipe video codec creation entry

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
2018-02-21 13:53:38 -05:00
James Zhu e7d51e27ed radeon/uvd:add uvd hevc enc functions
Implement UVD hevc encode functions

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
2018-02-21 13:53:38 -05:00
James Zhu 2b86f5fa0b radeon/uvd:add uvd hevc enc hw ib implementation
Implement required IBs for UVD HEVC encode.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
2018-02-21 13:53:38 -05:00
James Zhu 461508c15c radeon/uvd:add uvd hevc enc hw interface header
Add hevc encode hardware interface for UVD

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
2018-02-21 13:53:38 -05:00
James Zhu c6acae22c8 winsys/amdgpu:add uvd hevc enc support in amdgpu cs
Support UVD HEVC encode in amdgpu cs

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
2018-02-21 13:53:38 -05:00
James Zhu f0ad908e79 amd/common:add uvd hevc enc support check in hw query
Based on amdgpu hardware query information to check if UVD hevc enc support

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-02-21 13:53:38 -05:00
Karol Herbst 7319311a50 nvir/nvc0: fix legalizing of ld unlock c0[0x10000]
We have to increase the file index also for 0x10000 not just for values
greater than 0x10000.

Fixes: 37b67db6ae
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-02-21 11:12:45 +01:00
Samuel Pitoiset a6accad68f ac/nir: add glsl_is_array_image() helper
For consistency.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-21 09:41:51 +01:00
Samuel Pitoiset ff83dfb364 ac/nir: set the DA field when performing atomics on 3D images
This doesn't fix anything known but it should definitely be set.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-21 09:41:49 +01:00
Eric Anholt afa7b2f199 i965: Fix compiler warning about write being undefined.
This looks like it should be protected by the assume() about
nr_color_regions, but my compiler warns anyway.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-02-20 20:23:57 -08:00
Eric Anholt 4636ce362d glsl/tests: Fix a compiler warning about signed/unsigned loop comparison.
Fixes: d32956935e ("glsl: Walk a list of ir_dereference_array to mark array elements as accessed")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-02-20 20:23:57 -08:00
Eric Anholt 7075c084fc loader: Fix compiler warnings about truncating the PCI ID path.
My build was producing:

../src/loader/loader.c:121:67: warning: ‘%1u’ directive output may be truncated writing between 1 and 3 bytes into a region of size 2 [-Wformat-truncation=]

and we can avoid this careful calculation by just using asprintf (as we do
elsewhere in the file).

Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-02-20 20:23:57 -08:00
Eric Anholt 1b313eedb5 glsl: Silence warnings in the uniform initializer test about 16-bit types
They should probably get unit tests implemented, but this cleans up a
bunch of warnings in my build for now.

Fixes: 59f458cd87 ("glsl: Add 16-bit types")
Cc: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-02-20 20:23:57 -08:00
Jordan Justen 96fe36f7ac i965: Enable disk shader cache by default
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2018-02-20 18:49:43 -08:00
Dave Airlie baa0feb73d radv: don't send num_tcs_input_cp to sgprs.
We never use it in the shaders.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-02-21 00:01:36 +00:00
Dave Airlie 952222ddd4 radv/tess: don't need to look in constant for vertices_per_patch
This just avoids passing this value via user sgprs.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-02-21 00:01:28 +00:00
Dave Airlie 77fd1b9187 ac/radv: cleanup some tcs output values access
Just consolidates some code to make it easier to change.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-02-21 00:01:23 +00:00
Dave Airlie 0e6f0d400b ac/radv: remove total_vertices variable
This just removes an unneeded variable.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-02-21 00:01:19 +00:00
Dave Airlie e9b9fb3616 ac/radv: don't mark tess inner as used if we don't use it.
This just avoids marking it as a used output if we don't
actually use it.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-02-21 00:01:15 +00:00
Dave Airlie d5b2d7ed67 ac/nir: to integer the args to bcsel.
dEQP-VK.tessellation.invariance.outer_edge_symmetry.triangles_equal_spacing_ccw
was hitting an llvm assert due to one value being an int and the
other a float.

This just casts both values to integer and fixes the test.

Fixes: dEQP-VK.tessellation.invariance.outer_edge_symmetry.triangles_equal_spacing_ccw
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-02-20 23:15:18 +00:00
Jason Ekstrand c66fb12117 anv/blorp: Use layout_to_aux_usage when a layout is provided
Instead of having aux usage and ANV_AUX_USAGE_DEFAULT to mean "give me
something reasonable" we now use anv_layout_to_aux_usage whenever a
layout is available.  If a layout is available, we ignore the aux_usage
parameter.  For the cases where we have an explicit aux usage such as
clears and aux ops, we have a new ANV_IMAGE_LAYOUT_EXPLICIT_AUX layout.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-02-20 13:57:17 -08:00
Jason Ekstrand 0fa040e6f5 anv/cmd_buffer: Delete some assert-only variables
Checking the sample count is almost as good as aux usage in this case.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-02-20 13:57:16 -08:00
Jason Ekstrand e10a62662b anv/cmd_buffer: Use layout_to_* helpers in compute_aux_usage
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-02-20 13:57:14 -08:00
Jason Ekstrand 7ea8131aa0 anv/cmd_buffer: Simplify transition_depth_buffer
If we don't have HiZ, then anv_layout_to_aux_usage will return NONE for
both layouts.  If the two layouts are the same, they will get the aux
usage.  In either case, the code below will give us ISL_AUX_OP_NONE and
we'll return without doing anything.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-02-20 13:57:09 -08:00
Jason Ekstrand 87e86ee2e6 anv/cmd_buffer: Do subpass image transitions in begin/end_subpass
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2018-02-20 13:49:25 -08:00
Jason Ekstrand 7d5f6b6088 anv/cmd_buffer: Mark depth/stencil surfaces written in begin_subpass
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2018-02-20 13:49:25 -08:00
Jason Ekstrand 8a3f086a42 anv/cmd_buffer: Sync clear values in begin_subpass
This is quite a bit cleaner because we now sync the clear values at the
same time as we do the fast clear.  For loading the clear values into
the surface state, we now do it once when we handle the LOAD_OP_LOAD
instead of every subpass.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2018-02-20 13:49:25 -08:00
Jason Ekstrand a4136b8c1a anv/pass: Store usage in each subpass attachment
This requires us to ditch the VkAttachmentReference struct in favor of
an anv-specific struct.  However, we can now easily identify from just
the subpass attachment what kind of an attachment it is.  This will make
iteration over anv_subpass::attachments a little easier in some case.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2018-02-20 13:49:25 -08:00
Jason Ekstrand bd356e1bcf anv/cmd_buffer: Add a concept of pending load aspects
These are the same as pending clear aspects only for the "load"
operation.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2018-02-20 13:49:25 -08:00
Jason Ekstrand e526d49edd anv/cmd_buffer: Iterate all subpass attachments when clearing
This unifies things a bit because we now handle depth and stencil at the
same time.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2018-02-20 13:49:25 -08:00
Jason Ekstrand 2cc3445eb2 anv/cmd_buffer: Decide whether or not to HiZ clear up-front
This moves the decision out of begin_subpass and into BeginRenderPass
like the decision for color clears.  We use a similar name for the
function for depth/stencil as for color even though no aux usage is
really getting computed.

v2 (Jason Ekstrand):
 - Don't always disable HiZ clears by accident
 - Use the initial layout to decide whether to do fast clears

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2018-02-20 13:49:24 -08:00
Jason Ekstrand 6fc8555610 anv/cmd_buffer: Move the rest of clear_subpass into begin_subpass
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2018-02-20 13:49:24 -08:00
Jason Ekstrand 7991838973 intel/blorp: Add a blorp_hiz_clear_depth_stencil helper
This is similar to blorp_gen8_hiz_clear_attachments except that it takes
actual images instead of trusting in the already set depth state.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2018-02-20 13:49:24 -08:00
Jason Ekstrand 1900dd76d0 anv/cmd_buffer: Move the color portion of clear_subpass into begin_subpass
This doesn't really change much now but it will give us more/better
control over clears in the future.  The one interesting functional
change here is that we are now re-emitting 3DSTATE_DEPTH_BUFFERS and
friends for each clear.  However, this only happens at begin_subpass
time so it shouldn't be substantially more expensive.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2018-02-20 13:49:24 -08:00
Jason Ekstrand 6fb9d6c6f5 anv/cmd_buffer: Pass a subpass id into begin_subpass
This is a bit less awkward than passing in the subpass because it means
we don't have to extract the subpass id from the subpass.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2018-02-20 13:49:24 -08:00
Jason Ekstrand 01223b8199 anv/cmd_buffer: Add begin/end_subpass helpers
Having begin/end_subpass is a bit nicer than the begin/next/end hooks
that Vulkan gives us.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2018-02-20 13:49:24 -08:00
Jason Ekstrand b5bd3fb4e4 anv/cmd_buffer: Apply subpass flushes before set_subpass
This seems slightly more correct because it means that the flushes
happen before any clears or resolves implied by the subpass transition.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2018-02-20 13:49:24 -08:00
Jason Ekstrand 869448a8ab anv: Use framebuffer layers for implicit subpass transitions
Fixes: de3be61801 "anv/cmd_buffer: Rework aux tracking"
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2018-02-20 13:49:24 -08:00
Jason Ekstrand 85d0bec961 anv: Be more careful about fast-clear colors
Previously, we just used all the channels regardless of the format.
This is less than ideal because some channels may have undefined values
and this should be ok from the client's perspective.  Even though the
driver should do the correct thing regardless of what is in the
undefined value, it makes things less deterministic.  In particular, the
driver may choose to fast-clear or not based on undefined values.  This
level of nondeterminism is bad.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2018-02-20 13:49:24 -08:00
Jason Ekstrand 4796025ba5 intel/isl: Add an isl_color_value_is_zero helper
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2018-02-20 13:49:24 -08:00
Jason Ekstrand 116e818ef1 anv/gpu_memcpy: CS Stall before a MI memcpy on gen7
This fixes a pile of hangs caused by the recent shuffling of resolves
and transitions.  The particularly problematic case is when you have at
least three attachments with load ops of CLEAR, LOAD, CLEAR.  In this
case, we execute the first CLEAR followed by a MI memcpy to copy the
clear values over for the LOAD followed by a second CLEAR.  The MI
commands cause the first CLEAR to hang which causes us to get stuck on
the 3DSTATE_MULTISAMPLE in the second CLEAR.

We also add guards for BLORP to fix the same issue.  These shouldn't
actually do anything right now because the only use of indirect clears
in BLORP today is for resolves which are already guarded by a render
cache flush and CS stall.  However, this will guard us against potential
issues in the future.

Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Nanley Chery <nanley.g.chery@intel.com>
2018-02-20 13:49:19 -08:00
Guillaume Charifi a572ec2efe st/mesa: Factorize duplicate code for atomic buffer binding
Signed-off-by: Guillaume Charifi <guillaume.charifi@sfr.fr>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2018-02-20 20:54:49 +01:00
Guillaume Charifi 56bfcd50f7 st/mesa: Factorize duplicate code in st_update_framebuffer_state()
Signed-off-by: Guillaume Charifi <guillaume.charifi@sfr.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2018-02-20 20:54:49 +01:00
Rob Clark 4c4e6232ee freedreno/ir3: fix use_count refcnt'ing issue
Was hitting an assert with vs-varying-array-mat4-index-col-row-wr.shader_test

When eliminating a copy, we were dropping the use_count of the mov that
is skipped, but not increasing the use_count of it's src instruction.

Fixes: 76440fcca9 freedreno/ir3: clean up dangling false-dep's
Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-02-20 13:43:42 -05:00
Brian Paul e7d1a93723 svga: replaced 'unsigned' with proper enum types in shader code
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2018-02-20 08:11:06 -07:00
Andres Gomez 36ac485bd1 swr: bump minimum supported LLVM version to 4.0
Since radv and radeonsi removed support for LLVM 3.9 the distcheck
target got broken because SWR distribution needed 3.9.x.

After checking with George Kyriazis, SWR is OK with moving to LLVM 4.0
and above, which will solve this problem.

Fixes: 3bf1e036e8 ("amd: remove support for LLVM 3.9")
Cc: George Kyriazis <george.kyriazis@intel.com>
Cc: Tim Rowley <timothy.o.rowley@intel.com>
Cc: Emil Velikov <emil.velikov@collabora.com>
Cc: Dylan Baker <dylan@pnwbakers.com>
Cc: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: George Kyriazis <george.kyriazis@intel.com>
2018-02-20 17:03:06 +02:00
Samuel Pitoiset 1ac741d690 ac/nir: move ac_declare_lds_as_pointer() outside of the switch
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-20 10:44:59 +01:00
Samuel Pitoiset b5d111ae76 radv: allow to force family using RADV_FORCE_FAMILY
Useful for pipeline-db.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-02-20 10:44:47 +01:00
Thomas Hellstrom f386776ea5 loader_dri3/glx/egl: Reinstate the loader_dri3_vtable get_dri_screen callback
Removing this callback caused rendering corruption in some multi-screen cases,
so it is reinstated but without the drawable argument which was never used
by implementations and was confusing since the drawable could have been
created with another screen.

Cc: "17.3 18.0" mesa-stable@lists.freedesktop.org
Fixes: 5198e48a0d (loader_dri3/glx/egl: Remove the loader_dri3_vtable get_dri_screen callback)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105013
Reported-by: Daniel van Vugt <daniel.van.vugt@canonical.com>
Tested-by: Timo Aaltonen <tjaalton@ubuntu.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2018-02-20 10:36:53 +01:00
Thomas Hellstrom 80c31f7837 svga: Fix a leftover debug hack
Fix what appears to be a leftover debug hack.
The hack would force the driver to take a different blit path; possibly,
although unverified, reverting to software blits.

Tested using piglit tests/quick. No related regressions.

Cc: "17.2 17.3 18.0" <mesa-stable@lists.freedesktop.org>
Fixes: 9d81ab7376 (svga: Relax the format checks for copy_region_vgpu10 somewhat)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104625
Reported-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2018-02-20 10:12:19 +01:00
Iago Toral Quiroga af5f2322d0 anv/entrypoints: make vkGetDeviceProcAddr return NULL for instance commands
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-02-20 08:12:32 +01:00
Ilia Mirkin e1a70aed10 nv50,nvc0: mark ABGR format as displayable instead of ARGB format
This matches the hardware's capabilities.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-02-19 22:33:58 -05:00
Ilia Mirkin f7604d8af5 st/dri: only expose config formats that are display targets
In the case of NVIDIA hardware, ABGR is displayable but ARGB is not.
Only advertise the one set in the visuals list.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Daniel Stone <daniels@collabora.com>
2018-02-19 22:33:58 -05:00
Ilia Mirkin ebdc4c31e2 mesa: add xbgr support adjacent to xrgb
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Daniel Stone <daniels@collabora.com>
2018-02-19 22:33:58 -05:00
Timothy Arceri d88a2906f8 st/shader_cache: copy nir pointer to gl_program after deserializing
This fixes a crash when running the arb_get_program_binary-api-errors
piglit test twice.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-02-20 13:15:02 +11:00
Timothy Arceri 691c320de0 radeonsi: add nir shader cache support
In future we might want to try avoid calling nir_serialize() but
this works for now.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-02-20 13:15:02 +11:00
Timothy Arceri 2b431808ab radeonsi: rename variables tgsi_binary -> ir_binary
This better represents that the ir could be either tgsi or nir.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-02-20 13:15:02 +11:00
Marek Olšák f78fe98fff radeonsi: fix regression from 32-bit pointers on CI
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
2018-02-19 17:56:23 +01:00
Samuel Pitoiset 549c7f3724 radv: compact varyings after removing unused ones
It makes no sense to compact before, and the description of
nir_compact_varyings() confirms that.

Polaris10:
Totals from affected shaders:
SGPRS: 108528 -> 108128 (-0.37 %)
VGPRS: 74548 -> 74500 (-0.06 %)
Spilled SGPRs: 844 -> 814 (-3.55 %)
Code Size: 3007328 -> 2992932 (-0.48 %) bytes
Max Waves: 16019 -> 16009 (-0.06 %)

Vega10:
Totals from affected shaders:
SGPRS: 106088 -> 106232 (0.14 %)
VGPRS: 74652 -> 74700 (0.06 %)
Spilled SGPRs: 692 -> 658 (-4.91 %)
Code Size: 2967708 -> 2953028 (-0.49 %) bytes
Max Waves: 18178 -> 18162 (-0.09 %)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-02-19 12:19:17 +01:00
Timothy Arceri 51e745cf77 radeonsi/nir: fix gl_FragCoord for pixel_center_integer
Fixes piglit test glsl-arb-fragment-coord-conventions

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-02-19 08:47:48 +11:00
Timothy Arceri 347038baa9 glsl/nir: add pixel_center_integer to shader info
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-02-19 08:47:48 +11:00
Ilia Mirkin fe76fc11b1 gm107/ir: avoid using kepler instruction capabilities
Split up the op properties table into generation-specific bits, and only
use the kepler ones on kepler. Fixes some CTS images tests.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
2018-02-17 23:41:21 -05:00
Ilia Mirkin f08fd676bf nvc0: add support for bindless on maxwell+
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-02-17 23:41:21 -05:00
Ilia Mirkin 0255550eb1 gm107/ir: change how SUQ works in preparation for bindless
All this information can be retrieved from the TIC directly. Avoid
having to dip into the constbuf information about the image.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-02-17 23:41:21 -05:00
Kenneth Graunke fa8a764b62 i965: Use absolute addressing for constant buffer 0 on Kernel 4.16+.
By default, 3DSTATE_CONSTANT_* Constant Buffer 0 is relative to dynamic
state base address.  This makes it unusable for pushing UBOs.

There is a bit in the INSTPM register (or CS_DEBUG_MODE2 on Skylake)
which controls whether buffer 0 is relative to dynamic state base
address, or simply a normal pointer.  Setting that gives us full
flexibility.  This lets us push up to 4 UBO ranges.

We can't currently write this on Haswell and earlier, and will need
to update the kernel command parser, and then do the whole version
checking song and dance.  We also need a brand new kernel that supports
context isolation - on older kernels, newly created contexts inherit
register state from whatever happened to be running.  So, setting this
would have catastrophic impact on other drivers such as libva, Beignet,
or older Mesa.

See commit 8ec5a4e4a4 where we did this
once before, but had to revert it in commit 013d331220.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2018-02-17 11:26:31 -08:00
Kenneth Graunke a63c74be85 i965: Stop restoring the default L3 configuration on Kernel 4.16+.
Kernel 4.16 has proper context isolation, which means we can change
the L3 configuration without worrying about that leaking to other
newly created contexts, breaking the assumptions of other userspace.

So, disable our workaround to reprogram it back to the default.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2018-02-17 11:26:18 -08:00
Mikko Perttunen 5a1606c51f nvc0: Use GP100_COMPUTE_CLASS on GP10B
GP10B requires the use of GP100_COMPUTE_CLASS instead of
GP104_COMPUTE_CLASS as is used for other non-GP100 chips.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-02-17 14:16:10 -05:00
Daniel Stone 9d21dbeb88 i965: Fix aux-surface size check
The previous commit reworked the checks intel_from_planar() to check the
right individual cases for regular/planar/aux buffers, and do size
checks in all cases.

Unfortunately, the aux size check was broken, and required the aux
surface to be allocated with the correct aux stride, but full image
height (!).

As the ISL aux surface is not recorded in the DRIimage, we cannot easily
access it to check. Instead, store the aux size from when we do have the
ISL surface to hand, and check against that later when we go to access
the aux surface.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Fixes: c2c4e5bae3 ("i965: Fix bugs in intel_from_planar")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-02-17 10:22:35 +00:00
Marek Olšák 931ec80eeb radeonsi: implement 32-bit pointers in user data SGPRs (v2)
User SGPRs changes:
    VS:     14 ->  9
    TCS:    14 -> 10
    TES:    10 ->  6
    GS:      8 ->  4
    GSCOPY:  2 ->  1
    PS:      9 ->  5
    Merged VS-TCS: 24 -> 16
    Merged VS-GS:  18 -> 11
    Merged TES-GS: 18 -> 11

SGPRS: 2170102 -> 2158430 (-0.54 %)
VGPRS: 1645656 -> 1641516 (-0.25 %)
Spilled SGPRs: 9078 -> 8810 (-2.95 %)
Spilled VGPRs: 130 -> 114 (-12.31 %)
Scratch size: 1508 -> 1492 (-1.06 %) dwords per thread
Code Size: 52094872 -> 52692540 (1.15 %) bytes
Max Waves: 371848 -> 372723 (0.24 %)

v2: - the shader cache needs to take address32_hi into account
    - set amdgpu-32bit-address-high-bits

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (v1)
2018-02-17 04:52:17 +01:00
Marek Olšák 5722cd4084 radeonsi: disallow constant buffers with a 64-bit address in slot 0
State trackers must use a user buffer or const_uploader,
or set pipe_resource::flags same as const_uploader->flags.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-02-17 04:52:17 +01:00
Marek Olšák d790b6cece radeonsi: move const_uploader allocations to 32-bit address space
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-02-17 04:52:17 +01:00
Marek Olšák 50581549b7 winsys/radeon: implement and enable 32-bit VM allocations
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-02-17 04:52:17 +01:00
Marek Olšák 1104d1e9d3 winsys/radeon: add struct radeon_vm_heap
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-02-17 04:52:17 +01:00
Marek Olšák 48ecacfefa winsys/amdgpu: enable 32-bit VM allocations
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-02-17 04:52:17 +01:00
Marek Olšák c2da45be86 gallium/radeon: add 32-bit address space heaps
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-02-17 04:52:17 +01:00
Marek Olšák 0977b7f7b3 ac: query high bits of 32-bit address space 2018-02-17 04:51:58 +01:00
Marek Olšák 16be55da94 gallium: use PIPE_CAP_CONSTBUF0_FLAGS 2018-02-17 04:20:55 +01:00
Marek Olšák 8e7222f4e5 gallium: allow drivers to impose BO flags restrictions on constant buffer 0
Required by radeonsi for optimal behavior.
2018-02-17 04:20:55 +01:00
Alexander von Gluck IV 834d221512 meson: Add Haiku platform support v4
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2018-02-16 16:56:34 -06:00
Anuj Phogat 7b283544dc anv/icl: Add render target flush after uploading binding table
The PIPE_CONTROL command description says:

"Whenever a Binding Table Index (BTI) used by a Render Taget Message
points to a different RENDER_SURFACE_STATE, SW must issue a Render
Target Cache Flush by enabling this bit. When render target flush
is set due to new association of BTI, PS Scoreboard Stall bit must
be set in this packet."

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-02-16 11:10:32 -08:00
Anuj Phogat 136f583a24 anv/icl: Enable float blend optimization
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-02-16 11:10:32 -08:00
Anuj Phogat cd7102972f anv/icl: Use gen11 functions
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-02-16 11:10:32 -08:00
Anuj Phogat 9673c21d4f anv/icl: Build anv libs for gen11
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-02-16 11:10:32 -08:00
Anuj Phogat 1f108b436b anv/icl: Generate gen11 entry point functions
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-02-16 11:10:32 -08:00
Anuj Phogat a86c0a08df anv/icl: Don't use DISPATCH_MODE_SIMD4X2
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-02-16 11:10:32 -08:00
Anuj Phogat cd5fc634a8 anv/icl: Don't use SingleVertexDispatch
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-02-16 11:10:32 -08:00
Anuj Phogat 6e3940b3cf anv/icl: Don't set ResetGatewayTimer
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-02-16 11:10:32 -08:00
Anuj Phogat 41a4c2c8e8 anv/icl: Add #define genX
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-02-16 11:10:31 -08:00
Anuj Phogat 413d475b44 anv/icl: Add gen11 mocs defines
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-02-16 11:10:31 -08:00
Kenneth Graunke 1d6cf433d2 i965: Implement GenerateMipmap directly, rather than using Meta.
Meta is awful and we'd like to stop using it.  Implementing this using
BLORP allows us to stop trashing a bunch of GL state every time.

This follows the structure of st_generate_mipmap().
compute_num_levels is lifted directly from there.

Improves performance in Gl41HdrBloom by about 11.794% +/- 1.01919% (n=3)
on Kabylake GT2 at 1280x720 (the difference seems much smaller at higher
resolutions).

v2 (idr): Don't try depth or depth-stencil blorp blits on Gen4 or Gen5
because it's not implemented yet.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2018-02-16 10:48:10 -08:00
Kenneth Graunke 9bcd31ea90 mesa: Move compute_num_levels from st_gen_mipmap.c to mipmap.c.
I want to use compute_num_levels inside i965.  Rather than duplicating
it, move it from mesa/st to core Mesa, and make it non-static.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-02-16 10:48:10 -08:00
Dylan Baker 03ab40b1f7 meson: freedreno depends on nir
This fixes a race condition in building targets that link in freedreno.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105120
Fixes: 0bbecc5a85 ("meson: define driver dependencies")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Acked-by: Mark Janes <mark.a.janes@intel.com>
2018-02-16 10:10:18 -08:00
George Kyriazis f1fbeb1a53 swr/rast: blend_epi32() should return Integer, not Float
fix gcc8 compiler error for KNL.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105029
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-02-16 10:54:02 -06:00
George Kyriazis 7dd793d10c swr/rast: Normalize path for debug metadata
in template gen_llvm.hpp

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-02-16 10:54:02 -06:00
George Kyriazis f979d0bc2f swr/rast: Consolidate archrast Draw events
Consolidate archrst draw events into single draw event with an attribute
that represents the type of draw

- Add handlers for new private proto versions of DrawInstancedEvent,
  DrawIndexedInstancedEvent, DrawInstancedSplitEvent, and
  DrawIndexedInstancedSplitEvent
- Convert the draw events to generic DrawInfoEvents
- parse_proto_event_fields() replaces 'AR_DRAW_TYPE' as a field type with
  'uint32_t'. This draw type is actually an enum, but can be represented
  as an unsigned integer.
- is_draw_or_dispatch() recognizes DrawInfoEvent as a draw event

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-02-16 10:54:02 -06:00
George Kyriazis 45df1a6520 swr/rast: Add semantics for translating address
Added support for another full translation path in fetch jitter.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-02-16 10:54:02 -06:00
George Kyriazis c09483cf0a swr/rast: Convert C Sampler intrinsics
Convert portions of the C sampler to the rasty SIMD lib.

Also fix SRL call with a non-immediate.  Don't count on the compiler
automagically converting an srli call to srl if the shift count isn't
an immediate.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-02-16 10:54:01 -06:00
George Kyriazis 37ebf86add swr/rast: Make SIMDLib templated types easier to use
"typename SIMD_T::TypeName" --> "TypeName<SIMD_T>"

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-02-16 10:54:01 -06:00
George Kyriazis 74e8bb4a22 swr/rast: Be more explicit when fetching next component
Use a new function to denote that we want to get offset to next component
and hide the fact that GEP is used underneath.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-02-16 10:54:01 -06:00
George Kyriazis da77eb55d5 swr/rast: Fix bug related to passing AR handle
We were passing a garbage handle. Let's not do that.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-02-16 10:54:01 -06:00
George Kyriazis 48d62409f8 swr/rast: Fix primitive replication issue in tesselation PA.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-02-16 10:54:01 -06:00
George Kyriazis e12db47a7d swr/rast: Use llvm intrinsic masked gather
Use llvm intrinsic masked.gather instead of manual unroll for the cases
where we have vector of pointers.  Improves llvm IR debug experience by
reducing a ton of IR to a single intrinsic call. Also seems to reduce
overall stack use considerably.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-02-16 10:54:01 -06:00