Commit Graph

2206 Commits

Author SHA1 Message Date
Lionel Landwerlin de9649071a anv: use device->info instead of brw->is_*
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-08-30 17:59:17 +01:00
Juan A. Suarez Romero a2234614b6 anv: set right datatypes in anv_pipeline_binding
This structure contains two fields, binding and index, that store the
binding in the descriptor set and the index inside the binding.

These structures are defined as uint8_t, but the types in Vulkan
specification are uint32_t, so big values are clamp.

This fixes dEQP-VK.binding_model.shader_access.*.multiple_arbitrary_descriptors.*

v2: use UINT32_MAX for index when having no render targets (Tapani)

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2017-08-30 08:01:53 +02:00
Matt Turner 37f664a066 blorp: Explicitly cast between different enums
Fixes warnings like

warning: implicit conversion from enumeration type 'enum isl_format' to
different enumeration type 'enum GEN10_SURFACE_FORMAT'
[-Wenum-conversion]
         .SourceElementFormat = ISL_FORMAT_R32_UINT,
                                ^~~~~~~~~~~~~~~~~~~

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-08-29 15:20:57 -07:00
Matt Turner b962922fb7 intel/isl: Mark functions used conditionally as UNUSED
The functions we're marking as UNUSED in isl_surface_state.c are used
only when compiling for particular generations.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-08-29 15:20:57 -07:00
Matt Turner c4ce12728e intel/isl: Explicitly cast between different enums
Fixes warnings like

warning: implicit conversion from enumeration type 'enum isl_format' to
different enumeration type 'enum GEN10_SURFACE_FORMAT'
[-Wenum-conversion]
         .SourceElementFormat = ISL_FORMAT_R32_UINT,
                                ^~~~~~~~~~~~~~~~~~~

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-08-29 15:20:57 -07:00
Matt Turner 9fdbc273ef intel/isl: Remove 'inline' keywords
Unless you have data, the compiler knows better than you whether a
function should be inlined.

Unlike all other cases in this series, the removal of the inline keyword
from isl_format_has_channel_type actually changes the resulting binary
with gcc-6.3.0:

   text	   data	    bss	    dec	    hex	filename
7831116	 346384	 420648	8598148	 833284	i965_dri.so before
7830716	 346384	 420648	8597748	 8330f4	i965_dri.so after

I think this is likely an improvement. No difference in the resulting
binary with clang-4.0.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-08-29 15:20:57 -07:00
Matt Turner cdbaa8a12f anv: Mark functions used conditionally as UNUSED
The functions we're marking as UNUSED in genX_pipeline.c are used only
when compiling for particular generations.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-08-29 15:20:57 -07:00
Matt Turner 5d4afef459 anv: Explicitly cast between different enums
Fixes warnings like

warning: implicit conversion from enumeration type 'enum isl_format' to
different enumeration type 'enum GEN10_SURFACE_FORMAT'
[-Wenum-conversion]
         .SourceElementFormat = ISL_FORMAT_R32_UINT,
                                ^~~~~~~~~~~~~~~~~~~

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-08-29 15:20:57 -07:00
Matt Turner 6cfc49287d anv: Remove 'inline' keywords
Unless you have data, the compiler knows better than you whether a
function should be inlined.

No difference in the resulting binary with gcc-6.3.0 or clang-4.0.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-08-29 15:20:57 -07:00
Matt Turner 012887ef48 anv: Use GNU C empty brace initializer
Avoids Clang's warning about the current code:

   warning: suggest braces around initialization of subobject

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-08-29 15:20:57 -07:00
Matt Turner dff75c7175 i965: Drop unnecessary conditional
Clang doesn't realize that 0 and 1 are the only possibilities, a thinks
lots of variables might be uninitialized.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-08-29 15:20:57 -07:00
Matt Turner c5d2e2d43f configure: Test for -Wno-initializer-overrides
Clang has "-Wno-initializer-overrides", while gcc has
"-Wno-override-init". Quiets a lot of warnings with clang.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-08-29 15:20:57 -07:00
Jason Ekstrand 43e8808b82 anv: Add support for the SYNC_FD handle type for fences
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-08-28 19:33:43 -07:00
Jason Ekstrand 49c59c88eb anv: Implement VK_KHR_external_fence
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-08-28 19:33:43 -07:00
Jason Ekstrand 5f372d93a9 anv: Use DRM sync objects to back fences whenever possible
In order to implement VK_KHR_external_fence, we need to back our fences
with something that's shareable.  Since the kernel wait interface for
sync objects already supports waiting for multiple fences in one go, it
makes anv_WaitForFences much simpler if we only have one type of fence.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-08-28 19:33:43 -07:00
Jason Ekstrand d21c151091 anv/gem: Add support for syncobj wait and reset
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-08-28 19:33:43 -07:00
Jason Ekstrand 144487ebb8 anv/gem: Add a flags parameter to syncobj_create
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-08-28 19:33:43 -07:00
Jason Ekstrand caa71343c6 anv: Rename anv_fence_state to anv_bo_fence_state
It only applies to legacy BO fences.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-08-28 18:35:30 -07:00
Jason Ekstrand 92286dc08a anv: Pull the guts of anv_fence into anv_fence_impl
This is just a refactor, similar to what we did for semaphores, in
preparation for handling VK_KHR_external_fence.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-08-28 18:35:27 -07:00
Jason Ekstrand 738e5e3c1d anv/wsi: Use QueueSubmit to trigger the fence in AcquireNextImage
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-08-28 18:35:25 -07:00
Jason Ekstrand f992bb205c anv: Rework fences to work more like BO semaphores
This commit changes fences to work a bit more like BO semaphores.
Instead of the fence being a batch, it's simply a BO that gets added
to the validation list for the last execbuf call in the QueueSubmit
operation.  It's a bit annoying finding the last submit in the execbuf
but this allows us to avoid the dummy execbuf.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-08-28 18:35:22 -07:00
Jason Ekstrand 2eacfdeec9 anv/queue: Allow temporary import of SYNC_FD semaphores
We didn't allow them before because it didn't look like the spec allowed
it.  It certainly doesn't make much sense.  However, there are CTS tests
that apparently hit this.  What the spec actually says is:

    "Importing a payload using handle types with copy transference
    creates a duplicate copy of the payload at the time of import, but
    makes no further reference to it. Fence signaling, waiting, and
    resetting operations performed on the target of copy imports must
    not affect any other fence or payload."

A SYNC_FD has copy transference but the import may be temporary or
permanent.  If you do a permanent import of something with copy
transference, I guess it's supposed to work and end up resetting the
permanent state.  In any case, there seems to be no real harm in
allowing it, so why not.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-08-28 18:34:06 -07:00
Topi Pohjolainen 5dd072380a intel/compiler: Cast reg types explicitly
Makes coverity happier.

CID: 1416799
Fixes: c1ac1a3d25 (i965: Add a brw_hw_type_to_reg_type() function)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-08-28 14:43:39 +03:00
Rafael Antognolli 1eb58960bf i965: Do not store SRC after 0 on component control.
The PRM SKL-Vol 2b-05.16 says:

   "Within a VERTEX_ELEMENT_STATE structure, if a Component Control
   field is set to something other than VFCOMP_STORE_SRC, no
   higher-numbered Component Control fields may be set to
   VFCOMP_STORE_SRC. In other words, only trailing components can be set
   to something other than VFCOMP_STORE_SRC."

Since we set the component 1 to VFCOMP_STORE_0 on gen8+, and
VFCOMP_STORE_IID on gen5+, and we are not using components 2 and 3,
let's also set them to VFCOMP_STORE_0.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-08-25 10:02:09 -07:00
Jason Ekstrand 95f533d922 anv,i965: Move CS shared lowering into anv
Right now, OpenGL uses the GLSL lowering for shared variables and anv
uses NIR to lower them.  For a long time, we've done this weird thing
where we do the NIR lowering unconditionally and then add the SLM sizes
from the two together.  This works because one of them will always be 0
but it's a bit sketchy.  Let's just move the NIR-based lowering into
anv_pipeline and get rid of the sketch.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-08-24 16:34:29 -07:00
Kenneth Graunke 4ffa9f3635 i965: Stop using wm_prog_data->binding_table.render_target_start.
Render target surfaces always start at binding table index 0.
This is required for us to use headerless FB writes, which we
really want to do.  So, we'll never change that.

Given that, it's not necessary to look up a wm_prog_data field
which we already know contains 0.  We can drop the dependency in
brw_renderbuffer_surfaces (Gen4-5)...which was already confusingly
missing from gen6_renderbuffer_surfaces.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-08-23 11:55:17 -07:00
Kenneth Graunke 274afad4cd i965: Add a brw_wm_prog_data::has_render_target_reads field.
State upload code should use prog_data rather than poking at shader_info
directly.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-08-23 11:55:17 -07:00
Francisco Jerez e29ccaac29 anv: Check that in_fence fd is valid before closing it.
Probably harmless, but will overwrite errno with a failure status
code.  Reported by coverity.

CID 1416600: Argument cannot be negative (NEGATIVE_RETURNS)
Fixes: 5c4e4932e0 (anv: Implement support for exporting semaphores as FENCE_FD)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-08-22 11:56:38 -07:00
Francisco Jerez 7ca124a6a3 anv: Add error handling to setup_empty_execbuf().
The anv_execbuf_add_bo() call can actually fail in practice, which
should cause the QueueSubmit operation to fail.  Reported by Coverity.

CID: 1416606: Unchecked return value (CHECKED_RETURN)
Fixes: 017cdb10cf (anv: Submit a dummy batch when only semaphores are provided.)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-08-22 11:54:16 -07:00
Matt Turner d37d9f84ac i965: Mark functions static
Cuts 300 bytes of .text

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-08-21 14:45:44 -07:00
Matt Turner f30902629c i965/vec4: Use 'class' src_reg, rather than 'struct' src_reg
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-08-21 14:45:44 -07:00
Matt Turner a77d5b28ac i965/vec4: Return float from spill_cost_for_type()
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-08-21 14:45:44 -07:00
Matt Turner 76f36607b0 anv: Move clamp_int64() inside the IVB check
It's only used in the gen7_cmd_buffer_emit_scissor() function.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-08-21 14:45:44 -07:00
Matt Turner a98b1a8922 i965: Optimize reading the destination type
brw_hw_type_to_reg_type() needs to know only whether the file is
BRW_IMMEDIATE_VALUE or not, which is not a valid file for the
destination. gcc and clang will evaluate __builtin_strcmp() at compile
time, so we can use it to pass a constant file for the destination.

   text	   data	    bss	    dec	    hex	filename
7816214	 346248	 420496	8582958	 82f72e	i965_dri.so before
7816070	 346248	 420496	8582814	 82f69e	i965_dri.so after

Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-08-21 14:05:23 -07:00
Matt Turner 91ef949054 i965: Mark brw_hw_type_to_reg_type() as a pure function
text	   data	    bss	    dec	    hex	filename
7816886	 346248	 420496	8583630	 82f9ce	i965_dri.so before
7816214	 346248	 420496	8582958	 82f72e	i965_dri.so after

Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-08-21 14:05:23 -07:00
Matt Turner e07fe89035 i965: Hide the register type hardware encodings
So we stop mixing them with the logical enum.

Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-08-21 14:05:23 -07:00
Matt Turner 4fab67a441 i965: Stop using hardware register types directly
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-08-21 14:05:23 -07:00
Matt Turner c746f1c888 i965: Add brw_hw_reg_type_to_letters() and use it in brw_disasm.c
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-08-21 14:05:23 -07:00
Matt Turner 6a2471b501 i965: Move brw_reg_type_letters() as well
And add "to_" to the name for consistency with the other functions in
this file.

Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-08-21 14:05:23 -07:00
Matt Turner 1cb0a7941b i965: Switch to using the logical register types
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-08-21 14:05:23 -07:00
Matt Turner cb2cd462b1 i965: Add functions to abstract access to register types
Previously the brw_inst{,_set}_{dst,src0,src1}_reg_type() functions
provided access to the hardware encodings for the register types. We
often mixed these with the logical BRW_REGISTER_TYPE_* enums (which
themselves used to be the hardware format!) with bad results.

With that functionality now available with the hw_ versions (see
previous commit), we now add functions that take the logical
BRW_REGISTER_TYPE_* enums and convert into the hardware format and vice
versa. To do the conversion we also have to provide the file.

Note the asymmetry between the two functions: the new getter reads the
file from the instruction word, and to ensure that is always set the
setter writes both the file and the type.

Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-08-21 14:05:23 -07:00
Matt Turner 9fb8323328 i965: Rename brw_inst's functions that access the register type
Put hw_ in the name so that it's clear these are the hardware encodings.

Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-08-21 14:05:23 -07:00
Matt Turner 3e379af492 i965: Index brw_hw_reg_type_to_size()'s table by logical type
I'll be transitioning everything to use the logical types.

Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-08-21 14:05:23 -07:00
Matt Turner c1ac1a3d25 i965: Add a brw_hw_type_to_reg_type() function
Will be used in later commits.

Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-08-21 14:05:23 -07:00
Matt Turner dbe7dd13dd i965: Use a common table to translate logical to hardware types
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-08-21 14:05:23 -07:00
Matt Turner bfcc9aa829 i965: Extract functions dealing with register types to separate file
I'm going to encapsulate all of the logic dealing with register types in
this file.

Rename the parameters for the hardware encodings from type -> hw_type at
the same time.

Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-08-21 14:05:23 -07:00
Matt Turner 890f863da0 i965: Reverse file/type arguments to register type functions
I think of the initial arguments as "state" and the last as the actual
subject.

Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-08-21 14:05:23 -07:00
Matt Turner 92f787ff86 i965: Add support for disassembling 64-bit integer immediates
After the last patch converted things into enums, I helpfully got a
compiler warning about these missing from the switch statement.

Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-08-21 14:05:23 -07:00
Matt Turner deae25ce37 i965: Use separate enums for register vs immediate types
The hardware encodings often mean different things depending on whether
the source is an immediate.

Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-08-21 14:05:23 -07:00
Matt Turner 8815b9677f i965: Reorder brw_reg_type enum values
These vaguely corresponded to the hardware encodings, but that is purely
historical at this point. Reorder them so we stop making things "almost
work" when mixing enums.

The ordering has been closen so that no enum value is the same as a
compatible hardware encoding.

Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-08-21 14:05:23 -07:00
Matt Turner ce6b8627d8 i965: Validate destination restrictions with vector immediates
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-08-21 14:05:23 -07:00
Matt Turner 1d79c828d8 i965: Don't let raw-move check be tricked by immediate vector types
UB and B type encodings are the same as UV and VF. Noticed when writing
the following patch.

Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-08-21 14:05:23 -07:00
Matt Turner 48aa6ecb87 i965: Only change type of 0.0f to VF if destination stride == 1
The destination stride must be equivalent to a dword if VF is used.

Also, since the only compaction table entires with "i:vf" have the
destination as "r:f" specifically check that the destination is of type
float.

Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-08-21 14:05:23 -07:00
Matt Turner 56a676eed2 i965: Remove CONT/BREAK from instruction compaction test
These cannot be compacted. A similar mistake was fixed in commit
90eaf01616

Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-08-21 14:05:23 -07:00
Matt Turner 3d661e6062 i965: Test instruction compaction on all supported Gens
Note that there's no point in testing on G45, since its compaction is
the same as Gen5. Same logic applies to Gen7 variants and low-power
parts.

Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-08-21 14:05:23 -07:00
Matt Turner 9ff7d9b853 i965: Silence signed/unsigned comparison warning
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-08-21 14:05:23 -07:00
Matt Turner eac89911e5 i965: Move compaction "prepass" into brw_eu_compact.c
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-08-21 14:05:23 -07:00
Matt Turner 17641f6388 i965: Mark src inst pointer const in compaction code
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-08-21 14:05:23 -07:00
Topi Pohjolainen 393ec1a507 intel/blorp: Adjust intra-tile x when faking rgb with red-only
v2 (Jason): Adjust directly in surf_fake_rgb_with_red()

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101910

CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-08-21 09:55:08 +03:00
Kenneth Graunke 6f8a577ed2 anv: Use ISL for emitting null surface states.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-08-19 00:46:48 -07:00
Kenneth Graunke 5db9757bd7 isl: Add a null surface fill function.
ISL already offers functions to fill out most kinds of SURFACE_STATE,
so why not handle null surfaces too?

Null surfaces are simple, so we can just take the dimensions, rather
than an entirte fill structure.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-08-19 00:46:36 -07:00
Eric Anholt 9caba0f16f anv: Move a comment that got left behind in the u_vector refactor. 2017-08-18 11:56:58 -07:00
Jason Ekstrand 1af8342b0c intel/isl: Replace switch statements of doom with a macro
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-08-17 18:09:05 -07:00
Jason Ekstrand 2d68d27071 intel/isl: Reduce header file duplication
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-08-17 18:09:05 -07:00
Jason Ekstrand bf1d2e84f3 anv/gem: Add a stub for sync_file_merge
This fixes make check

Fixes: 5c4e4932e0
2017-08-16 18:44:26 -07:00
Jason Ekstrand 98983503cb anv: Advertise VK_KHR_external_semaphore
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-08-15 19:08:26 -07:00
Jason Ekstrand 55bce22d8d anv: Use DRM sync objects for external semaphores when available
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-08-15 19:08:26 -07:00
Jason Ekstrand f41a0e4b0d anv/gem: Add a drm syncobj support
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-08-15 19:08:26 -07:00
Jason Ekstrand 5c4e4932e0 anv: Implement support for exporting semaphores as FENCE_FD
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-08-15 19:08:26 -07:00
Jason Ekstrand e4054ab77b anv/gem: Use EXECBUFFER2_WR when the FENCE_OUT flag is set
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-08-15 19:08:26 -07:00
Jason Ekstrand 017cdb10cf anv: Submit a dummy batch when only semaphores are provided.
Vulkan allows you to do a submit whose only job is to wait on and
trigger semaphores.  The easiest way for us to support that right
now is to insert a dummy execbuf.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-08-15 19:08:26 -07:00
Jason Ekstrand 031f57eba3 anv: Add a basic implementation of VK_KHX_external_semaphore
This patch adds an implementation based on DRM BOs.  We don't actually
advertise the extension yet because we want to add a couple more paths
first.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-08-15 19:08:26 -07:00
Scott D Phillips d6539608a4 intel/genxml: Fix gen10 BLEND_STATE variable length packing
BLEND_STATE packing was modified to be variable-length in:

 9670124e31 genxml: Make BLEND_STATE command support variable length array.

The initial gen10.xml still had the old, fixed-length style
definition for BLEND_STATE. So gen10_upload_blend_state would
overwrite the packed BLEND_STATE_ENTRYs with its own fixed array
of all-zero entries when packing BLEND_STATE. This caused
BLEND_STATE upload to not work at all.

Fixes: aa416f515a ("i965/genxml: Add gen10.xml")
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-08-15 09:06:29 -07:00
Ben Widawsky 8f6e54c929 i965: Pretend that CCS modified images are two planes
v2: move is_aux into if block. (Jason)
Use else block instead of goto (Jason)

v3: Fix up logic for is_aux (Ben)
Fix up size calculations and add FIXME (Ben)

v4 (Jason Ekstrand):
Use the aux_pitch in the image instead of calculating it

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-08-14 10:43:30 -07:00
Jason Ekstrand cf2e92262b intel/isl: Add support for I915_FORMAT_MOD_Y_TILED_CCS
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-08-14 10:43:30 -07:00
Iago Toral Quiroga 81615ad444 intel/compiler: properly size attribute wa_flags array for Vulkan
Mesa will map user defined vertex input attributes to slots
starting at VERT_ATTRIB_GENERIC0 which gives us room for only 16
slots (up to GL_VERT_ATTRIB_MAX). This sufficient for GL, where
we expose exactly 16 vertex attributes for user defined inputs, but
in Vulkan we can expose up to 28 (which are also mapped from
VERT_ATTRIB_GENERIC0 onwards) so we need to account for this when
we scope the size of the array of attribute workaround flags
that is used during the brw_vertex_workarounds NIR pass. This
prevents out-of-bounds accesses in that array for NIR shaders
that use more than 16 vertex input attributes.

Fixes:
dEQP-VK.pipeline.vertex_input.max_attributes.*

Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-08-11 10:41:44 +02:00
Kenneth Graunke 5563872dbf isl: Validate row pitch of stencil surfaces.
Also, silence an obnoxious finishme that started occurring for all
GL applications which use stencil after the i965 ISL conversion.

v2: Check against 3DSTATE_STENCIL_BUFFER's pitch bits when using
    separate stencil, and 3DSTATE_DEPTH_BUFFER's bits when using
    combined depth-stencil.

Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-08-10 15:18:58 -07:00
Jason Ekstrand 4d27c6095e intel/isl: Don't align the height of the last array slice
We were calculating the total height of 2D surfaces by multiplying the
row pitch by the number of slices.  This means that we actually request
slightly more space than actually needed since the padding on the last
slice is unnecessary.  For tiled surfaces this is not likely to make a
difference.  For linear surfaces, on the other hand, this means we may
require additional memory.  In particular, this makes the i965 driver
reject EGL imports of buffers which do not have this extra padding.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
2017-08-07 09:31:11 -07:00
Jason Ekstrand c15b92ce11 intel/isl: Stop padding surfaces
The docs contain a bunch of commentary about the need to pad various
surfaces out to multiples of something or other.  However, all of those
requirements are about avoiding GTT errors due to missing pages when the
data port or sampler accesses slightly out-of-bounds.  However, because
the kernel already fills all the empty space in our GTT with the scratch
page, we never have to worry about faulting due to OOB reads.  There are
two caveats to this:

 1) There is some potential for issues with caches here if extra data
    ends up in a cache we don't expect due to OOB reads.  However,
    because we always trash the entire cache whenever we need to move
    anything between cache domains, this shouldn't be an issue.

 2) There is a potential issue if a surface gets placed at the very top
    of the GTT by the kernel.  In this case, the hardware could
    potentially end up trying to read past the top of the GTT.  If it
    nicely wraps around at the 48-bit (or 32-bit) boundary, then this
    shouldn't be an issue thanks to the scratch page.  If it doesn't,
    then we need to come up with something to handle it.

Up until some of the GL move to ISL, having the padding code in there
just caused us to harmlessly use a bit more memory in Vulkan.  However,
now that we're using ISL sizes to validate external dma-buf images,
these padding requirements are causing us to reject otherwise valid
images due to the size of the BO being too small.

Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Tested-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
2017-08-07 09:31:11 -07:00
Jason Ekstrand 06d3115bb9 anv/formats: Allow sampling on depth-only formats on gen7
We can't sample from depth-stencil formats but on gen7 but we can sample
from depth-only formats.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102024
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
2017-08-07 08:27:09 -07:00
Chris Wilson 6c530ad116 i965: Reduce passing 2x32b of reloc_domains to 2 bits
The kernel only cares about whether the object is to be written to or
not, only reduces (reloc.read_domains, reloc.write_domain) down to just
!!reloc.write_domain. When we use NO_RELOC, the kernel doesn't even read
those relocs and instead userspace has to pass that information in the
execobject.flags. We can simplify our reloc api by also removing the
unused read/write domains and only pass the resultant flags.

The caveat to the above are when we need to make the kernel aware that
certain objects need to take into account different work arounds.
Previously, this was done using the magic (INSTRUCTION, INSTRUCTION)
reloc domains. NO_RELOC requires this to be passed in the execobject
flags as well, and now we push that up the callstack.

The API is more compact, more expressive of what happens underneath, but
unfortunately requires more knowledge of the system at the point of use.
Conversely it also means that knowledge is specific and not generally
applied and so not overused.

   text	   data	    bss	    dec	    hex	filename
8502991	 356912	 424944	9284847	 8dacef	lib/i965_dri.so (before)
8500455	 356912	 424944	9282311	 8da307	lib/i965_dri.so (after)

v2: (by Ken) Rebase.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-08-04 10:26:37 -07:00
Juan A. Suarez Romero 86c68e0a33 anv: Makefile.vulkan.am: ICD json files are now generated with python
Commit 0ab04ba979 (anv: Use python to generate ICD json files) changed
the way ICD json files are created.

Remove the old .in files from extra dist, and add the python script.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-08-04 09:54:46 +02:00
Lionel Landwerlin 1006cd512d anv: put anv_extensions.c in gitignore
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-08-03 16:14:45 +01:00
Tapani Pälli ca6237eb4f android: anv_extensions.c is generated to libmesa_vulkan_common
Fixes build error with anv_extensions.c not found for
libmesa_anv_entrypoints.

Fixes: d62063c "anv: Autogenerate extension query and lookup"
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-08-03 13:09:59 +03:00
Dave Airlie 271fa3a684 intel/vec4/gs: reset nr_pull_param if DUAL_INSTANCED compile failed.
If dual object compile fails (as seems to happen with virgl a
fair bit, and does piglit even have any tests for it?), we end up
not restarting the pull params, so we call
vec4_visitor::move_uniform_array_access_to_pull_constant
a second time and it runs over the ends of the alloc.

Fixes: tests/spec/glsl-1.50/execution/geometry/max-input-components.shader_test
running inside virgl on ivybridge.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-08-03 16:54:08 +10:00
Matt Turner 858f554078 i965: Fix indentation 2017-08-02 16:49:32 -07:00
Jason Ekstrand 277644221d anv: Advertise VK_KHR_relaxed_block_layout
There is literally no work for us to do here.  It already just works in
our driver.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-08-02 09:13:13 -07:00
Jason Ekstrand 600605e3fc anv: Bump the advertised version to 1.0.57
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-08-02 09:13:13 -07:00
Jason Ekstrand 077b200096 anv: Pull the API version from anv_extensions.py
This way everything stays in sync and we only have the one version
number.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-08-02 09:13:13 -07:00
Jason Ekstrand 0ab04ba979 anv: Use python to generate ICD json files
This is more lines of code but the python is far easier to read than the
sed expressions we were using before.  Also, this allows us to pull the
API version from anv_entrypoints.py so it never gets out-of-sync.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-08-02 09:13:13 -07:00
Jason Ekstrand 7382d8a416 anv: Add MAX_API_VERSION to anv_extensions.py
The VkVersion class is probably overkill but it makes it really easy to
compare versions in a way that's safe without the caller having to think
about patch vs. no patch.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-08-02 09:13:13 -07:00
Jason Ekstrand a25267654b anv: Make some bits of anv_extensions module-private
This way we can use "from anv_extensions import *" in the entrypoint
generator without worrying too much about pollution

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-08-02 09:13:13 -07:00
Tapani Pälli 99c764b647 intel: move gen_decoder.* back to COMMON_FILES
this change reverts commit 4f695731, we want to be able to build
with -DDEBUG and gen_decoder on Android.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-08-02 10:31:13 +03:00
Tapani Pälli 9bd15da85d android: link libmesa_intel_common with zlib and expat
Makes it possible to build Mesa on Android with -DDEBUG with
the next patch that reverts 4f695731.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-08-02 10:30:50 +03:00
Jason Ekstrand d62063ce31 anv: Autogenerate extension query and lookup
As time goes on, extension advertising is going to get more complex.
Today, we either implement an extension or we don't.  However, in the
future, whether or not we advertise an extension will depend on kernel
or hardware features.  This commit introduces a python codegen framework
that generates the anv_EnumerateFooExtensionProperties functions as well
as a pair of anv_foo_extension_supported functions for querying for the
support of a given extension string.  Each extension has an "enable"
predicate that is any valid C expression.  For device extensions, the
physical device is available as "device" so the expression could be
something such as "device->has_kernel_feature".  For instance
extensions, the only option is VK_USE_PLATFORM defines.

This mechanism also means that we have a single one-line-per-entry table
for all extension declarations instead of the two tables we had in
anv_device.c and the one we had in anv_entrypoints_gen.py.  The Python
code is smart and uses the XML to determine whether an extension is an
instance extension or device extension.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-08-01 11:12:41 -07:00
Jason Ekstrand ddc86c1d0e anv: Add a new centralized extensions file
This will allow us to keep everything in one place when it comes to
declaring what extensions are supported.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-08-01 11:12:41 -07:00
Iago Toral Quiroga 31f1863ace anv: only expose up to 28 vertex attributes
The EU limit of 128 GRFs should allow 32 vertex elements of 4 GRFs.
However, the maximum allowed value of "Vertex URB Entry Read Length"
in SIMD8 is 15. And 15 * 8 = 120 gives us a limit of 30 vertex elements.
Because we also need to reserve a vertex buffer to upload
VertexIndex/InstanceIndex and another to upload DrawID when needed,
we can only expose 28.

Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-07-26 08:16:43 +02:00
Iago Toral Quiroga a848e693ef anv/cmd_buffer: fix off by one error in assertion
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-07-26 08:02:06 +02:00
Eric Anholt decd2b32aa intel/decoder: Reuse the gen_make_gen() helper.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-07-25 14:44:52 -07:00
Eric Anholt 19ffa4bfb2 intel/decoder: Reuse the MAX2 macro instead of defining another one.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-07-25 14:44:52 -07:00
Emil Velikov 5d47dd9c2a intel/blorp: ship blorp_genX_exec.h within the tarball
Fixes: c9cb37b2a6 ("intel/blorp: Add a partial resolve pass for MCS")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-24 15:14:21 +01:00
Jason Ekstrand 6874b953f6 anv/image: zalloc image views
This allows us to avoid some extra zeroing.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-07-22 21:41:12 -07:00
Jason Ekstrand a1cad8218e anv/image: Use vk_zalloc instead of an explicit memset
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-07-22 21:41:12 -07:00
Jason Ekstrand 1e32c8303a anv: Separate surface states by layout instead of aux_usage
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-07-22 21:41:12 -07:00
Jason Ekstrand 628bfaf1c6 intel/isl: Add some sanity checks for compressed surfaces
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-07-22 21:41:12 -07:00
Jason Ekstrand 5de4209f91 intel/isl: Add a helper to get a subimage surface
We already have a helper for doing this in BLORP, this just moves the
logic into ISL where we can share it with other components.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-07-22 21:41:12 -07:00
Jason Ekstrand 72bc38cfc5 anv: Get rid of some unused function declarations
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-07-22 21:41:12 -07:00
Jason Ekstrand d4de403f91 intel/isl: Add a helper for determining if a color is 0/1
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 20:59:22 -07:00
Jason Ekstrand b26b2490e5 intel/blorp: Allow blorp_copy on sRGB formats
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 20:59:22 -07:00
Jason Ekstrand fb86ac94cb intel/isl/format: Add an srgb_to_linear helper
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 20:59:22 -07:00
Jason Ekstrand 44e9d65757 intel/isl/format: Dedent the template in gen_format_layout.py
This makes it much easier to edit the template and doesn't really dirty
the python all that much.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 20:59:22 -07:00
Jason Ekstrand 268ba028dc intel/isl: Add an aux state for "partial clear"
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 20:59:22 -07:00
Jason Ekstrand c9cb37b2a6 intel/blorp: Add a partial resolve pass for MCS
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 20:59:22 -07:00
Nanley Chery 67027ddf3f anv: Predicate fast-clear resolves
Image layouts only let us know that an image *may* be fast-cleared. For
this reason we can end up with redundant resolves. Testing has shown
that such resolves can measurably hurt performance and that predicating
them can avoid the penalty.

v2:
- Introduce additional resolve state management function (Jason Ekstrand).
- Enable easy retrieval of fast clear state fields.
v3: Use more descriptive field enums (Jason)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:10 -07:00
Nanley Chery 8e2729fbb8 intel/blorp: Allow BLORP calls to be predicated
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:10 -07:00
Nanley Chery be516ba9b1 anv/cmd_buffer: Skip some input attachment transitions
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:10 -07:00
Nanley Chery 597ff919e7 anv: Stop resolving CCS implicitly
With an earlier patch from this series, resolves are additionally
performed on layout transitions. Remove the now unnecessary implicit
resolves within render passes.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:10 -07:00
Nanley Chery 5ba93e6f5a anv: Transition more color buffer layouts
v2: Expound on comment for the pipe controls (Jason Ekstrand).
v3:
- Cast base_layer to uint64_t to avoid overflow.
- Remove "seems" from the pipe control comment.
- Fix clamp of layer_count (Jason Ekstrand).

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:10 -07:00
Nanley Chery a899747eb3 anv/cmd_buffer: Warn about not enabling CCS_E
Use the performance warning infrastructure to provide helpful
information when testing applications.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:10 -07:00
Nanley Chery 9c9f63d1c7 anv/cmd_buffer: Move aux_usage assignment up
For readability, bring the assignment of CCS closer to the assignment of
NONE and MCS.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:10 -07:00
Nanley Chery 62d72bb5d0 anv/cmd_buffer: Always enable CCS_D in render passes
The lifespan of the fast-clear data will surpass the render pass scope.
We need CCS_D to be enabled in order to invalidate blocks previously
marked as cleared and to sample cleared data correctly.

v2: Avoid refactoring.
v3: Allow CCS_D for subpass resolves.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:10 -07:00
Nanley Chery 8e532aa028 anv/cmd_buffer: Disable CCS on gen7 color attachments upfront
The next patch enables the use of CCS_D even when the color attachment
will not be fast-cleared. Catch the gen7 case early to simplify the
changes required.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:10 -07:00
Nanley Chery 9fd1f2aa3c anv/cmd_buffer: Ensure fast-clear values are current
v2: Rewrite functions, change location of synchronization.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:10 -07:00
Nanley Chery 0b16600056 anv/gpu_memcpy: Add a lighter-weight GPU memcpy function
We'll be performing a GPU memcpy in more places to copy small amounts of
data. Add an alternate function that thrashes less state.

v2:
- Make a new function (Jason Ekstrand).
- Move the #define into the function.
v3:
- Update the function name (Jason).
- Update comments.
v4: Use an indirect drawing register as TEMP_REG (Jason Ekstrand).

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:09 -07:00
Nanley Chery dcff5ab9f1 anv/cmd_buffer: Restrict fast clears in the GENERAL layout
v2: Remove ::first_subpass_layout assertion (Jason Ekstrand).
v3: Allow some fast clears in the GENERAL layout.
v4: Remove extra '||' and adjust line break (Jason Ekstrand).

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:09 -07:00
Nanley Chery 9ffe87122b anv/cmd_buffer: Don't partially fast clear image layers
v2: Don't pass in the command buffer (Jason Ekstrand).
v3: Remove an incorrect assertion and an if condition for gen7.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:09 -07:00
Nanley Chery 07cc2ec9db anv/cmd_buffer: Initialize the clear values buffer
v2: Rewrite functions.
v3 (Jason Ekstrand):
- Don't set ResourceMinLOD.
- Fix clamp of level_count.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:09 -07:00
Nanley Chery 88200e87f6 anv/image: Append CCS/MCS with a fast-clear state buffer
v2: Update comments, function signatures, and add assertions.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:09 -07:00
Nanley Chery 325ecffc62 anv/image: Disable CCS if the image doesn't support rendering
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:09 -07:00
Nanley Chery 01db9a74c6 intel/isl: Add surface state clear value information
This will be used to load and store clear values from surface state
objects.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:09 -07:00
Nanley Chery b178e239dd anv: Transition MCS buffers from the undefined layout
v2: Define MCS buffers with any sample count (Jason)

Cc: <mesa-stable@lists.freedesktop.org>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2017-07-22 20:12:09 -07:00
Jason Ekstrand f793c57cc5 intel/isl: Tighten up restrictions for CCS on gen7
It may technically be possible to enable some sort of fast-clear support
for at least the base slice of a 2D array texture on gen7.  However,
it's not documented to work, we've never tried to do it in GL, and we
have no idea what the hardware does if you turn on CCS_D with arrayed
rendering.  Let's just play it safe and disallow it for now.  If someone
really cares that much about gen7 performance, they can come along and
try to get it working later.
2017-07-22 20:12:07 -07:00
Jason Ekstrand 20533e0da7 anv/blorp: Assert isl_surf_init success in do_buffer_copy
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 08:21:27 -07:00
Jason Ekstrand cf39fb06e3 anv/blorp: Explicitly set row_pitch in do_buffer_copy
We have a very specific row pitch that we want and we don't want ISL to
be changing it on us so just be explicit about it.

Fixes: a40f043034
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 08:20:07 -07:00
Kenneth Graunke 30d6bc470a i965: Set lower_vote_trivial in vector_nir_options_gen6 too.
There's a second struct for Gen6+.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-07-21 18:09:01 -07:00
Topi Pohjolainen fbfc6a2f67 intel/isl/gen7: Don't allow multisampled surfaces with valign2
There is the same constraintg later on as assert in
isl_gen7_choose_image_alignment_el() so catch it earlier in order
to return error instead of crash.

Needed to avoid crashes with piglits on IVB and HSW:

arb_internalformat_query2.image_format_compatibility_type pname checks
arb_internalformat_query2.all internalformat_<x>_type pname checks
arb_internalformat_query2.max dimensions related pname checks
arb_copy_image.arb_copy_image-formats --samples=2/4/6/8
arb_texture_float.multisample-fast-clear gl_arb_texture_float

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 00:14:16 +03:00
Topi Pohjolainen df9bb8dc05 intel/isl/gen7: Allow msaa with signed integer formats
These formats are already allowed by the i965 GL driver, and the
feature seems to work just fine.

There are tests for multisampled rendering in piglit:
tests/spec/ext_framebuffer_multisample which can be patched to
try 16I/32I in addition to GL_RGBA8I.
IvyBridge passed all tests with all sample numbers.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 00:14:16 +03:00
Topi Pohjolainen abb84e3f2d intel/isl/gen7: Allow msaa with 128-bit formats
These formats are already allowed by the i965 GL driver, and the
feature seems to work just fine.

There are tests for multisampled rendering in piglit:
tests/spec/ext_framebuffer_multisample which can be patched to
try GL_RGBA16F/32F/16I/16UI/32I/32UI in addition to GL_RGBA/8I.
IvyBridge passed all tests with all sample numbers and even
with 128-bit formats.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 00:14:16 +03:00
Topi Pohjolainen 514d68576d intel/isl: Allow 1D surfaces with compressed formats
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 00:14:16 +03:00
Topi Pohjolainen a40f043034 intel/isl: Align non-tiled horizontally by cache line
in order to support blit engine.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 00:14:16 +03:00
Matt Turner 069bf7c907 i965/fs: Match destination type to size for ballot
No use in taking a 64-bit value when we know the high 32-bits are zero.
2017-07-20 16:56:50 -07:00
Matt Turner 1038d385a9 nir: Reduce destination size of ballot intrinsic when possible
Some hardware, like i965, doesn't support group sizes greater than 32.
In that case, we can reduce the destination size of the ballot
intrinsic, which will simplify our code generation.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-20 16:56:49 -07:00
Matt Turner 782ef30451 i965/fs: Implement ARB_shader_ballot operations
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-20 16:56:49 -07:00
Matt Turner 8238930510 i965/fs: Do not move MOVs writing the flag outside of control flow
The implementation of ballotARB() will start by zeroing the flags
register. So, a doing something like

        if (gl_SubGroupInvocationARB % 2u == 0u) {
                ... = ballotARB(true);
		[...]
        } else {
                ... = ballotARB(true);
		[...]
	}

(like fs-ballot-if-else.shader_test does) would generate identical MOVs
to the same destination (the flag register!), and we definitely do not
want to pull that out of the control flow.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-20 16:56:49 -07:00
Francisco Jerez f1b7c47913 i965/fs: Handle explicit flag sources in flags_read()
The implementations of the ARB_shader_ballot intrinsics will explicitly
read the flag as a source register.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-07-20 16:56:49 -07:00
Matt Turner 43ef75b394 nir: Add system values from ARB_shader_ballot
We already had a channel_num system value, which I'm renaming to
subgroup_invocation to match the rest of the new system values.

Note that while ballotARB(true) will return zeros in the high 32-bits on
systems where gl_SubGroupSizeARB <= 32, the gl_SubGroup??MaskARB
variables do not consider whether channels are enabled. See issue (1) of
ARB_shader_ballot.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-20 16:56:49 -07:00
Matt Turner ee9fa4ac18 i965/fs: Implement ARB_shader_group_vote operations
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-20 16:56:49 -07:00
Francisco Jerez 93dc736f4e i965/fs: Handle explicit flag destinations in flags_written()
The implementations of the ARB_shader_group_vote intrinsics will
explicitly write the flag as the destination register.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-07-20 16:56:49 -07:00
Matt Turner 30b72f4126 i965/vec4: Lower ARB_shader_group_vote intrinsics
I don't expect anyone is going to care about using this in vec4 programs
(vertex/tessellation/geometry on Gen6/7), no one has come up with a good
way to implement it much less test it.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-20 16:56:49 -07:00
Matt Turner d4c9d6a3b2 nir: Add pass to optimize intrinsics
Specifically, constant fold intrinsics from ARB_shader_group_vote, but I
suspect it'll be useful for other things in the future.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-20 16:56:49 -07:00