Commit Graph

2206 Commits

Author SHA1 Message Date
Anuj Phogat 8521559e08 i965: Fix broxton 2x6 l3 config
The new table added in this patch matches with the table
in gfxspecs. We were programming the wrong values earlier.

V2: Update the comment.

Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-06-20 12:18:26 -07:00
Ian Romanick cbb941cdec intel/blorp: Apply source offset in the TEX case
Previously the offset was only applied in the TXF case.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-20 11:07:02 -07:00
Ian Romanick 990f2be139 intel/blorp: Apply Gen4 coord. normalization after cubemap sizes are adjusted
Otherwise the values used for coordinate normalization use the wrong
sizes.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-20 11:07:02 -07:00
Jason Ekstrand b2dd61196e intel/blorp: Set needs_(dst|src)_offset for Gen4 cubemaps
We call convert_to_single_slice so they may end up with a non-trivial
offset that needs to be taken into account.

v2 (idr): Also set needs_src_offset.  Suggested by Jason.

Fixes ES2-CTS.functional.texture.specification.basic_copyteximage2d.cube_rgba
and ES2-CTS.functional.texture.specification.basic_copytexsubimage2d.cube_rgba
on G45.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101284
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-20 11:07:02 -07:00
Lionel Landwerlin 6d759cbd49 intel: common: add number of thread per eu
This will be used by to normalize OA counters.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-19 22:11:00 +01:00
Lionel Landwerlin c77d98ef32 intel: common: express timestamps units in frequency
Rather than storing the period as a double that looses some precision.

Also fixes the Gen9LP timestamp frequency which is no 19200123 but
19200000 as pointed by Ville :

https://lists.freedesktop.org/archives/intel-gfx/2017-April/125126.html

Finally add the Cannonlake timestamp frequency.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-19 22:11:00 +01:00
Lionel Landwerlin 5f2fe9302c intel: common: add flag to identify platforms by name
The perf infrastructure needs to identify specific platforms, not just
generations.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-19 22:11:00 +01:00
Jonas Kulla a52ee32a9a anv: Fix L3 cache programming on Bay Trail
Valid values for URBAllocation start at 32, so substract that
before programming the register.

This was missed when porting from the GL driver.

Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-19 12:05:52 -07:00
Topi Pohjolainen 0d1af164e1 intel/isl/gen6: Allow arrayed stencil
Nothing prevents arrayed stencil surfaces even though hardware
doesn't support mipmapping.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-17 06:38:56 +03:00
Rafael Antognolli 3a767f8b06 genxml: The viewport state offset is actually an address.
This fixes code generation on gen45.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-06-16 15:01:16 -07:00
Rafael Antognolli ad109c16c2 genxml: Rename fields to match gen6+.
"Anti-aliasing Enable" to "Anti-Aliasing Enable".

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-06-16 15:01:16 -07:00
Rafael Antognolli 1b42cd52a2 genxml: Rename SF_STATE field to match gen6+.
Rename "Use Point Width State" to "Point Width Source". It accepts the same
values and has the same meaning as gen6+, so lets keep them with the same name
to simplify the code.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-06-16 15:01:16 -07:00
Anuj Phogat c07271fef0 intel/isl: Add the maximum surface size limit
V2: Use 2^31 bytes (2GB) surface size limit on pre-gen9 and
    2^38 bytes for gen9+.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-06-16 09:05:05 -07:00
Anuj Phogat 7022978237 intel/isl: Use uint64_t to store total surface size
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-06-16 09:05:05 -07:00
Jason Ekstrand 7175561598 intel/blorp: Work around Sandy Bridge occlusion query issue
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-06-14 18:15:05 -07:00
Jason Ekstrand 96f9d4de7d intel/isl: Properly set SeparateStencilBufferEnable on gen5-6
On gen5-6, SeparateStencilBufferEnable and HierarchicalDepthBufferEnable
come hand in hand and we have to set either both or neither.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-06-14 18:15:05 -07:00
Kenneth Graunke af373ea4a2 genxml: Fix Gen4-5 SF_STATE "Line Width" fixed point type.
It's a U3.1.  It became a U3.7 on Sandybridge.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-06-14 15:56:21 -07:00
Ben Widawsky e179a3438a i965/cnl: Add a preliminary device for Cannonlake
v2 (Anuj):
Rebased on master and updated pci ids
Remove redundant initialization of max_wm_threads to 64 * 12.
For gen9+ max_wm_threads are initialized in gen_get_device_info().

v3 (Anuj):
Move the patch to end of series.
Remove unused gt1, gt2, gt3 functions.
Remove l3_banks variable. Variable is now available on master.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-06-09 16:03:00 -07:00
Jason Ekstrand f2cbf738b4 anv: Don't advertise support on anything above gen9
This will prevent the driver from even trying to work on Cannon Lake
until we get actual support added.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-06-09 16:03:00 -07:00
Anuj Phogat 9acc93feeb i965/cnl: Enable CCS_E and RT support for few formats
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-09 16:02:59 -07:00
Anuj Phogat 61f171292e i965/cnl: Reformat surface_format_info table to accomodate gen10+
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-09 16:02:59 -07:00
Anuj Phogat f9e31a26d4 i965/cnl: Make URB {VS, GS, HS, DS} sizes non multiple of 3
v1: By Ben Widawsky <benjamin.widawsky@intel.com>
v2: v1 had an assert only for VS. Add the restriction for GS, HS and
    DS as well and make sure the allocated sizes are not multiple of 3.
v3: Move the entry_size checks in to compiler code (Ken)

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-09 16:02:59 -07:00
Anuj Phogat 111881abac i965/cnl: Handle gen10 in switch cases across the driver
V2: Start using gen10 functions isl_gen10*(), gen10_blorp_exec()
    gen10_init_atoms() (Jason)
    Remove Vulkan changes. Do them later in a separate patch.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-09 16:02:59 -07:00
Anuj Phogat 30e749c8f1 i965/cnl: Update few assertions
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-09 16:02:59 -07:00
Anuj Phogat 56b4d82729 i965/cnl: Add cnl bits in aubinator
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-09 16:02:58 -07:00
Anuj Phogat dc83ce7a16 i965/cnl: Wire up android Mesa build files for gen10
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2017-06-09 16:02:58 -07:00
Anuj Phogat e01c5a6824 i965/cnl: Wire up Mesa build files for gen10
V2: Remove isl_gen10.c and isl_gen10.h

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-09 16:02:58 -07:00
Anuj Phogat 2417d5ca19 intel/genxml: Update genx_bits for gen10+
This commit adds a gen10 case to the switch statement and
drops some unneeded code for handling gen numbers which
doesn't work on gen10 and above.

V2: Drop "z = float(z)" and the "z *= 10" lines

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-09 16:02:58 -07:00
Anuj Phogat 98b95a3735 i965/cnl: Add gen10 specific function declarations
These declarations will help the code start compiling
once we wire up the makefiles for gen10. Later patches
will start using these functions for gen10.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-09 16:02:58 -07:00
Anuj Phogat 2704ccc646 i965/cnl: Include gen10_pack.h
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-09 16:02:58 -07:00
Anuj Phogat a48cb9cf7f i965/cnl: Define genX(x) and GENX(x) for gen10
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-09 16:02:58 -07:00
Jason Ekstrand aa416f515a i965/genxml: Add gen10.xml
V2(Anuj):
Add default value for length of 3DPRIMITIVE command
Add values for 'Attribute Active Component Format'
Rename few fields to match gen9.xml

V3 (Ander Conselvan de Oliveira)
Add gen10 alias for MOCS
Make 3DSTATE_CONSTANT_BODY on Gen10 use arrays

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-06-09 16:00:49 -07:00
Ben Widawsky d968f072bc i965: Make feature macros gen8 based
All the "features" of the hardware are similar starting with GEN8, so remove as
much of the GEN9 uniqueness as possible. This makes implementing future gen
platforms a bit easier.

Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-09 15:27:14 -07:00
Jason Ekstrand a59c7f834c intel/isl: Add an enum for describing auxiliary compression state
This enum describes all of the states that a auxiliary compressed
surface can have.  All of the states as well as normative language for
referring to each of the compression operations is provided in the
truly colossal comment for the new isl_aux_state enum.  There is also
a diagram showing how surfaces move between the different states.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-06-07 22:18:53 -07:00
Jason Ekstrand bacae7221b blorp: Use FullSurfaceDepthandStencilClear for blorp_hiz_op
The blorp_hiz_op entrypoint always acts on a full subresource of a HiZ
buffer so we can just set the flag unconditionally.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-07 08:54:54 -07:00
Jason Ekstrand 9cb6ac62fb intel/blorp: Plumb through access to the workaround BO
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101283
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-07 08:54:54 -07:00
Nanley Chery ed5801864e anv/blorp: Move the depth cache flush outside of BLORP
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-07 08:54:54 -07:00
Jason Ekstrand fbd8a33f61 intel/blorp: Refactor the HiZ op interface
This commit does a few things:

 1) Now that BLORP can do HiZ ops on gen8+, drop the gen6 prefix.
 2) Switch parameters to uint32_t to match the rest of blorp.
 3) Take a range of layers and loop internally.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-07 08:54:54 -07:00
Jason Ekstrand f9fd976e8a i965/miptree: Store fast clear colors in an isl_color_value
This commit, out of necessity, makes a number of changes at once:

 1) Changes intel_mipmap_tree to store the clear color for both color
    and depth as an isl_color_value.

 2) Changes the depth/stencil emit code to do the format conversion of
    the depth clear value on Haswell and earlier instead of pulling a
    uint32_t directly from the miptree.

 3) Changes ISL's depth/stencil emit code to perform the format
    conversion of the depth clear value on Haswell and earlier instead
    of assuming that the depth value in the float is pre-converted.

 4) Changes blorp to pass the depth value through as a float.

 5) Changes the Vulkan driver to pass the depth value to blorp as a
    float rather than a uint.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-06-07 08:54:54 -07:00
Anuj Phogat 8d02916e0c intel: Fix broxton 2x6 way size computation
This patch is undoing the changes to way size computation
in broxton 2x6, made by below commit:

Commit: 0d576fbfbe
Author:     Anuj Phogat <anuj.phogat@gmail.com>
i965: Simplify l3 way size computations

By making use of l3_banks field in gen_device_info struct
l3_way_size for gen7+ = 2 * l3_banks.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101306
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-06 21:30:51 -07:00
Eric Engestrom 63a8a88ac4 tree-wide: remove trailing backslash
Simple search for a backslash followed by two newlines.
If one of the newlines were to be removed, this would cause issues, so
let's just remove these trailing backslashes.

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-06-07 01:18:09 +01:00
Alex Smith 922b038864 anv: Set better descriptor set limits
Based on discussions with Jason, Ivy Bridge and Bay Trail only actually
support 16 samplers, while newer hardware can support more than the
current limit of 64. Therefore set the lower limit where needed, and
bump up to 128 for everything else. There is also a limit on the total
number of other resources of around 250.

This allows Dawn of War III to render correctly on ANV.

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-06 08:20:09 -07:00
Alex Smith 59c1797d56 anv: Set driver version to Mesa version
As already done by RADV.

v2: Move version calculation function to src/vulkan/util to share with
    RADV.

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-06 08:20:00 -07:00
Alex Smith 621b3410f5 util/vulkan: Move Vulkan utilities to src/vulkan/util
We have Vulkan utilities in both src/util and src/vulkan/util. The
latter seems a more appropriate place for Vulkan-specific things, so
move them there.

v2: Android build system changes (from Tapani Pälli)

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-06 08:17:13 -07:00
Lionel Landwerlin 2ef73473c8 intel: gen-decoder: rework how we handle groups
The current way of handling groups doesn't seem to be able to handle
MI_LOAD_REGISTER_* with more than one register. This change reworks
the way we handle groups by building a traversal list on loading the
GENXML files.

Let's say you have

Instruction {
  Field0
  Field1
  Field2
  Group0 (count=2) {
    Field0-0
    Field0-1
  }
  Group1 (count=4) {
    Field1-0
    Field1-1
  }
}

We build of linked on load that goes :

Instruction -> Group0 -> Group1

All of those are gen_group structures, making the traversal trivial.
We just need to iterate groups for the right number of timers (count
field in genxml).

The more fancy case is when you have only a single group of unknown
size (count=0). In that case we keep on reading that group for as long
as we're within the DWordLength of that instruction.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-06-06 14:04:37 +01:00
Kenneth Graunke 9cd69022d5 i965: Change INTEL_DEBUG=vec4 to INTEL_SCALAR_VS for consistency.
We moved to INTEL_SCALAR_* when we added more than a single stage, but
never went back and converted the VS to work that way.  Be consistent.

Also update the documentation to actually mention these debug variables.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-06-05 23:32:40 -07:00
Anuj Phogat 0d576fbfbe i965: Simplify l3 way size computations
By making use of l3_banks field in gen_device_info struct
l3_way_size for gen7+ = 2 * l3_banks.

V2: Keep the get_l3_way_size() function.

Suggested-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-06-02 16:21:56 -07:00
Anuj Phogat eb23be1d97 i965: Add and initialize l3_banks field for gen7+
This new field helps simplify l3 way size computations
in next patch.

V2: Initialize the l3_banks to 0 in macros.

Suggested-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-06-02 16:21:56 -07:00
Jason Ekstrand 1a22c4c960 intel/blorp: Handle gen6 stencil/HiZ offsets in the back-end
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-01 15:34:01 -07:00
Jason Ekstrand d065a9540c intel/isl: Add a helper for getting the byte/tile offset of a subimage
Frequently, get_image_offset_sa is combined with get_intratile_offset_sa
so it makes sense to have a single helper to do both.  If the caller
doesn't want the intratile offsets, it can simply pass NULL and ISL will
assert that they are 0.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-01 15:33:58 -07:00
Jason Ekstrand b178762d05 intel/isl: Make get_intratile_offset_el take the element size in bits
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-01 15:33:56 -07:00
Jason Ekstrand 757f7087a5 intel/isl: Add a new layout for HiZ and stencil on Sandy Bridge
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-01 15:33:47 -07:00
Jason Ekstrand cb8cdab8e8 intel/isl: Generate phys_total_el from isl_calc_phys_extent
The only surface layout for which slice0 makes any sense is GEN4_2D.
Move all of the slice0 stuff into isl_calc_phys_total_extent_el_gen4_2d
and make the others trivially return the total size in surface elements.
As a side-effect, array_pitch_el_rows is now returned from these helpers
as well.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-01 15:33:45 -07:00
Jason Ekstrand 918f41bb29 intel/isl: Don't check array pitch for gen4 3D textures
Array pitch doesn't matter in this layout.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-01 15:33:43 -07:00
Jason Ekstrand 044bfb292f intel/isl: Refactor to use a phys_total_el extent.
We've already implicitly been using a physical total size in surface
elements.  This just centralizes things a bit.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-01 15:33:41 -07:00
Jason Ekstrand 1547d133ac intel/isl: Add an isl_assert_div helper
This is a fairly common operation and it's nice to be able to just call
the one little function.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-01 15:33:39 -07:00
Jason Ekstrand 58051ad220 intel/isl: Refactor isl_calc_array_pitch_el_rows
Over 90% of the function only applies to ISL_DIM_LAYOUT_GEN4_2D anyway
so we can just handle the other two as special cases at the top.  The
two "generic" cases below the switch only apply on gen9 and above and
only to 3D or CCS surfaces.  This implies that they only apply to
surfaces with ISL_DIM_LAYOUT_GEN4_2D.  Making them look generic is a
lie.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-01 15:33:37 -07:00
Jason Ekstrand fe13c59c1b intel/isl: Move isl_calc_array_pitch_el_rows higher up
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-01 15:33:34 -07:00
Jason Ekstrand c1a70165be intel/isl: Remove the device parameter from isl_tiling_get_info
We were only using it for validating that we don't use Ys/Yf on gen8 and
earlier.  Removing it from isl_tiling_get_info lets us remove it from a
bunch of other things that had no business needing a hardware
generation.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-01 15:33:31 -07:00
Kenneth Graunke fe14a9a501 i965: Drop duplicate shadow variable.
We already initialized this at the top of the function.

Trivial.
2017-06-01 14:28:12 -07:00
Kenneth Graunke fe9699dcb4 genxml: Make 3DSTATE_CONSTANT_BODY on Gen7+ use arrays.
This will let us initialize the constant buffers with loops.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-06-01 11:49:46 -07:00
Kenneth Graunke 12303bd390 genxml: Fix decoder to print the array element on field members.
Previously we'd print things like:

   0xfffbb568:  0x00010000 : Dword 1
       ReadLength: 0
       ReadLength: 1
   0xfffbb568:  0x00000001 : Dword 1
       ReadLength: 1
       ReadLength: 0

instead of the more obvious:

   0xfffbb568:  0x00010000 : Dword 1
       ReadLength[0]: 0
       ReadLength[1]: 1
   0xfffbb568:  0x00000001 : Dword 1
       ReadLength[2]: 1
       ReadLength[3]: 0

(Yes, the ralloc context here is bogus - the decoder leaks just about
everything.  We need to use proper ralloc contexts someday...)

Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-06-01 11:49:46 -07:00
Kenneth Graunke 73c21e69d0 genxml: Fix decoding of array groups.
If you had a group as the first element of a struct, i.e.

  <struct name="3DSTATE_CONSTANT_BODY" length="10">
    <group count="4" start="0" size="16">
      <field name="ReadLength" start="0" end="15" type="uint"/>
    </group>
    ...
  </struct>

we would get a group_offset of 0, causing create_field() to think the
field wasn't in a group, and fail to offset forward for successive array
elements.  So we'd mark all the array elements as offset 0.

Using ctx->group->elem_size is a better check for "are we in a group?".

Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-06-01 11:49:45 -07:00
Kenneth Graunke d1b949282f genxml: Fix decoder for groups with multiple fields.
If you have something like:

    <group count="0" start="96" size="32">
      <field name="Entry_0" start="0" end="15" type="GATHER_CONSTANT_ENTRY"/>
      <field name="Entry_1" start="16" end="31" type="GATHER_CONSTANT_ENTRY"/>
    </group>

We would reset ctx->group_count to 0 after processing the first field,
so the second would not have a group count.

This is largely untested, as the only groups with multiple fields are
packets we don't emit in Mesa.  Found by inspection.

Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-06-01 11:49:45 -07:00
Kenneth Graunke df2d55ba57 genxml: Fix parsing of address fields in groups.
For example,

    <group count="4" start="64" size="64">
      <field name="Pointer" start="5" end="63" type="address"/>
    </group>

used to generate:

   const uint64_t v2_address =
      __gen_combine_address(data, &dw[2], values->Pointer, 0);
   ...
   const uint64_t v4_address =
      __gen_combine_address(data, &dw[4], values->Pointer, 0);
   ...

but now generates code with proper subscripts:

   const uint64_t v2_address =
      __gen_combine_address(data, &dw[2], values->Pointer[0], 0);
   ...
   const uint64_t v4_address =
      __gen_combine_address(data, &dw[4], values->Pointer[1], 0);
   ...

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-06-01 11:49:45 -07:00
Kenneth Graunke 65f5f3c85c i965: Move SOL PSIZ hacks from draw time to link time.
We can just update the gl_transform_feedback_info fields at link time
to make the VUE header fields have the right location and component.
Then we don't need to handle them specially at draw time, which is
expensive.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-06-01 00:08:29 -07:00
Kenneth Graunke 56535959fd anv: Port over CACHE_MODE_1 optimization fix enables from brw.
Ben and I haven't observed these to help anything, but they enable
hardware optimizations for particular cases.  It's probably best to
enable them ahead of time, before we run into such a case.

Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-05-30 14:59:31 -07:00
Kenneth Graunke 53368b008e genxml: Add Gen9 CACHE_MODE_1 definitons.
These were already in gen8.xml but not gen9.xml.  There are a few new
fields and a couple that have changed.  These are all documented in the
Skylake PRM, Volume 2c Command Reference: Registers, Part 1.

Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-05-30 14:59:31 -07:00
Kenneth Graunke 9afe5846d2 genxml: Make a SCISSOR_RECT structure on Gen4-5.
Gen6+ support multiple scissor rectangles, and define a SCISSOR_RECT
structure containing their dimensions.  On Gen4-5, those same fields
exist in SF_VIEWPORT.

This patch extracts the SF_VIEWPORT fields into a SCISSOR_RECT
structure.  Although not a named concept on Gen4-5, it works just
as well, and gives us a consistent SCISSOR_RECT structure across
all generations, making it easier to reuse code.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-29 21:46:37 -07:00
Kenneth Graunke 1e3880544e i965: Ignore INTEL_SCALAR_* debug variables on Gen10+.
Scalar mode has been default since Broadwell, and vector mode is getting
increasingly unmaintained.  There are a few things that don't even fully
work in vector mode on Skylake, but we've never cared because nobody
uses it.  There's no point in porting it forward to new platforms.

So, just ignore the debug options to force it on.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-05-29 21:40:44 -07:00
Emil Velikov 3e8790bff0 anv: automake: list shared libraries after the static ones
The compiler can discard the shared ones from the link chain, since
there is no user (the static libraries) before it on the command line.

Cc: mesa-stable@lists.freedesktop.org
Reported-by: Laurent Carlier <lordheavym@gmail.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2017-05-29 16:42:41 +01:00
Jason Ekstrand 79f2a5541f i965: Use BLORP for color clears on gen4-5
We don't support replicated data clears yet.  Those take a bit more work
and enabling replicated data clears in its own commit is probably better
for bisectibility anyway.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand fa13ef285d intel/blorp: Assert that no one tries to blit combined depth stencil
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand 752d7af77a i965: Add blorp support for gen4-5
Due to complications with things such as URB setup on gen4-5, it's
easier to keep gen4 support in blorp completely internal to i965.  This
makes things a bit awkward because that means there's a file in i965
that includes blorp_priv.h but it's either that or have a file in blorp
that includes brw_context.h.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand 23125b7102 intel/blorp: Set additional brw_wm_prog_key fields on gen4-5
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand 0ed6f196fc intel/blorp: Add support for gen4-5 SF programs
As part of enabling support for SF programs, we plumb the SF URB size
through to emit_urb_config.  For now, it's always zero but, on gen4, it
may be something larger.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand 8bce7bda45 intel/blorp: Make convert_to_single_slice available outside blorp_blit
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand 110061afa2 intel/blorp: Use designated initializers to set up VERTEX_ELEMENTS
We also add a slot variable and use it as an iterator.  This will make
it much easier to conditionally put something between the header and the
vertex position.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand ac79806766 intel/blorp: Rename emit_viewport_state to emit_cc_viewport
The real point of this packet is that it sets up CC_VIEWPORT so that
name is a bit better.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand 1f2f90be1f intel/blorp: Make the common genX_blorp_exec code gen4-safe
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand a7f5d6df8a intel/blorp: Re-arrange blorp_genX_exec.h
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand 302c0488cf intel/blorp: Don't use ffma directly
It isn't supported prior to gen6 and, on gen6+, NIR will fuse the fmul
and fadd into an ffma automatically for us anyway.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand 675ec434f3 intel/blorp: Delete isl_to_gen_ds_surfype
It's no longer used.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand e80f0840bf intel/blorp: Pull the pipeline bits of blorp_exec into a helper
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand 3d35e5a51e intel/blorp/blit: Add support for normalized coordinates
Gen5 and earlier can't do non-normalized coordinates so we need to
compensate in the shader.  Fortunately, it's pretty easy plumb through.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand 18e18a1863 i965: Move clip program compilation to the compiler
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand 9fb8a8775b i965: Move SF compilation to the compiler
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand 21ba2b4bef intel/compiler: Make brw_disasm take const assembly
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand c336c224a6 intel/decoder: Handle the BLT ring in gen_group_get_length
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand 9d1001c8e5 intel/decoder: Handle gen4 VF_STATISTICS and PIPELINE_SELECT
These need special handling because they have no "DWord Length"
parameter and they have an unusual bias of 1.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand 87588e546e intel/genxml: Rename 3DSTATE_AA_LINE_PARAMS on gen5
All of the other gens use "PARAMETERS".

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand 04f6d975e1 intel/genxml: Use the right subtype for VF_STATISTICS on gen4
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand 1fcc5e2399 intel/genxml: Iron Lake doesn't support non-normalized sampler coordinates
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand 648b618dc5 intel/genxml: Add SAMPLER_STATE to gen 4.5
Somehow this got missed.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand 3f8ee8c703 intel/genxml: Rename the CC_VIEWPORT pointer on gen4-5
It isn't a pointer to "color calc state", that's the packet it's in.
It's a pointer to the CC viewport state.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand 0ee1ef0cbb intel/genxml: Sampler state is a pointer on gen4-5
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand 64243d3b8e intel/genxml: Suffix KSP0 fields on Iron Lake
Iron Lake introduced the multiple KSP thing and so you have KSP0-3.
However, the genxml didn't have an index on the first "Kernel Start
Pointer" or "GRF Register Count".  Add one to match gen6+.  While we're
here, we drop the brackets from the other "GRF Register Count" fields.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand 7769e448aa intel/genxml: Make a bunch of things offsets on gen4-5
Most things on gen4-5 are addresses because we don't have dynamic state
base address and we don't have instruction state base on gen4.  However,
whoever converted things to addresses got a little over-excited and
converted too much.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand 8257fe7b18 intel/isl: Add gen4_filter_tiling
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand 332a5d7a3f intel/isl: Add support for setting component write disables
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand 8958355549 intel/isl: Add support for gen4 cube maps to get_image_offset_sa
Gen4 cube maps are a 2-D surface with ISL_DIM_LAYOUT_GEN4_3D which is a
bit weird but accurate none the less.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand b9b7792d9a intel/isl: Don't request space for stencil/hiz packets unless needed
On Iron Lake, the packets exist but we never emit them so there's no
need for us to ask the driver to make batch space for them.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Jason Ekstrand 554a1731a5 intel/blorp: Move the gen7 stencil format workaround to blorp_blit
It's not needed for blorp_copy because it already overrides formats.
It's also not needed for blorp_clear because it clears stencil as
stencil.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-26 07:58:01 -07:00
Lionel Landwerlin 359fa0e9a0 aubinator: report error on unknown device id
Since we're going to stop aubinator without a valid device id, better
report an error. This also silences a Coverity warning.

CID: 1405004
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-05-24 10:50:18 +01:00
Lionel Landwerlin 8f1f1d294d aubinator: be consistent on exit code
We're using both exit(1) & exit(EXIT_FAILURE), settle for one, same
for success.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-05-24 10:50:18 +01:00
Lionel Landwerlin 6200d835a0 aubinator: fix double free
1;4601;0c
Free previously allocated filename outside the for loop.

CID: 1405014
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-05-24 10:50:18 +01:00
Jason Ekstrand 39adea9330 anv: Require vertex buffers to come from a 32-bit heap
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
2017-05-23 17:37:43 -07:00
Jason Ekstrand 50d0eb5096 anv: Advertise both 32-bit and 48-bit heaps when we have enough memory
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
2017-05-23 17:37:42 -07:00
Jason Ekstrand 34581fdd4f anv: Refactor memory type setup
This makes us walk over the heaps one at a time and add the types for
LLC and !LLC to each heap.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
2017-05-23 16:46:42 -07:00
Jason Ekstrand b83b1af6f6 anv: Make supports_48bit_addresses a heap property
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
2017-05-23 16:46:40 -07:00
Jason Ekstrand 00df1cd9d6 anv: Stop setting BO flags in bo_init_new
The idea behind doing this was to make it easier to set various flags.
However, we have enough custom flag settings floating around the driver
that this is more of a nuisance than a help.  This commit has the
following functional changes:

 1) The workaround_bo created in anv_CreateDevice loses both flags.
    This shouldn't matter because it's very small and entirely internal
    to the driver.

 2) The bo created in anv_CreateDmaBufImageINTEL loses the
    EXEC_OBJECT_ASYNC flag.  In retrospect, it never should have gotten
    EXEC_OBJECT_ASYNC in the first place.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
2017-05-23 16:46:38 -07:00
Jason Ekstrand 10fad58b31 anv: Set image memory types based on the type count
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
2017-05-23 16:46:36 -07:00
Jason Ekstrand f7736ccf53 anv: Add valid_bufer_usage to the memory type metadata
Instead of returning valid types as just a number, we now walk the list
and check the buffer's usage against the usage flags we store in the new
anv_memory_type structure.  Currently, valid_buffer_usage == ~0.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
2017-05-23 16:46:34 -07:00
Jason Ekstrand 92325a7efc anv: Determine the type of mapping based on type metadata
Before, we were just comparing the type index to 0.  Now we actually
look the type up in the table and check its properties to determine what
kind of mapping we want to do.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
2017-05-23 16:46:32 -07:00
Jason Ekstrand c1f4343807 anv: Set up memory types and heaps during physical device init
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
2017-05-23 16:46:30 -07:00
Jason Ekstrand eceaf7e234 anv: Predicate 48bit support on gen >= 8
This doesn't matter right now since it only affects whether or not we
set the kernel bit but, if we ever do anything else based on it, we'll
want it to be correct per-gen.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
2017-05-23 16:46:27 -07:00
Jason Ekstrand 4eecd534f0 anv/image: Get rid of the memset(aux, 0, sizeof(aux)) hack
Up until now, we've been memsetting the auxiliary surface to 0 at
BindImageMemory time to ensure that it is properly initialized.
However, this isn't correct because apps are allowed to freely alias
memory between different images and buffers so long as they properly
track whether or not a particular image is valid and, if it isn't,
transition from UNINITIALIZED to something else before using it.  We
now implement those transitions so we can drop the hack.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
2017-05-23 16:46:22 -07:00
Jason Ekstrand cc45c4bb80 anv: Handle transitioning depth from UNDEFINED to other layouts
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
2017-05-23 16:46:20 -07:00
Jason Ekstrand 75edecf502 anv: Handle color layout transitions from the UNINITIALIZED layout
This causes dEQP-VK.api.copy_and_blit.resolve_image.partial.* to start
failing due to test bugs.  See CL 1031 for a test fix.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
2017-05-23 16:46:03 -07:00
Nanley Chery 52a6fd9871 intel/isl: Add ASTC HDR to format lists and helpers
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2017-05-22 11:13:53 -07:00
Tapani Pälli f0051fcf2b android: add -Wl,--build-id=sha1 to LDFLAGS for libvulkan_intel
Just like is done on desktop and what is expected by the build-id code.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-05-20 08:59:57 +03:00
Emil Velikov acf3d2afab configure: check once for DRI3 dependencies
Currently we are having the XCB_DRI3 dependencies duplicated,
partially.

Just do a once-off check and add all of the respective CFLAGS/LIBS
where needed.

As a nice side effect this helps us solve a couple of FIXMEs.

DRI3 is not a thing w/o X11 so disable it in such cases.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2017-05-19 19:44:15 +01:00
Nanley Chery 56458cb168 anv/formats: Update the three-channel BC1 mappings
The procedure for decompressing an opaque BC1 Vulkan format is dependant on the
comparison of two colors stored in the first 32 bits of the compressed block.
Here's the specified OpenGL (and Vulkan) behavior for reference:

   The RGB color for a texel at location (x,y) in the block is given by:

      RGB0,              if color0 > color1 and code(x,y) == 0
      RGB1,              if color0 > color1 and code(x,y) == 1
      (2*RGB0+RGB1)/3,   if color0 > color1 and code(x,y) == 2
      (RGB0+2*RGB1)/3,   if color0 > color1 and code(x,y) == 3

      RGB0,              if color0 <= color1 and code(x,y) == 0
      RGB1,              if color0 <= color1 and code(x,y) == 1
      (RGB0+RGB1)/2,     if color0 <= color1 and code(x,y) == 2
      BLACK,             if color0 <= color1 and code(x,y) == 3

The sampling operation performed on an opaque DXT1 Intel format essentially
hard-codes the comparison result of the two colors as color0 > color1. This
means that the behavior is incompatible with OpenGL and Vulkan. This is stated
in the SKL PRM, Vol 5: Memory Views:

   Opaque Textures (DXT1_RGB)
      Texture format DXT1_RGB is identical to DXT1, with the exception that the
      One-bit Alpha encoding is removed. Color 0 and Color 1 are not compared, and
      the resulting texel color is derived strictly from the Opaque Color Encoding.
      The alpha channel defaults to 1.0.

      Programming Note
      Context: Opaque Textures (DXT1_RGB)
      The behavior of this format is not compliant with the OGL spec.

The opaque and non-opaque BC1 Vulkan formats are specified to be decoded in
exactly the same way except the BLACK value must have a transparent alpha
channel in the latter. Use the four-channel BC1 Intel formats with the alpha
set to 1 to provide the behavior required by the spec.

v2 (Kenneth Graunke):
- Provide a more detailed commit message.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100925
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2017-05-18 16:46:15 -07:00
Jason Ekstrand c499faebd7 anv: Add an option to abort on device loss
This is mostly for running in our CI system to prevent dEQP from
continuing on to the next test if we get a GPU hang.  As it currently
stands, dEQP uses the same VkDevice for almost all tests and if one of
the tests hangs, we set the anv_device::device_lost flag and report
VK_ERROR_DEVICE_LOST for all queue operations from that point forward
without sending anything to the GPU.  dEQP will happily continue trying
to run tests and reporting failures until it eventually gets crash that
forces the test runner to start over.  This circumvents the problem by
just aborting the process if we ever get a GPU hang.  Since this is not
the recommended behavior most of the time, we hide it behind an
environment variable.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-05-18 16:32:11 -07:00
Jason Ekstrand 53f997de77 anv: Wrap the device lost error in vk_error in QueueSubmit
We weren't wrapping this before because anv_cmd_buffer_execbuf may throw
a more meaningful error message.  However, we do change the error code
into VK_ERROR_DEVICE_LOST, so we should print a new message.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-05-18 16:32:11 -07:00
Iago Toral Quiroga 2322ddf548 anv: fix multiview for clear commands
According to the VK_KHX_multiview spec:

"Multiview causes all drawing and clear commands in the subpass to
behave as if they were broadcast to each view, where each view is
represented by one layer of the framebuffer attachments."

This adds support for multiview clears, which were missing in the
initial implementation.

v2 (Jason):
  - split multiview from regular case
  - Use for_each_bit() macro

Fixes new CTS multiview tests:
dEQP-VK.multiview.clear_attachments.*

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-05-18 11:53:25 +02:00
Samuel Iglesias Gonsálvez e69e5c7006 i965/vec4: load dvec3/4 uniforms first in the push constant buffer
Reorder the uniforms to load first the dvec4-aligned variables in the
push constant buffer and then push the vec4-aligned ones. It takes
into account that the relocated uniforms should be aligned to their
channel size.

This fixes a bug were the dvec3/4 might be loaded one part on a GRF and
the rest in next GRF, so the region parameters to read that could break
the HW rules.

v2:
- Fix broken logic.
- Add a comment to explain what should be needed to optimise the usage
  of the push constant buffer slots, as this patch does not pack the
  uniforms.

v3:
- Implemented the push constant buffer usage optimization.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Acked-by: Francisco Jerez <currojerez@riseup.net>
2017-05-18 06:49:54 +02:00
Samuel Iglesias Gonsálvez 8aa6ada838 i965/vec4: fix swizzle and writemask when loading an uniform with constant offset
It was setting XYWZ swizzle and writemask to all uniforms, no matter if they
were a vector or scalar, so this can lead to problems when loading them
to the push constant buffer.

Moreover, 'shift' calculation was designed to calculate the offset in
DWORDS, but it doesn't take into account DFs, so the calculated swizzle
for the later ones was wrong.

The indirect case is not changed because MOV INDIRECT will write
to all components. Added an assert to verify that these uniforms
are aligned.

v2:
- Fix 'shift' calculation (Curro)
- Set both swizzle and writemask.
- Add assert(shift == 0) for the indirect case.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-05-18 06:49:54 +02:00
Samuel Iglesias Gonsálvez 354f7f2cb9 i965/vec4/gs: restore the uniform values which was overwritten by failed vec4_gs_visitor execution
We are going to add a packing feature to reduce the usage of the push
constant buffer. One of the consequences is that 'nr_params' would be
modified by vec4_visitor's run call, so we need to restore it if one of
them failed before executing the fallback ones. Same thing happens to the
uniforms values that would be reordered afterwards.

Fixes GL45-CTS.arrays_of_arrays_gl.InteractionFunctionCalls2 when
the dvec4 alignment and packing patch is applied.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Acked-by: Francisco Jerez <currojerez@riseup.net>
2017-05-18 06:49:28 +02:00
Chih-Wei Huang bfc0c23843 Android: correct libz dependency
Commit 6facb0c0 ("android: fix libz dynamic library dependencies")
unconditionally adds libz as a dependency to all shared libraries.
That is unnecessary.

Commit 85a9b1b5 introduced libz as a dependency to libmesa_util.
So only the shared libraries that use libmesa_util need libz.

Fix Android Lollipop build by adding the include path of zlib to
libmesa_util explicitly instead of getting the path implicitly
from zlib since it doesn't export the include path in Lollipop.

Fixes: 6facb0c0 "android: fix libz dynamic library dependencies"

Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Rob Herring <robh@kernel.org>
2017-05-17 14:04:18 +01:00
Jason Ekstrand e0d6f9afba intel/isl/gen6: Fix combined depth stencil alignment
All combined depth stencil buffers (even those with just stencil)
require a 4x4 alignment on Sandy Bridge.  The only depth/stencil buffer
type that requires 4x2 is separate stencil.

Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-05-16 17:04:26 -07:00
Jason Ekstrand 74d626f383 intel/isl: Refactor gen8_choose_image_alignment_el
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-05-16 17:04:26 -07:00
Jason Ekstrand 2486c7dd54 intel/isl: Refactor gen6_choose_image_alignment_el
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-05-16 17:04:26 -07:00
Jason Ekstrand 715f47cb34 intel/isl: Refactor gen7_choose_image_alignment_el
The Ivy Bridge PRM provides a nice table that handles most of the
alignment cases in one place.  For standard color buffers we have a
little freedom of choice but for most depth, stencil and compressed it's
hard-coded.  Chad's original functions split halign and valign apart and
implemented them almost entirely based on restrictions and not the
table.  This makes things way more confusing than they need to be.  This
commit gets rid of the split and makes us implement the exact table
up-front.  If our surface isn't one of the ones in the table then we
have to make real choices.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-05-16 17:04:26 -07:00
Pohjolainen, Topi 236f17a9f7 intel/isl/gen7: Use stencil vertical alignment of 8 instead of 4
The reasoning Chad gave in the comment for choosing a valign of 4 is
entirely bunk.  The fact that you have to multiply pitch by 2 is
completely unrelated to the halign/valign parameters used for texture
layout.  (Not completely unrelated.  W-tiling is just Y-tiling with a
bit of extra swizzling which turns 8x8 W-tiled chunks into 16x4 y-tiled
chunks so it makes everything easier if miplevels are always aligned to
8x8.)  The fact that RENDER_SURFACE_STATE::SurfaceVerticalAlignmet
doesn't have a VALIGN_8 option doesn't matter since this is gen7 and you
can't do stencil texturing anyway.

v2 (Jason Ekstrand):
 - Delete most of Chad's comment and add a more descriptive commit
   message.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "17.0 17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-05-16 17:04:26 -07:00
Matt Turner 169e1e26ee i965: Fix test_eu_validate.cpp
Broken by commit a7217e909c ("i965: Pass pointer and end of assembly
to brw_validate_instructions").

Reported-by: Aaron Watry <awatry@gmail.com>
2017-05-16 11:45:07 -07:00
Jason Ekstrand b5437fc05c anv: Implement VK_KHR_get_surface_capabilities2
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-05-16 08:38:46 -07:00
Matt Turner b1af896853 intel/aubinator_error_decode: Disassemble shader programs
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-15 12:04:04 -07:00
Matt Turner 23685f07d1 intel/aubinator_error_decode: Stop decoding after MI_BATCH_BUFFER_END
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-15 11:43:20 -07:00
Matt Turner 8e7221fa5a intel/tools: Refactor gen_disasm_disassemble() to use annotations
Which will allow us to print validation errors found in shader assembly
in GPU hang error states.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-15 11:43:14 -07:00
Matt Turner aaa0329b5f intel/decoder: Fix indentation
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-05-15 11:43:13 -07:00
Matt Turner 3443bd45a3 genxml: Remove brackets from kernel start pointer names
Newer Gens' names don't have the brackets. Having common names will make
some later patches simpler.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-05-15 11:43:11 -07:00
Matt Turner aae2626be8 i965: Add a weak no-op nir_print_instr() symbol
intel_asm_annotation.c is part of libintel_compiler.la, which contains
code for disassembling and validating shaders that we want to call in
aubinator_error_decode.

dump_assembly() calls nir_print_instr() to print annotations, and
although dump_assembly() is not called by aubinator_error_decode (nor is
any function in intel_asm_annotation.c) it causes undefined references
to nir_print_instr().

To work around, provide a no-op weak symbol to resolve against.
2017-05-15 11:43:01 -07:00
Matt Turner d98e82c772 i965: Allow brw_eu_validate to handle compact instructions
This will allow the validator to run on shader programs we find in the
GPU hang error state.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-05-15 11:42:56 -07:00
Matt Turner a7217e909c i965: Pass pointer and end of assembly to brw_validate_instructions
This will allow us to more easily run brw_validate_instructions() on
shader programs we find in GPU hang error states.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-05-15 11:42:47 -07:00
Lionel Landwerlin 55be6653e0 intel: gen-decoder: fix xml parser leak
In the unlikely case the parsing of genxml files fails, we were
leaking an xml parser object.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2017-05-15 14:06:11 +01:00
Rafael Antognolli d9b4a81672 genxml: Add alias for MOCS.
Use an alias for this field on 3DSTATE_INDEX_BUFFER on gen6+, so we can set
the same value as the defines.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-05-11 21:27:38 -07:00
Kenneth Graunke f790d6e0b4 i965: Port Gen4-5 VS_STATE to genxml.
It's actually not that much code.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-05-11 16:52:59 -07:00
Kenneth Graunke d65e19f5c6 genxml: Fix KSPs on Ironlake to be offsets, not pointers.
We use Instruction State Base Address on Ironlake, so we want KSP to be
an offset not an actual pointer.  Gen4/G45 use pointers.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-05-11 16:33:48 -07:00
Emil Velikov 4c22b99953 anv: document that anv_gem_mmap returns MAP_FAILED on error
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-05-11 13:58:20 +01:00
Kenneth Graunke 620f12a53f i965: Drop INTEL_DEBUG=stats.
For whatever reason, we had an INTEL_DEBUG=stats option that enabled
various statistics counters on Gen4-5 systems.  It's been around
forever, though I can't think of a single time that it's been useful.

On Gen6+, we enable statistics all the time because they're necessary
to support various query object targets.  Turning them off would break
those queries.

Gen4-5 don't support those queries, so the statistics counters generally
aren't useful; we disabled them by default.  This patch disables them
altogether.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-05-10 11:37:19 -07:00
Grazvydas Ignotas 0ef302638f anv: don't leak DRM devices
After successful drmGetDevices2() call, drmFreeDevices() needs to be
called.

Fixes: b1fb6e8d "anv: do not open random render node(s)"
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> # radv version
2017-05-10 01:13:44 +03:00
Grazvydas Ignotas e0aee8b667 anv: fix possible stack corruption
drmGetDevices2 takes count and not size. Probably hasn't caused problems
yet in practice and was missed as setups with more than 8 DRM devices
are not very common.

Fixes: b1fb6e8d "anv: do not open random render node(s)"
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-05-10 01:13:44 +03:00
Jason Ekstrand 037ce253b1 i965/vec4: Delete the system value infastructure
The only thing still using it is INVOCATION_ID for geometry shaders.
That's easily enough inlined into the nir_intrinsic_load_invocation_id
handling code.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-09 15:08:07 -07:00
Jason Ekstrand 2e9916ea04 i965/vec4: Use NIR to do GS input remapping
We're already doing this in the FS back-end.  This just does the same
thing in the vec4 back-end.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-09 15:08:07 -07:00
Jason Ekstrand e31042ab40 i965/fs: Move remapping of gl_PointSize to the NIR level
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-09 15:08:06 -07:00
Jason Ekstrand 5b00c3cc05 i965/nir: Inline remap_inputs_with_vue_map
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-09 15:08:06 -07:00
Jason Ekstrand 0d5f89cdc3 i965/vec4: Use NIR remapping for VS attributes
The NIR pass already handles remapping system values to attributes for
us so we delete the system value code as part of the conversion.

We also change nir_lower_vs_inputs to take an explicit inputs_read
bitmask and pass in the inputs_read from prog_data instead from pulling
it out of NIR.  This is because the version in prog_data may get
EDGEFLAG added to it on some old platforms.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-09 15:08:06 -07:00
Jason Ekstrand 80aa6e9d32 intel/compiler/vs: Move inputs_read handling to generic code
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-09 15:08:03 -07:00
Jason Ekstrand d2fe804d18 i965/vec4: Set VERT_BIT_EDGEFLAG based on the VUE map
We also add a nice little comment to make it more clear exactly what
happens with the edge flag copy.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-09 15:07:47 -07:00
Jason Ekstrand ca4d192802 i965/fs: Lower gl_VertexID and friends to inputs at the NIR level
NIR calls these system values but they come in from the VF unit as
vertex data.  It's terribly convenient to just be able to treat them as
such in the back-end.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-09 15:07:47 -07:00
Jason Ekstrand 24e6fba500 i965/vs: Set uses_vertexid and friends from brw_compile_vs
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-09 15:07:47 -07:00
Jason Ekstrand 5e832302dc i965: Move multiply by 4 for VS ATTR setup into the scalar backend.
The vec4 backend will want to count in units of vec4s, not scalar
components.  The simplest solution is to move the multiplication by 4
into the scalar backend.  This also improves consistency with how we
count varyings.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-09 15:07:47 -07:00
Jason Ekstrand 36764b6923 i965/nir: Inline remap_vs_attrs
Now that we have nice block iterators, there's no good reason for this
to be off on it's own.  While we're here, we convert to using the NIR
const index getters/setters instead of whacking const_index values
directly.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-09 15:07:47 -07:00
Jason Ekstrand b86dba8a0e nir: Embed the shader_info in the nir_shader again
Commit e1af20f18a changed the shader_info
from being embedded into being just a pointer.  The idea was that
sharing the shader_info between NIR and GLSL would be easier if it were
a pointer pointing to the same shader_info struct.  This, however, has
caused a few problems:

 1) There are many things which generate NIR without GLSL.  This means
    we have to support both NIR shaders which come from GLSL and ones
    that don't and need to have an info elsewhere.

 2) The solution to (1) raises all sorts of ownership issues which have
    to be resolved with ralloc_parent checks.

 3) Ever since 00620782c9, we've been
    using nir_gather_info to fill out the final shader_info.  Thanks to
    cloning and the above ownership issues, the nir_shader::info may not
    point back to the gl_shader anymore and so we have to do a copy of
    the shader_info from NIR back to GLSL anyway.

All of these issues go away if we just embed the shader_info in the
nir_shader.  There's a little downside of having to copy it back after
calling nir_gather_info but, as explained above, we have to do that
anyway.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-09 15:07:47 -07:00
Lionel Landwerlin 32f14332f5 intel: compiler: prevent integer overflow
CID: 1399477, 1399478 (Integer handling issues)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-05-09 13:56:17 +01:00
Lionel Landwerlin 85182e490c intel: compiler: remove duplicated code
CID: 1399470: (Control flow issues)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-05-09 13:56:17 +01:00
Lionel Landwerlin 4201b7d1bf intel: gen decoder: don't check for size_t negative values
We should get either 0 or 1 here.

CID: 1373562 (Control flow issues)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
2017-05-09 13:54:08 +01:00
Lionel Landwerlin e3a5ab2d66 anv: check return value of anv_execbuf_add_bo
CID: 1405919 (Error handling issues)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-05-08 14:38:27 +01:00
Lionel Landwerlin 6247b8b413 anv: avoid null pointer dereference
The application might not give an output structure.

CID: 1405765 (Null pointer dereferences)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-05-08 14:38:27 +01:00
Jason Ekstrand e05e3e07ab anv/allocator: Only write to _vg_ptr if we have valgrind
This fixes the build when not building against valgrind headers.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100945
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-05-05 12:49:51 -07:00
Iago Toral Quiroga 7761cf6d01 anv/query: handle more cases of 'out of host memory'
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-05-05 08:53:33 +02:00
Jason Ekstrand 98cd512089 anv/allocator: Improve block pool growing asserts
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-04 19:07:54 -07:00
Jason Ekstrand 24827fdf50 anv: Drop the instruction pool block size
Now that we can allocate states larger than the block size, we no longer
need a block size of 1MB which can be rather wasteful.

Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-04 19:07:54 -07:00
Jason Ekstrand 955127db93 anv/allocator: Add support for large stream allocations
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-04 19:07:54 -07:00
Jason Ekstrand f82d3d38b6 anv/allocator: Allow state pools to allocate large states
Previously, the maximum size of a state that could be allocated from a
state pool was a block.  However, this has caused us various issues
particularly with shaders which are potentially very large.  We've also
hit issues with render passes with a large number of attachments when we
go to allocate the block of surface state.  This effectively removes the
restriction on the maximum size of a single state.  (There's still a
limit of 1MB imposed by a fixed-length bucket array.)

For states larger than the block size, we just grab a large block off of
the block pool rather than sub-allocating.  When we go to allocate some
chunk of state and the current bucket does not have state, we try to
pull a chunk from some larger bucket and split it up.  This should
improve memory usage if a client occasionally allocates a large block of
state.

This commit is inspired by some similar work done by Juan A. Suarez
Romero <jasuarez@igalia.com>.

Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-04 19:07:54 -07:00
Jason Ekstrand 8c079b566e anv/allocator: Support pushing multiple blocks onto a free list at once
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-04 19:07:54 -07:00
Jason Ekstrand 8769fb48fb anv/allocator: Add helpers for dealing with bucket sizes
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-04 19:07:54 -07:00
Jason Ekstrand 12043ca696 anv/allocator: Add the capability to allocate blocks of different sizes
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-04 19:07:54 -07:00
Jason Ekstrand 01170df262 anv/allocator: Rework a comment
This commit just fixes up the English a bit and re-flows the comment.

Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-04 19:07:54 -07:00
Jason Ekstrand bcc5d0defb anv/allocator: Tweak the block pool growing algorithm
The old algorithm worked fine assuming a constant block size.  We're
about to break that assumption so we need an algorithm that's a bit more
robust against suddenly growing by a huge amount compared to the
currently allocated quantity of memory.

Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-04 19:07:54 -07:00
Jason Ekstrand d3ed72e2c2 anv/allocator: Embed the block_pool in the state_pool
Now that the state stream is allocating off of the state pool, there's
no reason why we need the block pool to be separate.

Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-04 19:07:54 -07:00
Jason Ekstrand bb2a3f0df8 anv/allocator: Get rid of the ability to free blocks
Now that everything is going through the state pools, the block pool no
longer needs to be able to handle re-use.

Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-04 19:07:54 -07:00
Jason Ekstrand 08413a81b9 anv: Allocate binding table blocks through the state pool
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-04 19:07:54 -07:00
Jason Ekstrand 55f49e6b7e anv/allocator: Add support for "back" allocations to state_pool
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-04 19:07:54 -07:00
Jason Ekstrand 49ecaf88d1 anv/allocator: Drop the block_size field from block_pool
Since the state_stream is now pulling from a state_pool, the only thing
pulling directly off the block pool is the state pool so we can just
move the block_size there.  The one exception is when we allocate
binding tables but we can just reference the state pool there as well.

The only functional change here is that we no longer grow the block pool
immediately upon creation so no BO gets allocated until our first state
allocation.

Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-04 19:07:54 -07:00
Jason Ekstrand 30d63ffe26 anv/allocator: Pull the userptr part of block_pool_grow into a helper
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-04 19:07:54 -07:00
Jason Ekstrand c73ce41a48 anv/allocator: Roll fixed_size_state_pool into state_pool
The helper functions aren't really gaining us as much as they claim and
are actually about to be in the way.

Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-04 19:07:54 -07:00
Jason Ekstrand 6d02ef011e anv/allocator: Remove the state_size field from fixed_size_state_pool
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-04 19:07:54 -07:00
Jason Ekstrand 367031a5c8 anv: Get rid of a bunch of uses of size_t
We should only use size_t when referring to sizes of bits of CPU memory.
Anything on the GPU or just a regular array length should be a type that
has the same size on both 32 and 64-bit architectures.  For state
objects, we use a uint32_t because we'll never allocate a piece of
driver-internal GPU state larger than 2GB (more like 16KB).

Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-04 19:07:54 -07:00
Jason Ekstrand e86aeecb6a anv/allocator: Convert the state stream to pull from a state pool
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-04 19:07:54 -07:00
Jason Ekstrand e049dea5b2 anv/allocator: Return a null state for zero-size allocations
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-04 19:07:54 -07:00
Jason Ekstrand 45e1829274 anv/allocator: Add no-valgrind versions of state_pool_alloc/free
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-05-04 19:07:54 -07:00
Kenneth Graunke 9377801fbd anv: Simplify Cherryview line handling.
We can just use the new CHVLineWidth field rather than an entirely
different generation's packing function.

v2: Inline the function (requested by Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-05-04 16:17:34 -07:00
Kenneth Graunke 31f094e691 i965: Fix line width on Cherryview.
We just add another field to gen8.xml for the Cherryview line width,
rather than trying to replicate the gymnastics done in the Vulkan
driver to use gen9 SF pack functions.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-05-04 16:17:34 -07:00
Emil Velikov 9d2aa6e506 anv: fix anv_gem_mmap comment to not mention NULL
The function cannot return NULL, update the comment accordingly.

Fixes: b546c9d ("anv: anv_gem_mmap() returns MAP_FAILED as mapping error")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-05-04 18:06:18 +01:00
Samuel Iglesias Gonsálvez 939b015736 anv: vkBindImageMemory() should return VK_ERROR_OUT_OF_{HOST,DEVICE}_MEMORY on failure
According to the spec we get VK_ERROR_OUT_OF_HOST_MEMORY or
VK_ERROR_OUT_OF_DEVICE_MEMORY on vkBindImageMemory failure.

Fixes returned value changed by b546c9d.

Fixes: b546c9d ("anv: anv_gem_mmap() returns MAP_FAILED as mapping error")
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.0 17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-05-04 15:13:08 +02:00
Samuel Iglesias Gonsálvez b546c9d318 anv: anv_gem_mmap() returns MAP_FAILED as mapping error
Take it into account when checking if the mapping failed.

v2:
- Remove map == NULL and its related comment (Emil)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

Fixes: 6f3e3c715a ("vk/allocator: Add a BO pool")
Fixes: 9919a2d34d ("anv/image: Memset hiz surfaces to 0 when binding memory")
Cc: "17.0 17.1" <mesa-stable@lists.freedesktop.org>
2017-05-04 08:56:36 +02:00
Rafael Antognolli f321f695d3 genxml: Fix 3DSTATE_DEPTH_BUFFER length on gen5.
The hardware docs are wrong, but the length used in the xml is also
wrong.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 18:57:51 -07:00
Rafael Antognolli 2e5d65ccb6 anv: Use BRW_BARYCENTRIC_NONPERSPECTIVE_BITS from common header.
In a previous patch some enums were split out from brw_eu_defines.h, so
they could be used by genxml based code. anv can also benefit from this.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 16:58:55 -07:00
Rafael Antognolli 8fa8abef4b i965: Move enums to brw_compiler.h.
These enums live inside struct brw_wm_prog_data, so it makes sense to
keep them in the same header. It also allows to use them without
including brw_eu_defines.h.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 16:55:58 -07:00
Rafael Antognolli a66743ce8d genxml: Update 3DSTATE_LINE_STIPPLE xml on gen6.
From the PRM, Line Stipple Inverse Repeat Count is on dw2, bits 31:16,
format U1.13.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 16:41:14 -07:00
Rafael Antognolli 7d5cc5b954 genxml: Normalize xml for 3DSTATE_CC_STATE_POINTERS.
- "COLOR_CALC_STATE Change" -> "Color Calc State Pointer Valid"
   - "Pointer to COLOR_CALC_STATE" -> "Color Calc State Pointer"
   - "BackFace" -> "Backface"

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 16:41:07 -07:00
Rafael Antognolli b89805a7bc genxml: Normalize xml for 3DSTATE_MULTISAMPLE.
Name the options to "Pixel Location":
   - PIXLOC_CENTER -> CENTER
   - PIXLOC_UL_CORNER -> UL_CORNER

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 16:41:07 -07:00
Rafael Antognolli c032cae9ff genxml: Rename "Function Enable" to "Enable".
Rename that field name on genxml for:
   - 3DSTATE_GS - gen6+
   - 3DSTATE_DS - gen7+
   - 3DSTATE_HS - gen7+

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 16:41:07 -07:00
Rafael Antognolli 5b4223dc8e genxml: Clip guardbands are float, not int.
This makes genxml create the right struct types, and generate the right
batch commands.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 16:41:07 -07:00
Rafael Antognolli 4266c372d9 genxml: 3DSTATE_VS rename Function Enable to Enable.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 16:41:07 -07:00
Kenneth Graunke da299b7df3 genxml: Make "Reorder Mode" fields consistent.
Both GS and SOL have these fields.  Some were ReorderEnable = true,
some were ReorderMode = REORDER_TRAILING, and some were just TRAILING.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-05-03 16:41:07 -07:00
Rafael Antognolli 872ffb2221 genxml: Add alias for MOCS.
Use an alias, so we can set the same value as the #define's.

v3:
   - Call it "SO Buffer MOCS" to follow the most common naming scheme.
   - Add alias for gen7 and gen75 too (Ken).

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 16:41:02 -07:00
Rafael Antognolli b5e652fc83 genxml: Add missing field values to 3DSTATE_SBE.
Fill out "Attribute Active Component Format" possible values.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 16:41:02 -07:00
Rafael Antognolli 273a10b3f1 genxml: Update xml for 3DSTATE_SF.
- Normalize "Anti-Aliasing Enable"
 - Add "Multisample Rasterization Mode" constants
 - Rename "Use Point Width on Vertex" to "Vertex"
 - Rename "Use Point Width from State" to "State"

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 16:41:02 -07:00
Rafael Antognolli 3f155ab290 genxml: Rename clip enable property.
There are two variants:
   - Clip Enable
   - CLIP Enable (on gen6)

Rename everything to Clip Enable.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 16:41:02 -07:00
Louis-Francis Ratté-Boulianne e0aa2bd9cb genxml: Fill out Gen4, Gen45 and Gen5 XML
Add some more details to Gen4 and Gen45 and add what is needed
in Gen5 XML. This commit overwrite the previous work done on Gen4
and Gen45 as it contains more instructions and fixes some mistakes.
However, comments (dword boundaries) are lost in the process.

v3:
   - Set the type of some fields, instead of prefix. Also fix the
     SAMPLER_BORDER_COLOR_STATE fields of gen5.xml.

Signed-off-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-03 16:40:52 -07:00
Jason Ekstrand 4201cc2dd3 anv: Implement VK_KHX_external_semaphore_fd
This implementation allocates a 4k BO for each semaphore that can be
exported using OPAQUE_FD and uses the kernel's already-existing
synchronization mechanism on BOs.

Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-05-03 15:09:46 -07:00
Jason Ekstrand ef2e427d78 anv: Pull the guts of cmd_buffer_execbuf into a helper
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-05-03 15:09:46 -07:00
Jason Ekstrand 975c0f339f anv: Implement VK_KHX_external_semaphore
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-05-03 15:09:46 -07:00
Jason Ekstrand 298e054d0c anv: Implement VK_KHX_external_semaphore_capabilities
This just stubs things out.  Real external semaphore support will come
with VK_KHX_external_semaphore_fd.

Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-05-03 15:09:46 -07:00
Jason Ekstrand 65aa89e75f anv: Add a real semaphore struct
It's just a dummy for now, but we'll flesh it out as needed for external
semaphores.

Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-05-03 15:09:46 -07:00
Jason Ekstrand f8d7c23e1f anv: Trivially implement multiDrawIndirect
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-05-03 11:25:46 -07:00
Jason Ekstrand 272b7e7d25 anv: Enable VK_KHX_multiview and SPV_KHR_multiview
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-05-03 11:25:46 -07:00
Jason Ekstrand 3dbd7737d4 anv/cmd_buffer: Emit instanced draws for multiple views
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-05-03 11:25:46 -07:00
Jason Ekstrand 32abb0e13c anv/cmd_buffer: Pull indirect draw parameter loading into a helper
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-05-03 11:25:46 -07:00
Jason Ekstrand 0db7070330 anv/pipeline: Add shader lowering for multiview
v2 (Jason Ekstrand):
 - Take a view_mask rather than a whole subpass
 - Build the view mask into the VS shader key

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-05-03 11:25:46 -07:00
Jason Ekstrand ca5bdfdfc6 anv/pipeline: Add a subpass field to anv_pipeline
This simplifies the code a variety of places.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-05-03 11:25:46 -07:00
Jason Ekstrand c4549e05aa anv/pipeline: Call nir_gather_info later
We want to insert more lowering code that may insert system values and
we need to gather info after that lowering.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-05-03 11:25:46 -07:00
Jason Ekstrand dcb6a68bb4 anv: Move shader hashing to anv_pipeline
Shader hashing is very closely related to shader compilation.  Putting
them right next to each other in anv_pipeline makes it easier to verify
that we're actually hashing everything we need to be hashing.  The only
real change (other than the order of hashing) is that we now hash in the
shader stage.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-05-03 11:25:46 -07:00
Jason Ekstrand d6b8106eea anv/pass: Store the per-subpass view mask
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-05-03 11:25:46 -07:00
Jason Ekstrand e997f548de anv: Add the KHX_multiview boilerplate
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-05-03 11:25:46 -07:00
Jason Ekstrand 0bed97006f anv/nir: Delete the apply_dynamic_offsets prototype
That pass hasn't existed since dd4db84640
but the prototype stuck around for no reason.

Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-05-03 11:25:46 -07:00
Samuel Iglesias Gonsálvez f57e234fdd i965/vec4: don't modify regioning parameters to the sources of DF align1 instructions
The regioning parameters are now properly set by convert_to_hw_regs()
and we don't need to fix them in the generator. That latter fix
previously done in the generator was strictly speaking wrong for any
non-identity regions.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-05-03 15:32:39 +02:00
Samuel Iglesias Gonsálvez aaeb1c99be i965/vec4: fix register width for DF VGRF and UNIFORM
On gen7, the swizzles used in DF align16 instructions works for element
size of 32 bits, so we can address only 2 consecutive DFs. As we assumed that
in the rest of the code and prepare the instructions for this (scalarize_df()),
we need to set it to two again.

However, for DF align1 instructions, a width of 2 is wrong as we are not
reading the data we want. For example, an uniform would have a region of
<0, 2, 1> so it would repeat the first 2 DFs, when we wanted to access
to the first 4.

This patch sets the default one to 4 and then modifies the width of
align16 instruction's DF sources when we translate the logical swizzle
to the physical one.

v2:
- Remove conditional (Curro).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-05-03 15:32:39 +02:00
Samuel Iglesias Gonsálvez 7f728bce81 i965/vec4: fix vertical stride to avoid breaking region parameter rule
From IVB PRM, vol4, part3, "General Restrictions on Regioning
Parameters":

  "If ExecSize = Width and HorzStride ≠ 0, VertStride must
   be set to Width * HorzStride."

In next patch, we are going to modify the region parameter for
uniforms and vgrf. For uniforms that are the source of
DF align1 instructions, they will have <0, 4, 1> regioning and
the execsize for those instructions will be 4, so they will break
the regioning rule. This will be the same for VGRF sources where
we use the vstride == 0 exploit.

As we know we are not going to cross the GRF boundary with that
execsize and parameters (not even with the exploit), we just fix
the vstride here.

v2:
- Move is_align1_df() (Curro)
- Refactor exec_size == width calculation (Curro)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-05-03 15:32:39 +02:00
Jason Ekstrand 6ef1bd4fa5 anv/tests: Create a dummy instance as well as device
This fixes crashes caused by 35e626bd0e
which made us start referencing the instance in the allocators.  With
this commit, the tests now happily pass again.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100877
Tested-by: Vinson Lee <vlee@freedesktop.org>
2017-05-01 17:06:40 -07:00
Chad Versace 85ca563b58 anv: Drop 'x11' prefix from non-X11 WSI funcs
Drop it from x11_anv_wsi_image_create and x11_anv_wsi_image_free. The
functions are used by Wayland WSI too.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2017-04-28 08:54:45 -07:00
Jason Ekstrand ebd1bd6998 anv: Alphabetize KHR extensions
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2017-04-28 07:41:03 -07:00
Jason Ekstrand 032861693e anv: Move queues, events, and semaphores to their own file
Things are about to get more complicated, especially as far as
semaphores are concerned.

Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-04-27 20:08:46 -07:00
Jason Ekstrand 9bd1f03487 anv: Implement VK_KHX_external_memory_fd
This commit just exposes the memory handle type.  There's interesting we
need to do here for images.  So long as the user doesn't set any crazy
environment variables such as INTEL_DEBUG=nohiz, all of the compression
formats etc. should "just work" at least for opaque handle types.

v2 (chadv):
  - Rebase.
  - Fix vkGetPhysicalDeviceImageFormatProperties2KHR when
    handleType == 0.
  - Move handleType-independency comments out of handleType-switch, in
    vkGetPhysicalDeviceExternalBufferPropertiesKHX.  Reduces diff in
    future dma_buf patches.

Co-authored-with: Chad Versace <chadversary@chromium.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-04-27 20:08:46 -07:00
Jason Ekstrand 818b857914 anv: Use the BO cache for DeviceMemory allocations
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-04-27 20:08:46 -07:00
Jason Ekstrand 494d6f65a7 anv/allocator: Add a BO cache
This cache allows us to easily ensure that we have a unique anv_bo for
each gem handle.  We'll need this in order to support multiple-import of
memory objects and semaphores.

v2 (Jason Ekstrand):
 - Reject BO imports if the size doesn't match the prime fd size as
   reported by lseek().

Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-04-27 20:08:46 -07:00
Jason Ekstrand 5d25ac6a4b anv: Implement VK_KHX_external_memory
This is the trivial implementation that just exposes the extension
string but exposes zero external handle types.

Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-04-27 20:08:46 -07:00
Chad Versace 354ca7a1d4 anv: Implement VK_KHX_external_memory_capabilities
This is a complete but trivial implementation. It's trivial becasue We
support no external memory capabilities yet.  Most of the real work in
this commit is in reworking the UUIDs advertised by the driver.

v2 (chadv):
  - Fix chain traversal in vkGetPhysicalDeviceImageFormatProperties2KHR.
    Extract VkPhysicalDeviceExternalImageFormatInfoKHX from the chain of
    input structs, not the chain of output structs.
  - In vkGetPhysicalDeviceImageFormatProperties2KHR, iterate over the
    input chain and the output chain separately. Reduces diff in future
    dma_buf patches.

Co-authored-with: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-27 20:08:46 -07:00
Jason Ekstrand d4d9258b61 anv/physical_device: Rename uuid to pipeline_cache_uuid
We're about to have more UUIDs for different things so this one really
needs to be properly labeled.

Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-04-27 20:08:46 -07:00
Jason Ekstrand 02767cb4ff anv: Refactor device_get_cache_uuid into physical_device_init_uuids
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-04-27 20:08:46 -07:00
Jason Ekstrand 35e626bd0e anv: Set EXEC_OBJECT_ASYNC when available
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-04-27 20:08:46 -07:00
Jason Ekstrand bd3a9813b9 anv/cmd_buffer: Use the device allocator for QueueSubmit
The command is really operating on a Queue not a command buffer and the
nearest object to that with an allocator is VkDevice.

Reviewed-by: Chad Versace <chadversary@chromium.org>
Cc: "17.0 17.1" <mesa-dev@lists.freedesktop.org>
2017-04-27 20:08:46 -07:00
Jason Ekstrand c43b4bc85e anv: Don't place scratch buffers above the 32-bit boundary
This fixes rendering corruptions in DOOM.  Hopefully, it will also make
Jenkins a bit more stable as we've been seeing some random failures and
GPU hangs ever since turning on 48bit.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100620
Fixes: 651ec926fc "anv: Add support for 48-bit addresses"
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
2017-04-27 02:04:57 -07:00
Rafael Antognolli 6a40ccec4b genxml: Fix gen_pack_header.py crash when field type is invalid.
Just return earlier in that case. Also set prefix to an empty string, so
we don't get to use it undefined.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-04-24 15:14:12 -07:00
Rafael Antognolli 9670124e31 genxml: Make BLEND_STATE command support variable length array.
We need to emit BLEND_STATE, which size is 1 + 2 * nr_draw_buffers
dwords (on gen8+), but the BLEND_STATE struct length is always 17. By
marking it size 1, which is actually the size of the struct minus the
BLEND_STATE_ENTRY's, we can emit a BLEND_STATE of variable number of
entries.

For gen6 and gen7 we set length to 0, since it only contains
BLEND_STATE_ENTRY's, and no other data.

With this change, we also change the code for blorp and anv to emit only
the needed BLEND_STATE_ENTRY's, instead of always emitting 16 dwords on
gen6-7 and 17 dwords on gen8+.

v2:
   - Use designated initializers on blorp and remove 0 from
   initialization (Jason)
   - Default entries to disabled on Vulkan (Jason)
   - Rebase code.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-04-24 15:14:10 -07:00
Rafael Antognolli 4ace73b1f6 genxml: Fix python crash when no dwords are found.
If the 'dwords' dict is empty, max(dwords.keys()) throws an exception.
This case could happen when we have an instruction that is only an array
of other structs, with variable length.

v2:
   - Add another clause for empty dwords and make it work with python 3
   (Dylan)
   - Set the length to 0 if dwords is empty, and do not declare dw

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-04-24 15:14:08 -07:00
Rafael Antognolli 19720405d5 genxml: Remove unused parameter.
'start' parameter from Group.emit_pack_function() is useless.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-04-24 15:14:05 -07:00
Rafael Antognolli 1ea41163eb intel/aubinator: Correctly read variable length structs.
Before this commit, when a group with count="0" is found, only one field
is added to the struct representing the instruction. This causes only
one entry to be printed by aubinator, for variable length groups.

With this commit we "detect" that there's a variable length group
(count="0") and store the offset of the last entry added to the struct
when reading the xml. When finally reading the aubdump file, we check
the size of the group and whether we have variable number of elements,
and in that case, reuse the last field to add the remaining elements.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Tested-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2017-04-24 15:13:51 -07:00
Nanley Chery 50134cede1 isl/format: Update the R16G16B16X16_FLOAT entry
The section of the PRM mentioned in the code comment above this table
says that this format supports the render target write message. Internal
documentation says that this format also supports alpha blending. As a
side effect, this allows CCS_D buffers to be created for images with
this format.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2017-04-24 13:30:50 -07:00
Nanley Chery b1066f7365 anv/pass: Delete anv_pass::subpass_attachments
This field has no users.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2017-04-24 13:30:50 -07:00
Francisco Jerez 58324389be intel/fs: Take into account amount of data read in spilling cost heuristic.
Until now the spilling cost calculation was neglecting the amount of
data read from the register during the spilling cost calculation.
This caused it to make suboptimal decisions in some cases leading to
higher memory bandwidth usage than necessary.

Improves Unigine Heaven performance by ~4% on BDW, reversing an
unintended FPS regression from my previous commit
147e71242c with n=12 and statistical
significance 5%.  In addition SynMark2 OglCSDof performance is
improved by an additional ~5% on SKL, and a Kerbal Space Program
apitrace around the Moho planet I can provide on request improves by
~20%.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-24 11:01:40 -07:00
Francisco Jerez ecc19e12dc intel/fs: Use regs_written() in spilling cost heuristic for improved accuracy.
This is what we use later on to compute the number of registers that
will actually get spilled to memory, so it's more likely to match
reality than the current open-coded approximation.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-24 10:59:56 -07:00
Kenneth Graunke 6b10c37b9c i965/vec4: Use reads_accumulator_implicitly(), not MACH checks.
Curro pointed out that I should not just check for MACH, but use
the reads_accumulator_implicitly() helper, which would also prevent
the same bug with MAC and SADA2 (if we ever decide to use them).

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-24 10:53:49 -07:00
Timothy Arceri 7a7ee40c2d nir/i965: add before ffma algebraic opts
This shuffles constants down in the reverse of what the previous
patch does and applies some simpilifications that may be made
possible from doing so.

Shader-db results BDW:

total instructions in shared programs: 12980814 -> 12977822 (-0.02%)
instructions in affected programs: 281889 -> 278897 (-1.06%)
helped: 1231
HURT: 128

total cycles in shared programs: 246562852 -> 246567288 (0.00%)
cycles in affected programs: 11271524 -> 11275960 (0.04%)
helped: 1630
HURT: 1378

V2: mark float opts as inexact

Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-04-24 12:08:14 +10:00
Kenneth Graunke 2faf227ec2 i965/vec4: Avoid reswizzling MACH instructions in opt_register_coalesce().
opt_register_coalesce() was optimizing sequences such as:

   mul(8) acc0:D, attr18.xyyy:D, attr19.xyyy:D
   mach(8) vgrf5.xy:D, attr18.xyyy:D, attr19.xyyy:D
   mov(8) m4.zw:F, vgrf5.xxxy:F

into:

   mul(8) acc0:D, attr18.xyyy:D, attr19.xyyy:D
   mach(8) m4.zw:D, attr18.xxxy:D, attr19.xxxy:D

This doesn't work - if we're going to reswizzle MACH, we'd need to
reswizzle the MUL as well.  Here, the MUL fills the accumulator's .zw
components with attr18.yy * attr19.yy.  But the MACH instruction expects
.z to contain attr18.x * attr19.x.  Bogus results ensue.

No change in shader-db on Haswell.  Prevents regressions in Timothy's
patches to use enhanced layouts for varying packing (which rearrange
code just enough to trigger this pre-existing bug, but were fine
themselves).

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-22 00:01:16 -07:00
Jason Ekstrand 1e21d4227e anv/query: Use genxml for MI_MATH
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed by: Iago Toral Quiroga <itoral@igalia.com>
2017-04-20 15:24:06 -07:00
Jason Ekstrand e23129ac0c genxml: Add better support for MI_MATH
This breaks the guts of MI_MATH (the instruction part) out into its own
structure with proper named values.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed by: Iago Toral Quiroga <itoral@igalia.com>
2017-04-20 15:24:06 -07:00
Jason Ekstrand b7a2af8e38 genxml/pack: Allow hex values in the XML
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2017-04-20 15:24:06 -07:00
Nanley Chery d9d793696b anv/cmd_buffer: Disable CCS on BDW input attachments
The description under RENDER_SURFACE_STATE::RedClearColor says,

   For Sampling Engine Multisampled Surfaces and Render Targets:
    Specifies the clear value for the red channel.
   For Other Surfaces:
    This field is ignored.

This means that the sampler on BDW doesn't support CCS.

Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2017-04-17 16:47:38 -07:00
Lionel Landwerlin d71efbe5f2 anv: blorp: flush memory after copy
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-04-17 14:45:57 -07:00
Kenneth Graunke 9b71709cb8 intel/decoder: Fix is_header_field starting condition.
Starting positions >= 32 are not part of the header, rather than >.

Caught by Coverity, which found that "bits <<= field->start" may shift
by 32, which has undefined behavior.

CID: 1404968

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-04-16 22:58:23 -07:00
Jason Ekstrand d2d6cf6c83 anv: Add the pci_id into the shader cache UUID
This prevents a user from using a cache created on one hardware
generation on a different one.  Of course, with Intel hardware, this
requires moving their drive from one machine to another but it's still
possible and we should prevent it.

Reviewed-by: Chad Versace <chadversary@chromium.org>
Cc: mesa-stable@lists.freedesktop.org
2017-04-14 17:41:07 -07:00
Matt Turner 2eeb1b0ad9 i965: Use correct VertStride on align16 instructions.
In commit c35fa7a, we changed the "width" of DF source registers to 2,
which is conceptually fine. Unfortunately a VertStride of 2 is not
allowed by align16 instructions on IVB/BYT, and the regular VertStride
of 4 works fine in any case.

See generated_tests/spec/arb_gpu_shader_fp64/execution/built-in-functions/vs-round-double.shader_test
for example:

cmp.ge.f0(8)    g18<1>DF        g1<0>.xyxyDF    -g8<2>DF        { align16 1Q };
        ERROR: In Align16 mode, only VertStride of 0 or 4 is allowed
cmp.ge.f0(8)    g19<1>DF        g1<0>.xyxyDF    -g9<2>DF        { align16 2N };
        ERROR: In Align16 mode, only VertStride of 0 or 4 is allowed

v2:
- Add spec quote (Curro).
- Change the condition to only BRW_VERTICAL_STRIDE_2 (Curro)

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:09 -07:00
Samuel Iglesias Gonsálvez d8441e2276 i965/vec4/dce: improve track of partial flag register writes
This is required for correctness in presence of multiple 4-wide flag
writes (e.g. 4-wide instructions with a conditional mod set) which
update a different portion of the same 8-bit flag subregister.

Right now we keep track of flag dataflow with 8-bit granularity and
consider flag writes to have killed any previous definition of the
same subregister even if the write was less than 8 channels wide,
which can cause live flag register updates to be dead
code-eliminated incorrectly.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:09 -07:00
Samuel Iglesias Gonsálvez c1fc8fad47 i965/vec4: don't do horizontal stride on some register file types
horiz_offset() shouldn't be doing anything for scalar registers,
because all channels of any SIMD instructions will end up reading or
writing the same component of the register, so shifting the register
offset would be wrong.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Re-implement in terms of is_uniform() for
  simplicity.  Pass argument by const reference.  Clarify commit
  message. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:09 -07:00
Matt Turner 21e8e3a848 i965/vec4: Fix exec size for MOVs {SET,PICK}_{HIGH,LOW}_32BIT.
Otherwise for a pack_double_2x32_split opcode, we emit:

   vec1 64 ssa_135 = pack_double_2x32_split ssa_133, ssa_134
mov(8)          g5<1>UD         g5<4>.xUD                       { align16 1Q compacted };
mov(8)          g7<2>UD         g5<4,4,1>UD                     { align1 1Q };
        ERROR: When the destination spans two registers, the source must span two registers
               (exceptions for scalar source and packed-word to packed-dword expansion)
mov(8)          g8<2>UD         g5.4<4,4,1>UD                   { align1 2N };
        ERROR: The offset from the two source registers must be the same
mov(8)          g5<1>UD         g6<4>.xUD                       { align16 1Q compacted };
mov(8)          g7.1<2>UD       g5<4,4,1>UD                     { align1 1Q };
        ERROR: When the destination spans two registers, the source must span two registers
               (exceptions for scalar source and packed-word to packed-dword expansion)
mov(8)          g8.1<2>UD       g5.4<4,4,1>UD                   { align1 2N };
        ERROR: The offset from the two source registers must be the same

The intention was to emit mov(4)s for the instructions that have ERROR
annotations.

See tests/spec/arb_gpu_shader_fp64/execution/vs-isinf-dvec.shader_test
for example.

v2 (Samuel):
- Instead of setting the exec size to a fixed value, don't double it
(Curro).
- Add PICK_{HIGH,LOW}_32BIT to the condition.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Trivial rebase changes. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:09 -07:00
Samuel Iglesias Gonsálvez f030aaf2fb i965/vec4: use vec4_builder to emit instructions in setup_imm_df()
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Drop useless vec4_visitor dependencies.  Demote to
  static stand-alone function.  Don't write unused components in the
  result.  Use vec4_builder interface for register allocation. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:09 -07:00
Juan A. Suarez Romero a907c91e93 i965/vec4: consider subregister offset in live variables
Take into account offset values less than a full register (32 bytes)
when getting the var from register.

This is required when dealing with an operation that writes half of the
register (like one d2x in IVB/BYT, which uses exec_size == 4).

v2:
- Take in account this offset < 32 in liveness analysis too (Curro)

v3:
- Change formula in var_from_reg() (Curro)
- Remove useless changes (Curro)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:08 -07:00
Francisco Jerez 92649a3e67 i965/vec4: fix assert to detect SIMD lowered DF instructions in IVB
On IVB, DF instructions have lowered the SIMD width to 4 but the
exec_size will be later doubled. Fix the assert to avoid crashing in
this case.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Simplify assert.  Except for the 'inst->group % 4
  == 0' part the assertion was redundant with the previous assertion. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:08 -07:00
Samuel Iglesias Gonsálvez 6e3265eae5 i965/vec4: split VEC4_OPCODE_FROM_DOUBLE into one opcode per destination's type
This way we can set the destination type as double to all these new opcodes,
avoiding any optimizer's confusion that was happening before.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Drop no_spill workaround originally needed due to
  the bogus destination type of VEC4_OPCODE_FROM_DOUBLE. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:08 -07:00
Samuel Iglesias Gonsálvez 50a5217637 i965/vec4: split d2x conversion and data gathering from one opcode to two explicit ones
When doing a 64-bit to a smaller data type size conversion, the destination should
be aligned to 64-bits. Because of that, we need to gather the data after the
actual conversion.

Until now, these two operations were done by VEC4_OPCODE_FROM_DOUBLE but
now we split them explicitely in two different instructions:
VEC4_OPCODE_FROM_DOUBLE just do the conversion and
VEC4_OPCODE_PICK_LOW_32BIT will gather the data.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:08 -07:00
Juan A. Suarez Romero cfaf14a126 i965/vec4: fix VEC4_OPCODE_FROM_DOUBLE for IVB/BYT
In the generator we must generate slightly different code for
Ivybridge/Baytrail, because of the way the stride works in
this hardware.

v2:
- Use stride and don't need to fix dst (Curro)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:08 -07:00
Juan A. Suarez Romero be445d3ea3 i965/vec4: keep original type when dealing with null registers
Keep the original type when dealing with null registers. Especially
because we do no want to introduce an implicit conversion between
types that could affect the conditional flags.

This affects especially when the original type is DF, and we are working
on Ivybridge/Baytrail.

v2 (Curro)
- Fix typo.
- Use retype() instead of applying the type directly.
- Remove unneeded retype.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:08 -07:00
Samuel Iglesias Gonsálvez a21dc2b500 i965/vec4: split DF instructions and later double its execsize in IVB/BYT
We need to split DF instructions in two on IVB/BYT as it needs an
execsize 8 to process 4 DF values (one GRF in total).

v2:
- Rename helper and make it static inline function (Matt).
- Fix indention and add braces (Matt).

v3:
- Don't edit IR instruction when doubling exec_size (Curro)
- Add comment into the code (Curro).
- Manage ARF registers like the others (Curro)

v4:
- Add get_exec_type() function and use it to calculate the execution
  size.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Fix bogus 'type != BAD_FILE' check.  Take
  destination type as execution type where there is no valid source.
  Assert-fail if the deduced execution type is byte.  Clarify comment
  in get_lowered_simd_width().  Move SIMD width workaround outside of
  'if (...inst->size_written > REG_SIZE)' conditional block, since the
  problem should be independent of whether the amount of data written
  by the instruction is greater or lower than a GRF.  Drop redundant
  is_ivb_df definition.  Drop bogus inst->exec_size < 8 check.
  Simplify channel group assertion. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:08 -07:00
Samuel Iglesias Gonsálvez a5399e8b1c i965/fs: lower all non-force_writemask_all DF instructions to SIMD4 on IVB/BYT
The hardware applies the same channel enable signals to both halves of
the compressed instruction which will be just wrong under non-uniform
control flow. Fix this by splitting those instructions to SIMD4.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:08 -07:00
Francisco Jerez ebfb703d44 i965/fs: Get 64-bit indirect moves working on IVB.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-04-14 14:56:08 -07:00
Matt Turner 630b84cdc8 i965: Use source region <1,2,0> when converting to DF.
Doing so allows us to use a single MOV in VEC4_OPCODE_TO_DOUBLE instead
of two.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-04-14 14:56:08 -07:00
Juan A. Suarez Romero 3198ce3f96 i965/fs: fix lower SIMD width for IVB/BYT's MOV_INDIRECT
According to the IVB and HSW PRMs:

"2.When the destination requires two registers and the sources are
 indirect, the sources must use 1x1 regioning mode."

So for DF instructions the execution size is not limited by the number
of address registers that are available, but by the EU decompression
logic not handling VxH indirect addressing correctly.

This patch limits the SIMD width to 4 in this case.

v2:
- Fix typo (Matt).
- Fix condition (Curro)

v3:
- Add spec quote (Curro)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:07 -07:00
Juan A. Suarez Romero 571cbd05eb i965/fs: fix dst stride in IVB/BYT type conversions
When converting a DF to 32-bit conversions, we set dst stride to 2,
to fulfill alignment restrictions because the upper Dword of every
Qword will be written with undefined value.

But in IVB/BYT, this is not necessary, as each DF conversion already
writes 2, the first one the real value, and the second one a 0.
That is, IVB/BYT already set stride = 2 implicitly, so we must set it to
1 explicitly to avoid ending up with stride = 4.

v2:
- Fix typo (Matt)

v3:
- Fix stride in the destination's brw_reg, don't modity IR (Curro)

v4:
- Remove 'is_dst' argument of brw_reg_from_fs_reg() (Curro)
- Fix comment (Curro).
- Relax hstride assert (Curro)

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Minor spelling fixes. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:07 -07:00
Samuel Iglesias Gonsálvez af6fc3a8ea i965/fs: rename lower_d2x to lower_conversions
v2:
- Change the name to lower_conversions.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:07 -07:00
Samuel Iglesias Gonsálvez dee31311eb Revert "i965/fs: Don't emit SEL instructions for type-converting MOVs."
This reverts commit 7dccd38b40.

d2x pass fixes SEL instructions when there is a type conversion
by doing a SEL without type conversion and then convert the result.
This pass also takes into account the non-uniform control flow.

Then, 7dccd38b40 is not needed anymore.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-04-14 14:56:07 -07:00
Samuel Iglesias Gonsálvez aeecc82d05 i965/fs: generalize the legalization d2x pass
Generalize it to lower any unsupported narrower conversion.

v2 (Curro):
- Add supports_type_conversion()
- Reuse existing intruction instead of cloning it.
- Generalize d2x to narrower and equal size conversions.

v3 (Curro):
- Make supports_type_conversion() const and improve it.
- Use foreach_block_and_inst to process added instructions.
- Simplify code.
- Add assert and improve comments.
- Remove redundant mov.
- Remove useless comment.
- Remove saturate == false assert and add support for saturation
  when fixing the conversion.
- Add get_exec_type() function.

v4 (Curro):
- Use get_exec_type() function to get sources' type.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:07 -07:00
Matt Turner 94ffeb7fa2 i965: Use <0,2,1> region for scalar DF sources on IVB/BYT.
On HSW+, scalar DF sources can be accessed using the normal <0,1,0>
region, but on IVB and BYT DF regions must be programmed in terms of
floats. A <0,2,1> region accomplishes this.

v2:
- Apply region <0,2,1> in brw_reg_from_fs_reg() (Curro).

v3:
- Added comment explaining the reason (Curro).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:07 -07:00
Samuel Iglesias Gonsálvez 82d17615f4 i965/fs: clamp exec_size when an instruction has a scalar DF source
Then the SIMD lowering pass will get rid of any compressed instructions with scalar
source (whether force_writemask_all or not) and we avoid hitting the Gen7 region
decompression bug.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Suggested-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:07 -07:00
Juan A. Suarez Romero 0f1316d4db i965/fs: double regioning parameters and execsize for DF in IVB/BYT
In IVB and BYT, both regioning parameters and execution sizes are measured as
32-bits element size.

So when we have something like:

mov(8) g2<1>DF g3<4,4,1>DF

We are not actually moving 8 doubles (our intention), but 4 doubles.

We need to double the parameters to cope with this issue. However,
horizontal strides don't behave as they're supposed to on IVB
for DF regions, they will cause each 32-bit half of DF sources to be
strided individually, and doubling the value won't make any difference.

v2:
- Use devinfo directly (Matt).
- Use Baytrail instead of Valleview (Matt).
- Use IvyBridge instead of Ivy (Matt)
- Double the exec_size in code emission (Curro)

v3:
- Change hstride doubling by an assert and fix commit log (Curro).
- Substitute remaining compiler->devinfo by devinfo (Curro).

v4:
- Fix comment (Curro).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:07 -07:00
Juan A. Suarez Romero 79af256388 i965/fs: add helper to retrieve instruction execution type
The execution data size is the biggest type size of any instruction
operand.

We will use it to know if the instruction deals with DF, because in Ivy
we need to double the execution size and regioning parameters.

v2:
- Fix typo in commit log (Matt)
- Use static inline function instead of fs_inst's method (Curro).
- Define the result as a constant (Curro).
- Fix indentation (Matt).
- Add braces to nested control flow (Matt).

v3 (Curro):
- Add get_exec_type() and other auxiliary functions and use them to
  calculate its size.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Fix bogus 'type != BAD_FILE' check.  Fix deduced
  execution type for integer vector types.  Take destination type as
  execution type where there is no valid source.  Assert-fail if the
  deduced execution type is byte.  Move into brw_ir_fs.h header for
  consistency with the VEC4 back-end. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:07 -07:00
Matt Turner fd349d29e4 i965: Handle IVB DF differences in the validator.
On IVB/BYT, region parameters and execution size for DF are in terms of
32-bit elements, so they are doubled. For evaluating the validity of an
instruction, we halve them.

v2 (Sam):
- Add comments.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-04-14 14:56:07 -07:00
Iago Toral Quiroga fbac8b1f94 i965/disasm: also print nibctrl in IVB for execsize=8
4-wide DF operations where NibCtrl applies require and execsize of 8
in IvyBridge/BayTrail.

v2:
- Refactor NibCtrl printing (Matt)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:06 -07:00
Jason Ekstrand 220974b38d anv/blorp: Properly handle VK_ATTACHMENT_UNUSED
The Vulkan driver was originally written under the assumption that
VK_ATTACHMENT_UNUSED was basically just for depth-stencil attachments.
However, the way things fell together, VK_ATTACHMENT_UNUSED can be used
anywhere in the subpass description.  The blorp-based clear and resolve
code has a bunch of places where we walk lists of attachments and we
weren't handling VK_ATTACHMENT_UNUSED everywhere.  This commit should
fix all of them.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
2017-04-14 14:20:42 -07:00
Jason Ekstrand 21d2ca72d8 anv/cmd_buffer: Use the null surface state for ATTACHMENT_UNUSED
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
2017-04-14 14:20:42 -07:00
Jason Ekstrand 02eca8b6f8 anv/cmd_buffer: Always set up a null surface state
We're about to start requiring it in yet another case and calculating
exactly when one is needed is starting to get prohibitively expensive.
A single surface state doesn't take up that much space so we may as well
create one all the time.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
2017-04-14 14:20:42 -07:00
Jason Ekstrand e1f6fb8021 anv/cmd_buffer: Flush the VF cache at the top of all primaries
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-04-14 13:35:02 -07:00
Jason Ekstrand 939337e49f anv/blorp: Flush the texture cache in UpdateBuffer
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-04-14 13:35:02 -07:00
Jason Ekstrand 475bab0330 anv: Limit VkDeviceMemory objects to 2GB
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-04-14 13:35:02 -07:00
Jason Ekstrand 4495b917e2 intel/blorp: Add a blorp_emit_dynamic macro
This makes it much easier to throw together a bit of dynamic state.  It
also automatically handles flushing so you don't accidentally forget.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2017-04-14 13:35:02 -07:00
Matt Turner ab18578b03 anv: Only define wsi_cbs when VK_USE_PLATFORM_WAYLAND_KHR defined 2017-04-12 11:00:39 -07:00
Francisco Jerez 147e71242c i965/fs: Take into account lower frequency of conditional blocks in spilling cost heuristic.
The individual branches of an if/else/endif construct will be executed
some unknown number of times between 0 and 1 relative to the parent
block.  Use some factor in between as weight while approximating the
cost of spill/fill instructions within a conditional if-else branch.
This favors spilling registers used within conditional branches which
are likely to be executed less frequently than registers used at the
top level.

Improves the framerate of the SynMark2 OglCSDof benchmark by ~1.9x on
my SKL GT4e.  Should have a comparable effect on other platforms.  No
significant regressions.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-04-11 15:28:54 -07:00
Juan A. Suarez Romero 8d7a82ae32 anv: remove needless VALGRIND_MAKE_MEM_DEFINED
This is already invoked in the following VG_NOACCESS_READ() call.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-11 17:21:57 +02:00
Jason Ekstrand da2ac19511 intel/blorp: Use ISL for emitting depth/stencil/hiz
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-04-10 07:57:21 -07:00
Jason Ekstrand d3785dcb2f intel/blorp: Emit 3DSTATE_STENCIL_BUFFER before HIER_DEPTH
We're about to replace blorp's emit code with ISL and it emits them in
the other order.  This makes diffing the aubs easier.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-04-10 07:57:21 -07:00
Jason Ekstrand f93dc5beee anv: Use ISL for emitting depth/stencil/hiz
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-04-10 07:57:21 -07:00
Jason Ekstrand bf95f7c209 intel/isl: Add support for emitting depth/stencil/hiz
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-04-10 07:57:21 -07:00
Jason Ekstrand 098ca9949d intel/isl: Use genx_bits.h instead of a hand-rolled table
This gets rid of one piece of ugliness with the way ISL handles surface
emitting surface states.  I've never liked that hand-rolled table but it
was the best we had at the time.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-04-07 22:34:04 -07:00
Jason Ekstrand b85d75b3e8 intel/genxml/bits: Emit per-container _length helpers
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-04-07 22:34:04 -07:00
Jason Ekstrand f97e251ab2 intel/genxml/bits: Emit per-field _start helpers
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-04-07 22:34:04 -07:00
Jason Ekstrand 430e697868 intel/genxml/bits: Pull the function emit code into a helper block
The helper block is extremely general.  It takes an string property name
and an object that supports three methods: has_prop, iter_prop, and
get_prop.  This way we can easily generalize it to emit more different
types of getter functions.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-04-07 22:34:04 -07:00
Jason Ekstrand 2d52e65d03 intel/genxml/bits: Refactor to add a container class
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-04-07 22:34:04 -07:00
Jason Ekstrand bc68aa42bd anv: Use subpass dependencies for flushes
Instead of figuring it all out ourselves, just use the information given
to us by the client.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-04-07 19:24:14 -07:00
Jason Ekstrand e5bbf8be36 anv/pass: Record required pipe flushes
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-04-07 19:24:14 -07:00
Jason Ekstrand 0039d0cf27 anv/pass: Use anv_multialloc for allocating the anv_pass
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-04-07 19:24:14 -07:00
Jason Ekstrand 415633a722 anv/descriptor_set: Use anv_multialloc for descriptor set layouts
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-04-07 19:24:14 -07:00
Jason Ekstrand e5c29b8c27 anv: Add a helper for doing mass allocations
We tend to try to reduce the number of allocation calls the Vulkan
driver uses by doing a single allocation whenever possible for a data
structure.  While this has certain downsides (usually code complexity),
it does mean error handling and cleanup is much easier.  This commit
adds a nice little helper struct for getting rid of some of that
complexity.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-04-07 19:24:14 -07:00
Jason Ekstrand 82695c32b6 anv: Add helpers for converting access flags to pipe bits
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-04-07 19:24:14 -07:00
Jason Ekstrand 4e17b59f6c anv/query: Use snooping on !LLC platforms
Commit b2c97bc789 which made us start
using a busy-wait for individual query results also messed up cache
flushing on !LLC platforms.  For one thing, I forgot the mfence after
the clflush so memory access wasn't properly getting fenced.  More
importantly, however, was that we were clflushing the whole query range
and then waiting for individual queries and then trying to read the
results without clflushing again.  Getting the clflushing both correct
and efficient is very subtle and painful.  Instead, let's side-step the
problem by just snooping.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-04-07 12:17:20 -07:00
Emil Velikov 5318d1ff94 anv: provide anv_gem_busy() stub for the tests
Otherwise linking way fail.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100600
Fixes: f195d40eca ("anv/device: Add a helper for querying whether a BO is busy")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Tested-by: Vinson Lee <vlee@freedesktop.org>
2017-04-07 19:45:58 +01:00
Samuel Iglesias Gonsálvez 1c934bc71b anv/blorp: sample input attachments with resolves on BDW
On Broadwell we still need to do a resolve between the subpass
that writes and the subpass that reads when there is a
self-dependency because HW could not see fast-clears and works
on the render cache as if there was regular non-fast-clear surface.

Fixes 16 tests on BDW:

dEQP-VK.renderpass.formats.*.input.clear.store.self_dep*

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-07 07:49:43 +02:00
Jordan Justen 0370350d11 intel/aubinator: Stop searching after a custom handler is found
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-04-06 13:26:08 -07:00
Jordan Justen d5bd0e411e intel/gen_decoder: return -1 for unknown command formats
Decoding with aubinator encountered a command of 0xffffffff. With the
previous code, it caused aubinator to jump 255 + 2 dwords to start
decoding again.

Instead we can attempt to detect the known instruction formats. If the
format is not recognized, then we can advance just 1 dword.

v2:
 * Update aubinator_error_decode
 * Actually convert the length variable returned into a *signed* integer
   in aubinator.c, intel_batchbuffer.c and aubinator_error_decode.c.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-04-06 13:26:08 -07:00
Jordan Justen 7c33372f82 intel/gen_decoder: Fix length for Media State/Object commands
From BDW PRM, Volume 6: Command Stream Programming, 'Render Command
Header Format'.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-04-06 13:26:08 -07:00
Jordan Justen 3c77a57222 intel/aubinator_error_decode: Fix structure decode data
The call to gen_print_group should provide a pointer to the beginning
of the the structure data, not the start of the batch data.

Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-04-06 13:25:38 -07:00
Jason Ekstrand b2c97bc789 anv/query: Busy-wait for available query entries
Before, we were just looking at whether or not the user wanted us to
wait and waiting on the BO.  Some clients, such as the Serious engine,
use a single query pool for hundreds of individual query results where
the writes for those queries may be split across several command
buffers.  In this scenario, the individual query we're looking for may
become available long before the BO is idle so waiting on the query pool
BO to be finished is wasteful. This commit makes us instead busy-loop on
each query until it's available.

This significantly reduces pipeline bubbles and improves performance of
The Talos Principle on medium settings (where the GPU isn't overloaded
with drawing) by around 20% on my SkyLake gt4.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Eero Tamminen <eero.t.tamminen@intel.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
2017-04-05 21:17:11 -07:00
Jason Ekstrand f195d40eca anv/device: Add a helper for querying whether a BO is busy
This is a bit more efficient than using GEM_WAIT with a timeout of 0.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-04-05 21:17:11 -07:00
Emil Velikov a6840efc09 anv: provide required gem stubs for the tests
Introduce stubs to anv_gem_stub.c that match the anv_gem.c ones.
Otherwise we may get link-time errors, when building the tests.

v2: Introduce all the missing stubs at once.

Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Vinson Lee <vlee@freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100574
Fixes: c964f0e485 ("anv: Query the kernel for reset status")
Fixes: 651ec926fc ("anv: Add support for 48-bit addresses")
Fixes: 060a6434ec ("anv: Advertise larger heap sizes")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
---
I've intentionally kept the order the same identical to the anv_gem.c.
This way we can easily grep & diff in the future ;-)
2017-04-05 17:54:38 +01:00
Emil Velikov e664cfc5a7 intel: genxml: automake: include gen_bits_header.py in the tarball
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-05 13:16:28 +01:00
Emil Velikov e180680980 intel: genxml: automake: polish automake rules
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-05 13:16:28 +01:00
Jason Ekstrand 060a6434ec anv: Advertise larger heap sizes
Instead of just advertising the aperture size, we do something more
intelligent.  On systems with a full 48-bit PPGTT, we can address 100%
of the available system RAM from the GPU.  In order to keep clients from
burning 100% of your available RAM for graphics resources, we have a
nice little heuristic (which has received exactly zero tuning) to keep
things under a reasonable level of control.

Reviewed-by: Kristian H. Kristensen <krh@bitplanet.net>
2017-04-04 18:33:52 -07:00
Jason Ekstrand 651ec926fc anv: Add support for 48-bit addresses
This commit adds support for using the full 48-bit address space on
Broadwell and newer hardware.  Thanks to certain limitations, not all
objects can be placed above the 32-bit boundary.  In particular, general
and state base address need to live within 32 bits.  (See also
Wa32bitGeneralStateOffset and Wa32bitInstructionBaseOffset.)  In order
to handle this, we add a supports_48bit_address field to anv_bo and only
set EXEC_OBJECT_SUPPORTS_48B_ADDRESS if that bit is set.  We set the bit
for all client-allocated memory objects but leave it false for
driver-allocated objects.  While this is more conservative than needed,
all driver allocations should easily fit in the first 32 bits of address
space and keeps things simple because we don't have to think about
whether or not any given one of our allocation data structures will be
used in a 48-bit-unsafe way.

Reviewed-by: Kristian H. Kristensen <krh@bitplanet.net>
2017-04-04 18:33:52 -07:00
Jason Ekstrand 439da38d18 anv: Replace anv_bo::is_winsys_bo with a uint32_t flags
Reviewed-by: Kristian H. Kristensen <krh@bitplanet.net>
2017-04-04 18:33:52 -07:00
Jason Ekstrand 5d1ba2cb04 anv/blorp: Align vertex buffers to 64B
This fixes issues seen when adding support for full 48-bit addresses.
The 48-bit addresses themselves have nothing to do with it other than
that it caused the kernel to place buffers slightly differently so they
interacted differently with the caches.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-04-04 18:33:52 -07:00
Jason Ekstrand c964f0e485 anv: Query the kernel for reset status
When a client causes a GPU hang (or experiences issues due to a hang in
another client) we want to let it know as soon as possible.  In
particular, if it submits work with a fence and calls vkWaitForFences or
vkQueueQaitIdle and it returns VK_SUCCESS, then the client should be
able to trust the results of that rendering.  In order to provide this
guarantee, we have to ask the kernel for context status in a few key
locations.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-04-04 18:33:52 -07:00
Jason Ekstrand 82573d0f75 anv: Check for device loss at the end of WaitForFences
It's possible that the device could have been lost while we were
waiting.  We should let the user know if this has happened.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-04-04 18:33:51 -07:00
Jason Ekstrand c6f69eea6a anv/pipeline: Properly handle unset gl_Layer and gl_ViewportIndex
When the shader does not set one of these values, they are supposed to
get a default value of 0.  We have hardware bits in 3DSTATE_CLIP for
this but haven't been setting them.  This fixes the intermittent failure
of dEQP-VK.geometry.layered.3d.render_to_default_layer.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-04-04 18:33:51 -07:00
Jason Ekstrand 3503b2714b i965/fs: Always provide a default LOD of 0 for TXS and TXL
We already provide a default LOD for textureQueryLevels and texture() on
non-fragment stages.  However, there are more cases where one is needed
such as textureSize(gsampler2DMS*) in SPIR-V.  Instead of trying to list
out all of the cases one at a time, just provide the default for all TXS
and TXL operations.  This fixes a shader validation error in the new
Sascha deferredmultisampling demo which uses textureSize(gsampler2DMS).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100391
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-04-04 18:33:35 -07:00
Jason Ekstrand 1fde054b8f intel/isl: Refactor and clerify gen8 alignment calculations
Adding the actual table from the docs makes it clearer exactly what the
restrictions are.  In particular, it becomes clear that compressed
textures ignore the alignment parameters in RENDER_SURFACE_STATE.

Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-04-04 14:51:57 -07:00
Lionel Landwerlin e8d9b76f63 intel: tools: add aubinator_error_decode tool
This is pretty much the same tool as what i-g-t has, only with a more
fancy decoding of the instructions/registers. It also doesn't support
anything before gen4.

v2 (from Matt): Drop authors
                Remove undefined automake variable

v3: Fix incorrect offsets for dword > 1 (Jordan)

v4: Fix decompression error with large blobs (Jordan)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
2017-04-04 21:22:26 +01:00
Lionel Landwerlin 567d77885e intel: genxml: add RING_BUFFER_CTL registers
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-04-04 21:22:26 +01:00
Lionel Landwerlin 6f260ff049 intel: genxml: add FAULT_REG register
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-04-04 21:22:26 +01:00
Lionel Landwerlin ca2771fa18 intel: genxml: add gen7 ERR_INT register
v2: add register to gen7.5 (Matt)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-04-04 21:22:26 +01:00
Lionel Landwerlin 84613bf6d5 intel: genxml: add ACTHD registers
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-04-04 21:22:26 +01:00
Lionel Landwerlin 0f195f22aa intel: genxml: add GFX_ARB_ERROR_RPT register
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-04-04 21:22:26 +01:00
Lionel Landwerlin d1a7a54d77 intel: genxml: add INSTDONE registers
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-04-04 21:22:26 +01:00
Mauro Rossi 72175bd2a5 android: intel: genxml: fix genX_xml.h generation rules
Recent changes in Makefile.sources merged the aubinator files in
a unique list of generated files and genxml/genX_xml.h is now needed
to avoid the following building error:

ninja: error: '.../genxml/genX_xml.h', needed by '.../genxml/genX_xml.h',
missing and no known rule to make it
build/core/ninja.mk:148: recipe for target 'ninja_wrapper' failed

Fixes: 0f83c05 "intel: genxml: compress all gen files into one"
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-04-04 09:10:46 +03:00
Jason Ekstrand 405ef7bb33 intel/vec4: Add some fall through comments
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-04-03 16:58:35 -07:00
Jason Ekstrand 0817110969 anv: Implement VK_KHR_incremental_present
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
2017-04-03 13:51:08 -07:00
Jason Ekstrand f82b6c6272 vulkan/wsi: Plumb present regions through the common code
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2017-04-03 13:51:08 -07:00
Lionel Landwerlin 471c1bc7cc aubinator/gen_decoder/i965: decode instructions from dword 0
Some packets like 3DSTATE_VF_STATISTICS, 3DSTATE_DRAWING_RECTANGLE,
3DPRIMITIVE, PIPELINE_SELECT, etc... have configurable fields in
dword0, we probably want to print those.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-04-03 20:45:34 +01:00
Lionel Landwerlin 04f2e80257 intel: gen_decoder: store pointer to current decoded field in iterator
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-04-03 20:45:34 +01:00
Lionel Landwerlin 74a80d579d intel: genxml: fix out of tree builds
v2: use Emil's recommendation
    change rule to closer to genxml/genX_bits.h

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-31 15:29:57 +01:00
Tapani Pälli 3535b87a1a anv: change BLOCK_POOL_MEMFD_SIZE to 1GB
This allows us to run 32bit Vulkan apps on Android, ftruncate
call would fail on 2GB (max size being 2GB - 1).

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-31 08:43:28 +03:00
Tapani Pälli 2398770c87 android: add libmesa_genxml as dep to libmesa_isl
This is to fix following compile error with libmesa_isl:
   mesa/src/intel/isl/isl.c:28:10: fatal error: 'genxml/genX_bits.h' file not found

Fixes: f0eaf38 ("genxml: New generated header genX_bits.h (v6)")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emli Velikov <emil.velikov@collabora.com>
2017-03-31 08:42:54 +03:00
Lionel Landwerlin 469da094e1 aubinator: enable snb/ilk through --gen
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-03-31 01:25:33 +01:00
Lionel Landwerlin 0f83c05149 intel: genxml: compress all gen files into one
Combining all the files into a single string didn't make any
difference in the size of the aubinator binary.

With this change we now also embed gen4/4.5/5 descriptions, which
increases the aubinator size by ~16Kb.

v2 (Lionel): rebase makefiles

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-03-31 01:24:56 +01:00
Kenneth Graunke e113dfabad intel: Add INTEL_CFLAGS to aubinator CFLAGS.
It still needs intel_aub.h.  Fixes the build.
2017-03-30 11:58:00 -07:00
Emil Velikov 3df993e1a2 intel: automake: move INTEL_CFLAGS as applicable
Only common/decoder.[ch] requires it [for intel_aub.h].

v2: The code was moved to from intel/tools to intel/common,
update accordingly.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-03-30 19:07:28 +01:00
Emil Velikov 4ffb394961 intel: android: remove libdrm_intel requirement
The only part which requires libdrm_intel tools/aubinator is not built
on Android.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-03-30 19:07:23 +01:00
Craig Stout 1da7a11de8 anv/cmd_buffer: fix host memory leak
push_constants must be free'd.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100452
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
2017-03-29 14:32:32 -07:00
Jason Ekstrand 9aba81b160 anv/batch_chain: Handle another OOM in cmd_buffer_execbuf
Found by inspection while rebasing other patches.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-03-29 09:39:49 -07:00
Alejandro Piñeiro 2f8d6bd578 i965: expose BRW_OPCODE_[F32TO16/F16TO32] name on gen8+
Technically those hw operations are only available on gen7, as gen8+
support the conversion on the MOV. But, when using the builder to
implement nir operations (example: nir_op_fquantize2f16), it is not
needed to do the gen check. This check is done later, on the final
emission at brw_F32TO16 (brw_eu_emit), choosing between the MOV or the
specific operation accordingly.

So in the middle, during optimization phases those hw operations can
be around for gen8+ too.

Without this patch, several (at least 95) vulkan-cts quantize tests
crashes when using INTEL_DEBUG=optimizer. For example:
dEQP-VK.spirv_assembly.instruction.graphics.opquantize.too_small_vert

v2: simplify the code using GEN_GE (Ilia Mirkin)
v3: tweak brw_instruction_name instead of changing opcode_descs
    table, that is used for validation (Matt Turner)

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-03-29 17:34:15 +02:00
Jason Ekstrand f3673db3d6 anv/cmd_buffer: Refactor flush_pipeline_select_*
While having the _3d and _gpgpu versions is nice, there's no reason why
we need to have duplicated logic for tracking the current pipeline.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-03-28 14:57:09 -07:00
Jason Ekstrand 6baae9625d anv: Flush caches prior to PIPELINE_SELECT on all gens
The programming note that says we need to do this still exists in the
SkyLake PRM and, from looking at the bspec, seems like it may apply to
all hardware generations SNB+.  Unfortunately, this isn't particularly
clear cut since there is also language in the bspec that says you can
skip the flushing and stall to get better throughput.  Experimentation
with the "Car Chase" benchmark in GL seems to indicate that some form of
flushing is still needed.  This commit makes us do the full set of
flushes regardless of hardware generation.  We can always reduce the
flushing later.

Reported-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
2017-03-28 14:57:08 -07:00
Jason Ekstrand 0fe3dcce4c anv/cmd_buffer: Fix bad indentation
A bunch of code was indented in such a way that it looked like it went
with the if statement above but it definitely didn't.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
2017-03-28 14:57:06 -07:00
Jason Ekstrand 01a65dc43b anv/cmd_buffer: Apply flush operations prior to executing secondaries
This fixes rendering issues in the Vulkan port of skia on some hardware.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-03-28 14:56:55 -07:00
Jason Ekstrand 9319ef96fd anv/blorp: Use anv_get_layerCount everywhere
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-03-28 14:41:48 -07:00
Jason Ekstrand 1b8fa8dd79 anv: Make anv_get_layerCount a macro
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-03-28 14:41:47 -07:00
Chad Versace d1032a047b isl: Drop unused isl_surf_init_info::min_pitch
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-28 09:44:44 -07:00
Chad Versace 6cbc13d94c intel: Fix requests for exact surface row pitch (v2)
All callers of isl_surf_init() that set 'min_row_pitch' wanted to
request an *exact* row pitch, as evidenced by nearby asserts, but isl
lacked API for doing so. Now that isl has an API for that, update the
code to use it.

v2: Assert that isl_surf_init() succeeds because the callers assume
    it.  [for jekstrand]

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> (v1)
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
2017-03-28 09:44:44 -07:00
Chad Versace e9017d58dc isl: Let isl_surf_init's caller set the exact row pitch (v2)
The caller does so by setting the new field
isl_surf_init_info::row_pitch.

v2: Validate the requested row_pitch.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
2017-03-28 09:44:44 -07:00
Chad Versace 23802dafc2 isl: Validate the calculated row pitch (v45)
Validate that isl_surf::row_pitch fits in the below bitfields,
if applicable based on isl_surf::usage.

    RENDER_SURFACE_STATE::SurfacePitch
    RENDER_SURFACE_STATE::AuxiliarySurfacePitch
    3DSTATE_DEPTH_BUFFER::SurfacePitch
    3DSTATE_HIER_DEPTH_BUFFER::SurfacePitch

v2:
  -Add a Makefile dependency on generated header genX_bits.h.
v3:
  - Test ISL_SURF_USAGE_STORAGE_BIT too. [for jekstrand]
  - Drop explicity dependency on generated header. [for emil]
v4:
  - Rebase for new gen_bits_header.py script.
  - Replace gen_10x with gen_device_info*.
v5:
  - Drop FINISHME for validation of GEN9 1D row pitch. [for jekstrand]
  - Reformat bit tests. [for jekstrand]

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v4)
2017-03-28 09:44:44 -07:00
Chad Versace f0eaf38db2 genxml: New generated header genX_bits.h (v6)
genX_bits.h contains the sizes of bitfields in genxml instructions,
structures, and registers. It also defines some functions to query those
sizes.

isl_surf_init() will use the new header to validate that requested
pitches fit in their destination bitfields.

What's currently in genX_bits.h:

  - Each CONTAINER::Field from gen*.xml that has a bitsize has a macro
    in genX_bits.h:

        #define GEN{N}_CONTAINER_Field_bits {bitsize}

  - For each set of macros whose name, after stripping the GEN prefix,
    is the same, genX_bits.h contains a query function:

      static inline uint32_t __attribute__((pure))
      CONTAINER_Field_bits(const struct gen_device_info *devinfo);

v2 (Chad Versace):
  - Parse the XML instead of scraping the generated gen*_pack.h headers.

v3 (Dylan Baker):
  - Port to Mako.

v4 (Jason Ekstrand):
  - Make the _bits functions take a gen_device_info.

v5 (Chad Versace):
  - Fix autotools out-of-tree build.
  - Fix Android build. Tested with git://github.com/android-ia/manifest.
  - Fix macro names. They were all missing the "_bits" suffix.
  - Fix macros names more. Remove all double-underscores.
  - Unindent all generated code. (It was floating in a sea of whitespace).
  - Reformat header to appear human-written not machine-generated.
  - Sort gens from high to low. Newest gens should come first because,
    when we read code, we likely want to read the gen8/9 code and ignore
    the gen4 code. So put the gen4 code at the bottom.
  - Replace 'const' attributes with 'pure', because the functions now
    have a pointer parameter.
  - Add --cpp-guard flag. Used by Android.
  - Kill class FieldCollection. After Jason's rewrite, it was just
    a dict.

v6 (Chad Versace):
  - Replace `key not in d.keys()` with `key not in d`. [for dylan]

Co-authored-by: Dylan Baker <dylan@pnwbakers.com>
Co-authored-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v5)
Reviewed-by: Dylan Baker <dylan@pnwbakers.com> (v6)
2017-03-28 09:44:44 -07:00
Matt Turner 7dccd38b40 i965/fs: Don't emit SEL instructions for type-converting MOVs.
SEL can only convert between a few integer types, which we basically
never do.

Fixes fs/vs-double-uniform-array-direct-indirect-non-uniform-control-flow
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
2017-03-27 10:59:42 -07:00
Xu Randy 004468de14 anv/blorp: Fix a crash in CmdClearColorImage
We should use anv_get_layerCount() to access layerCount of VkImageSub-
resourceRange in anv_CmdClearColorImage and anv_CmdClearDepthStencil-
Image, which handles the VK_REMAINING_ARRAY_LAYERS (~0) case.

Test: Sample multithreadcmdbuf from LunarG can run without crash

Signed-off-by: Xu Randy <randy.xu@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-03-27 07:43:17 -07:00
Samuel Iglesias Gonsálvez c4c02471f4 anv: enable sampling from fast-cleared images on SKL
A resolve is not needed on Skylake in this case. We were forcing
a resolve because we set the input_aux_usage to ISL_AUX_USAGE_NONE.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-03-27 06:32:24 +02:00
Chad Versace 7414326164 genxml: Add 3DSTATE_DEPTH_BUFFER to gen5.xml
isl will use this for validating the depth buffer pitch.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-24 19:07:05 -07:00
Jason Ekstrand e6621746dc genxml: Whitespace fixes
Some field names had extra spaces and some had places where we should
have had a space but didn't.

Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-03-24 15:00:37 -07:00
Jason Ekstrand 34c3f6a27f genxml: Replace "[N]" with "N"
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-03-24 15:00:37 -07:00
Jason Ekstrand c2af555d6e genxml/gen6: Remove a couple of bogus values
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-03-24 15:00:37 -07:00
Jason Ekstrand ec27402a8f genxml/gen8: Remove BLACK_LEVEL_CORRECTION_STATE
We've never used it, it only exists on gen8, and the name of the struct
contains piles of bad characters.

Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-03-24 15:00:37 -07:00
Jason Ekstrand a6df637d26 genxml: Rename two MCS fields to Auxiliary Surface on gen7
This makes gen7 more consistent with gen8+

Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-03-24 15:00:37 -07:00
Chad Versace b3f81e06d4 genxml: Fix gen_zipped_file.py dependency
The gen*_xml.h files depend on gen_zipped_file.py, not the gen*_pack.h
files.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-24 14:38:22 -07:00
Chad Versace c7c6c53adb genxml: Define GENXML_XML_FILES in Makefile.sources
The future header genX_bits.h will depend on GENXML_XML_FILES.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-24 14:38:15 -07:00
Emil Velikov 15603055fb anv: automake: ensure that the destination directory is created
Earlier commit unintentionally dropped the mkdir, as it was rebased.

Some versions of autotools will not create the output directory for
generated sources. Thus the issue went unnoticed by the original author.

Cc: Dylan Baker <dylan@pnwbakers.com>
Cc: Steven Newbury <steve@snewbury.org.uk>
Reported-by: Steven Newbury <steve@snewbury.org.uk> Fixes:
Fixes: 1610b3dede ("anv: don't pass xmlfile via stdin anv_entrypoints_gen.py")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-24 12:02:04 +00:00
Iago Toral Quiroga 129fd58131 anv/query: handle out of host memory without crashing in compute_query_result()
We don't need to make the caller (CmdCopyQueryPoolResults) aware of the
problem since compute_query_result() only emits state. The caller is also
expected to hit OOM in this scenario right after calling this function, but
it is already handling it safely.

Fixes:
dEQP-VK.api.out_of_host_memory.cmd_copy_query_pool_results

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-03-24 09:39:44 +01:00
Iago Toral Quiroga ddb2bb3ed4 anv/pipeline: make FragCoord include sample positions when sample shading
We need to know if sample shading has been requested during shader
compilation since that affects the way fragment coordinates are
computed.

Notice that the semantics of fragment coordinates only depend on
whether sample shading has been requested, not on whether more
than one sample will actually be produced (that is,
minSampleShading and rasterizationSamples do not affect this
behavior).

Because this setting affects the code we generate for the shader, we also
need to include it in the WM prog key. Notice we don't need to alter the
OpenGL code because it doesn't ever use this behavior, so they key's
value is always false (the default).

Fixes:
dEQP-VK.glsl.builtin_var.fragcoord_msaa.*

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-24 08:11:53 +01:00
Iago Toral Quiroga 023ea3772d nir/lower_wpos_center: support adding sample position to fragment coordinate
According to section 14.6 of the Vulkan specification:

   "When sample shading is enabled, the x and y components of FragCoord
    reflect the location of the sample corresponding to the shader
    invocation."

So add a boolean parameter to the lowering pass to select this behavior
when we need it.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-24 08:11:53 +01:00
Iago Toral Quiroga 4da1832c00 anv: return VK_ERROR_DEVICE_LOST immeditely when device is known to be lost
If we know the device has been lost we should return this error code for
any command that can report it before we attempt to do anything with the
device.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-24 08:11:53 +01:00
Iago Toral Quiroga 50c8d2c1f7 anv/device: keep track of 'device lost' state
The Vulkan specs say:

   "A logical device may become lost because of hardware errors, execution
    timeouts, power management events and/or platform-specific events. This
    may cause pending and future command execution to fail and cause hardware
    resources to be corrupted. When this happens, certain commands will
    return VK_ERROR_DEVICE_LOST (see Error Codes for a list of such commands).
    After any such event, the logical device is considered lost. It is not
    possible to reset the logical device to a non-lost state, however the lost
    state is specific to a logical device (VkDevice), and the corresponding
    physical device (VkPhysicalDevice) may be otherwise unaffected. In some
    cases, the physical device may also be lost, and attempting to create a
    new logical device will fail, returning VK_ERROR_DEVICE_LOST."

This means that we need to track if a logical device has been lost so we can
have the commands referenced by the spec return VK_ERROR_DEVICE_LOST
immediately.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-24 08:11:53 +01:00
Iago Toral Quiroga 70194c9f1a anv/device: return VK_ERROR_DEVICE_LOST for errors during queue submissions
So that we don't have to do things like rolling back address relocations in
case that we ran into OOM after computing them, etc

Also, make sure that if the queue submission comes with a fence, we set it up
correctly so it behaves according to the spec after returning
VK_ERROR_DEVICE_LOST.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-24 08:11:53 +01:00
Matt Turner 7499bc7fd7 i965: Replace OPT_V() with OPT().
We want to be able to check the progress of each pass and dump the NIR
for debugging purposes if it changed.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-23 14:34:44 -07:00
Matt Turner 1be91bd9d8 i965/fs: Return progress from demote_sample_qualifiers().
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-23 14:34:44 -07:00
Matt Turner fd3351246c i965/fs: Return progress from move_interpolation_to_top().
And mark as static at the same time.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-23 14:34:44 -07:00
Tapani Pälli 4f69573178 intel: move gen_decoder.* to DECODER_FILES
patch adds DECODER_FILES for libintel_common, this is so that platforms
such as Android not currently using this functionality can opt out.

Fixes: 7d84bb3 ("intel: Move tools/decoder.[ch] to common/gen_decoder.[ch].")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-23 14:05:19 +02:00
Tapani Pälli bcae4eb502 android: fix vulkan build issues with anv_entrypoints
Patch fixes entrypoint generation for libmesa_anv_entrypoints that
still used old style of calling generator script.

Also small fixes to libmesa_vulkan_common where there was a typo
in target name (vulknan) and files were generated to wrong folder.

Fixes: 8211e3e6 ("anv: Generate anv_entrypoints header and code in one command")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-23 14:04:44 +02:00
Tapani Pälli dc9ebc6ef1 android: rename Intel Vulkan library to match desktop one
Original naming was following Vulkan HAL naming scheme for no good
purpose and we need same binary name for build-id code.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-23 08:19:16 +02:00
Dylan Baker 4ee675d537 anv: Remove dead prototype from entrypoints
Spotted by Emil.

v2: - Add this patch

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
2017-03-22 16:22:00 -07:00
Dylan Baker 860beb99a6 anv: use cElementTree in anv_entrypoints_gen.py
It's written in C rather than pure python and is strictly faster, the
only reason not to use it that it's classes cannot be subclassed.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
2017-03-22 16:22:00 -07:00
Dylan Baker 9050138af7 anv: don't use Element.get in anv_entrypoints_gen.py
This has the potential to mask errors, since Element.get works like
dict.get, returning None if the element isn't found. I think the reason
that Element.get was used is that vulkan has one extension that isn't
really an extension, and thus is missing the 'protect' field.

This patch changes the behavior slightly by replacing get with explicit
lookup in the Element.attrib dictionary, and using xpath to only iterate
over extensions with a "protect" attribute.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
2017-03-22 16:22:00 -07:00
Dylan Baker 4d4697f868 anv: use dict.get in anv_entrypoints_gen.py
Instead of using an if and a check, use dict.get, which does the same
thing, but more succinctly.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
2017-03-22 16:22:00 -07:00