Commit Graph

2842 Commits

Author SHA1 Message Date
Jason Ekstrand ab7543b13d anv/cmd_buffer: Generalize transition_color_buffer
This moves it to being based on layout_to_aux_usage instead of being
hard-coded based on bits of a priori knowledge of how transitions
interact with layouts.  This conceptually simplifies things because
we're now using layout_to_aux_usage and layout_supports_fast_clear to
make resolve decisions so changes to those functions will do what one
expects.

There is a potential bug with window system integration on gen9+ where
we wouldn't do a resolve when transitioning to the PRESENT_SRC layout
because we just assume that everything that handles CCS_E can handle it
all the time.  When handing a CCS_E image off to the window system, we
may need to do a full resolve if the window system does not support the
CCS_E modifier.  The only reason why this hasn't been a problem yet is
because we don't support modifiers in Vulkan WSI and so we always get X
tiling which implies no CCS on gen9+.  This patch doesn't actually fix
that bug yet but it takes us the first step in that direction by making
us actually pick the correct resolve op.  In order to handle all of the
cases, we need more detailed aux tracking.

v2 (Jason Ekstrand):
 - Make a few more things const
 - Use the anv_fast_clear_support enum

v3 (Jason Ekstrand):
 - Move an assert and add a better comment

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2018-02-08 16:35:31 -08:00
Jason Ekstrand 151771b390 anv/cmd_buffer: Recurse in transition_color_buffer instead of falling through
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2018-02-08 16:35:31 -08:00
Jason Ekstrand bea7373c92 anv/image: Support color aspects in layout_to_aux_usage
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2018-02-08 16:35:31 -08:00
Jason Ekstrand b09464db42 anv/image: Add a helper for determining when fast clears are supported
v2 (Jason Ekstrand):
 - Return an enum instead of a boolean

v3 (Jason Ekstrand):
 - Return ANV_FAST_CLEAR_NONE instead of false (Topi)
 - Rename ANV_FAST_CLEAR_ANY to ANV_FAST_CLEAR_DEFAULT_VALUE
 - Add documentation for the enum values

v4 (Jason Ekstrand):
 - Remove a dead comment

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2018-02-08 16:35:31 -08:00
Jason Ekstrand 1f7eee6bc1 anv/image: Update a comment
This got lost in all of the aspect vs. plane rebasing of YCBCR.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2018-02-08 16:35:31 -08:00
Jason Ekstrand 5c38ab8f07 anv/blorp: Rework HiZ ops to look like MCS and CCS
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2018-02-08 16:35:31 -08:00
Jason Ekstrand 1d473e26f2 anv/blorp: Support ISL_AUX_USAGE_HIZ in surf_for_anv_image
If the function gets passed ANV_AUX_USAGE_DEFAULT, it still has the old
behavior of setting ISL_AUX_USAGE_NONE for depth/stencil which is what
we want for blits/copies.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2018-02-08 16:35:31 -08:00
Jason Ekstrand 42f1668a54 anv/blorp: Rework image clear/resolve helpers
This replaces image_fast_clear and ccs_resolve with two new helpers that
simply perform an isl_aux_op whatever that may be on CCS or MCS.  This
is a bit cleaner as it separates performing the aux operation from which
blorp helper we have to call to do it.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2018-02-08 16:35:31 -08:00
Jason Ekstrand 482c24783e intel/isl: Codify AUX operations in an enum
Right now, we have different entrypoints and enums in blorp for these
different operations.  This provides us a central enum which we can
begin to transition to.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2018-02-08 16:35:31 -08:00
Scott D Phillips 1f4d2433e7 meson: Add build option for tools
Add a build option to control building some of the misc tools we
have. Also set the executables to install, presumably you want
that if you're asking for the build.

v2: set 'install:' to the with_tools value, not true (Jordan)
    handle 'all' in a the comma list (Dylan)
    Add freedreno's tools (Dylan)

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2018-02-08 11:24:42 -08:00
Timothy Arceri ffeebcfa7e i965: remove unused brw_nir_lower_cs_shared()
This has been unused since 8761a04d0d.

Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
2018-02-07 08:38:01 +11:00
Iago Toral Quiroga a5053ba27e anv/device: initialize the list of enabled extensions properly
The loop goes through the list of enabled extensions marking them as
enabled in the list, but this relies on every other extension being
initialized to false by default.

This bug would make us, for example, advertise certain device extension
entry points as available even when the corresponding extensions had
not been enabled.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes: abc62282b5 "anv: Add a per-device table of enabled extensions"
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
2018-02-06 07:51:00 +01:00
Iago Toral Quiroga 1d20001d97 i965/nir: do int64 lowering before optimization
Otherwise loop unrolling will fail to see the actual cost of
the unrolling operations when the loop body contains 64-bit integer
instructions, and very specially when the divmod64 lowering applies,
since its lowering is quite expensive.

Without this change, some in-development CTS tests for int64
get stuck forever trying to register allocate a shader with
over 50K SSA values. The large number of SSA values is the result
of NIR first unrolling multiple seemingly simple loops that involve
int64 instructions, only to then lower these instructions to produce
a massive pile of code (due to the divmod64 lowering in the unrolled
instructions).

With this change, loop unrolling will see the loops with the int64
code already lowered and will realize that it is too expensive to
unroll.

v2: Run nir_algebraic first so we can hopefully get rid of some of
    the int64 instructions before we even attempt to lower them.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-02-06 07:49:27 +01:00
Matt Turner e2b31e9acf i965: Move mistakenly placed line
Ken called this out in review, but it seems I forgot to make the change.
I noticed that the control flow annotations in the fragment shader
disassembly of tests/shaders/glsl-fs-loop-continue.shader_test were not
correct, and moving this line to the correct place fixes it.
2018-02-05 09:50:56 -08:00
Jason Ekstrand 589e9db23f aubinator: Multiply count by 4 to compute buffer sizes
The count field is in terms of dwords and not bytes.  In
7d4007d58a, I fixed one instance
of this but missed another.
2018-02-02 22:30:56 -08:00
Kenneth Graunke 85ec7abc3f intel/decoder: Fix control / evaluation label mixup.
Trivial.  DS is TES, HS is TCS.
2018-02-01 09:44:15 -08:00
Jason Ekstrand 97938dac36 anv/cmd_buffer: Re-emit the pipeline at every subpass
If we ever hit this edge-case, it can theoretically cause problem for
CNL because we could end up changing render targets without re-emitting
3DSTATE_MULTISAMPLE which is part of the pipeline.  Just get rid of the
edge case.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2018-01-30 17:16:33 -08:00
Iago Toral Quiroga 99b57daf4a anv/pipeline: lower constant initializers on output variables earlier
If a shader only writes to an output via a constant initializer we
need to lower it before we call nir_remove_dead_variables so that
this pass sees the stores from the initializer and doesn't kill the
output.

Fixes test failures in new work-in-progress CTS tests:
dEQP-VK.spirv_assembly.instruction.graphics.variable_init.output_vert
dEQP-VK.spirv_assembly.instruction.graphics.variable_init.output_frag

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-01-30 08:10:29 +01:00
Rafael Antognolli fa21ddf7b1 anv/cmd_buffer: Emit PIPE_CONTROL with ISP bit on older platforms.
Emit it on all platforms since gen7.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-01-29 14:52:07 -08:00
Timothy Arceri 5b8de4bdff nir: add vs_inputs_dual_locations compiler option
Allows nir drivers to either use a single or dual locations for
vs double inputs.

i965 uses dual locations for both OpenGL and Vulkan drivers, for
now gallium OpenGL drivers only use a single location.

The following patch will also make use of this option when
calling nir_shader_gather_info().

Reviewed-by: Karol Herbst <kherbst@redhat.com>
2018-01-30 09:08:47 +11:00
Timothy Arceri f63e05ae9e compiler: tidy up double_inputs_read uses
First we move double_inputs_read into a vs struct in the union,
double_inputs_read is only used for vs inputs so this will
save space and also allows us to add a new double_inputs field.

We add the new field because c2acf97fcc changed the behaviour
of double_inputs_read, and while it's no longer used to track
actual reads in i965 we do still want to track this for gallium
drivers.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-30 09:08:47 +11:00
Rafael Antognolli 20578f81a6 anv/gen10: Emit CS stall and mark push constants dirty.
I got reviews and fixed the patches locally, but ended up merging the
ones that I sent originally to the list. This patch fixes those
mistakes.

Fixes: 78c125af39
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-01-26 11:59:17 -08:00
Rafael Antognolli bcfd78e448 i965/gen10: Re-enable push constants.
The GPU hang caused by push constants is apparently fixed, so let's
enable them again.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-01-26 10:07:44 -08:00
Rafael Antognolli 78c125af39 anv/gen10: Ignore push constant packets during context restore.
Similar to the GL driver, ignore 3DSTATE_CONSTANT_* packets when doing a
context restore.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-01-26 10:07:40 -08:00
Iago Toral Quiroga d3ce493b34 anv/pipeline: remove the pipeline layout field from anv_pipeline
It no longer has any users.

Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-01-26 14:06:47 +01:00
Iago Toral Quiroga 75a4802060 anv/cmd_buffer: add the pipeline layout to the pipeline state
We need to access the pipeline layout to compute correct dynamic
offsets for dyamic UBO/SSBO descriptors when we emit draw commands.
Instead of taking it from the pipeline object, store the layout
in the command buffer pipeline state.

Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-01-26 14:06:47 +01:00
Iago Toral Quiroga e1a49f974b anv/pipeline: don't take the layout from the pipeline to compile shaders
The Vulkan spec states that VkPipelineLayout objects must not be
destroyed while any command buffer that uses them is in the recording
state, but it permits them to be destroyed otherwise. This means that
applications are allowed to free pipeline layouts after command recording
is finished even if there are pipeline objects that still exist and were
created with these layouts.

There are two solutions to this, one is to use reference counting on
pipeline layout objects. The other is to avoid holding references to
pipeline layouts where they are not really needed.

This patch takes a step towards the second option by making the
pipeline shader compile code take pipeline layout from the
VkGraphicsPipelineCreateInfo provided rather than the pipeline
object.

A follow-up patch will remove any remaining uses of the layout field
so we can remove it from the pipeline object and avoid the need
for reference counting.

v2: Use ANV_FROM_HANDLE, remove unnecessary braces (Jason)

Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-01-26 14:06:46 +01:00
Iago Toral Quiroga 14f6275c92 anv/descriptor_set: add reference counting for descriptor set layouts
The spec states that descriptor set layouts can be destroyed almost
at any time:

   "VkDescriptorSetLayout objects may be accessed by commands that
    operate on descriptor sets allocated using that layout, and those
    descriptor sets must not be updated with vkUpdateDescriptorSets
    after the descriptor set layout has been destroyed. Otherwise,
    descriptor set layouts can be destroyed any time they are not in
    use by an API command."

v2: allocate off the device allocator with DEVICE scope (Jason)

Fixes the following work-in-progress CTS tests:
dEQP-VK.api.descriptor_set.descriptor_set_layout_lifetime.graphics
dEQP-VK.api.descriptor_set.descriptor_set_layout_lifetime.compute

Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-01-26 14:06:46 +01:00
Jason Ekstrand c8949e2498 anv/pipeline: Don't look at blend state unless we have an attachment
Without this, we may end up dereferencing blend before we check for
binding->index != UINT32_MAX.  However, Vulkan allows the blend state to
be NULL so long as you don't have any color attachments.  This fixes a
segfault when running The Talos Principal.

Fixes: 12f4e00b69
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-01-26 01:44:45 -08:00
Maxin B. John 8116b9170b anv_icd.py: improve reproducible builds
Sort the output to ensure build reproducibility

Signed-off-by: Maxin B. John <maxin.john@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Fixes: 0ab04ba979 ("anv: Use python to generate ICD json files")
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-01-26 01:37:45 -08:00
Jason Ekstrand db682b8f0e i965/fs: Reset the register file to VGRF in lower_integer_multiplication
18fde36ced changed the way temporary
registers were allocated in lower_integer_multiplication so that we
allocate regs_written(inst) space and keep the stride of the original
destination register.  This was to ensure that any MUL which originally
followed the CHV/BXT integer multiply regioning restrictions would
continue to follow those restrictions even after lowering.  This works
fine except that I forgot to reset the register file to VGRF so, even
though they were assigned a number from alloc.allocate(), they had the
wrong register file.  This caused some GLES 3.0 CTS tests to start
failing on Sandy Bridge due to attempted reads from the MRF:

    ES3-CTS.functional.shaders.precision.int.highp_mul_fragment.snbm64
    ES3-CTS.functional.shaders.precision.int.mediump_mul_fragment.snbm64
    ES3-CTS.functional.shaders.precision.int.lowp_mul_fragment.snbm64
    ES3-CTS.functional.shaders.precision.uint.highp_mul_fragment.snbm64
    ES3-CTS.functional.shaders.precision.uint.mediump_mul_fragment.snbm64
    ES3-CTS.functional.shaders.precision.uint.lowp_mul_fragment.snbm64

This commit remedies this problem by, instead of copying inst->dst and
overwriting nr, just make a new register and set the region to match
inst->dst.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103626
Fixes: 18fde36ced
Cc: "17.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-01-25 13:58:55 -08:00
Emil Velikov 50265cd9ee automake: anv: ship anv_extensions_gen.py in the tarball
Fixes: dd088d4bec ("anv/extensions: Generate a header file with
extension tables")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2018-01-25 17:06:29 +00:00
Jason Ekstrand 7d4007d58a aubinator: Multiply count by 4 to compute buffer sizes
The count field is in terms of dwords and not bytes.
2018-01-24 19:05:36 -08:00
Grazvydas Ignotas 0cc7370733 anv: correct a duplicate check in an assert
Looks like checking both sources was intended, instead of the first one
twice. Found with Coccinelle, coccinellery/xand/xand.cocci semantic patch.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2018-01-25 01:10:45 +02:00
Jason Ekstrand 4064fe59e7 anv/cmd_buffer: Move gen7 index buffer state to graphics state
Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
2018-01-23 21:10:46 -08:00
Jason Ekstrand 38ec78049f anv/cmd_buffer: Move num_workgroups to compute state
While we're here, make it an anv_address.

Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
2018-01-23 21:10:44 -08:00
Jason Ekstrand 95ff232294 anv/cmd_buffer: Move dynamic state to graphics state
Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
2018-01-23 21:10:43 -08:00
Jason Ekstrand 24caee8975 anv/cmd_buffer: Use a temporary variable for dynamic state
We were already doing this for some packets to keep the lines shorter.
We may as well just do it for all of them.

Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
2018-01-23 21:10:40 -08:00
Jason Ekstrand 8bd5ec5b86 anv/cmd_buffer: Move vb_dirty bits into anv_cmd_graphics_state
Vertex buffers are entirely a graphics pipeline thing.

Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
2018-01-23 21:10:39 -08:00
Jason Ekstrand e85aaec148 anv/cmd_buffer: Move dirty bits into anv_cmd_*_state
Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
2018-01-23 21:10:36 -08:00
Jason Ekstrand 97f96610c8 anv: Separate compute and graphics descriptor sets
The Vulkan spec says:

    "pipelineBindPoint is a VkPipelineBindPoint indicating whether the
    descriptors will be used by graphics pipelines or compute pipelines.
    There is a separate set of bind points for each of graphics and
    compute, so binding one does not disturb the other."

Up until now, we've been ignoring the pipeline bind point and had just
one bind point for everything.  This commit separates things out into
separate bind points.

Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102897
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
2018-01-23 21:10:33 -08:00
Jason Ekstrand 31b2144c83 anv/cmd_buffer: Use anv_descriptor_for_binding for samplers
Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
2018-01-23 21:10:31 -08:00
Jason Ekstrand b9e1ca16f8 anv/cmd_buffer: Add a helper for binding descriptor sets
This lets us unify some code between push descriptors and regular
descriptors.  It doesn't do much for us yet but it will.

Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
2018-01-23 21:10:30 -08:00
Jason Ekstrand 90cceaa9dd anv/cmd_buffer: Refactor ensure_push_descriptor_set
It's now a function which returns the push descriptor set.  Since we set
the error on the command buffer, returning the error is a little
redundant.  Returning the descriptor set (or NULL on error) is more
convenient.

Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
2018-01-23 21:10:28 -08:00
Jason Ekstrand d5592e2fda anv: Remove semicolons from vk_error[f] definitions
With the semicolons, they can't be used in a function argument without
throwing syntax errors.

Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
2018-01-23 21:10:27 -08:00
Jason Ekstrand 9af5379228 anv/cmd_buffer: Add substructs to anv_cmd_state for graphics and compute
Initially, these just contain the pipeline in a base struct.

Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
2018-01-23 21:10:25 -08:00
Jason Ekstrand ddc2d28548 anv/cmd_buffer: Use some pre-existing pipeline temporaries
There are several places where we'd already saved the pipeline off to a
temporary variable but, due to an artifact of history, weren't actually
using that temporary everywhere.  No functional change.

Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
2018-01-23 21:10:24 -08:00
Jason Ekstrand cd3feea745 anv/cmd_buffer: Rework anv_cmd_state_reset
This splits anv_cmd_state_reset into separate init and finish functions.
This lets us share init code with cmd_buffer_create.  This potentially
fixes subtle bugs where we may have missed some bit of state that needs
to get initialized on command buffer creation.

Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
2018-01-23 21:10:22 -08:00
Jason Ekstrand d6c9a89d13 anv/cmd_buffer: Get rid of the meta query workaround
Meta has been gone for a long time.

Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
2018-01-23 21:10:20 -08:00
Jason Ekstrand bc0a21e348 anv/cmd_state: Drop the scratch_size field
This is a legacy left-over from the mechanism we used to use to handle
scratch.  The new (and better) mechanism doesn't use this.

Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
2018-01-23 21:10:19 -08:00
Jason Ekstrand 4b69ba3817 anv/pipeline: Don't assert on more than 32 samplers
This prevents an assert when running one unreleased Vulkan game.

Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
2018-01-23 21:10:08 -08:00
Jason Ekstrand de00e8227b anv: Return trampoline entrypoints from GetInstanceProcAddr
Technically, the Vulkan spec requires that we return valid entrypoints
for all core functionality and any available device extensions.  This
means that, for gen-specific functions, we need to return a trampoline
which looks at the device and calls the right device function.  In 99%
of cases, the loader will do this for us but, aparently, we're supposed
to do it too.  It's a tiny increase in binary size for us to carry this
around but really not bad.

Before:
       text    data   bss      dec     hex  filename
    3541775  204112  6136  3752023  394057  libvulkan_intel.so

After:
       text    data   bss      dec     hex  filename
    3551463  205632  6136  3763231  396c1f  libvulkan_intel.so

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-01-23 00:15:40 -08:00
Jason Ekstrand eac29f3a6d anv/entrypoints: Use an named tuple for params
This allows us to store a bit more detailed data per-param

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-01-23 00:15:40 -08:00
Jason Ekstrand 1f79d986af anv: Only advertise enabled entrypoints
The Vulkan spec annoyingly requires us to track what core version and
what all extensions are enabled and only advertise those entrypoints.
Any call to vkGet*ProcAddr for an entrypoint for an extension the client
has not explicitly enabled is supposed to return NULL.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-01-23 00:15:40 -08:00
Jason Ekstrand e3d27542ae anv: Add a per-device dispatch table
We also switch GetDeviceProcAddr over to use it.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-01-23 00:15:40 -08:00
Jason Ekstrand 0c399dca51 anv: Add a per-instance dispatch table
We also switch GetInstanceProcAddr over to use it.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-01-23 00:15:40 -08:00
Jason Ekstrand a372b9247d anv: Properly NULL for GetInstanceProcAddr with a null instance
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-01-23 00:15:40 -08:00
Jason Ekstrand cb0d1ba156 anv/extensions: Fix VkVersion::c_vk_version for patch == None
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-01-23 00:15:40 -08:00
Jason Ekstrand 93e789a266 anv/entrypoints: Parse entrypoints before extensions/features
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-01-23 00:15:40 -08:00
Jason Ekstrand 2f493121ae anv/entrypoints: Expose the different dispatch tables
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-01-23 00:15:40 -08:00
Jason Ekstrand 083e126694 anv/entrypoints: Split entrypoint index lookup into its own function
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-01-23 00:15:40 -08:00
Jason Ekstrand 7039308d7c anv/entrypoints: Add a LAYERS helper variable
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-01-23 00:15:40 -08:00
Jason Ekstrand f54227856f anv/entrypoints: Add an Entrypoint class
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-01-23 00:15:40 -08:00
Jason Ekstrand abc62282b5 anv: Add a per-device table of enabled extensions
Nothing uses this at the moment, but we will need it soon.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-01-23 00:15:40 -08:00
Jason Ekstrand 01b9701a5c anv: Use tables for device extension wrangling
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-01-23 00:15:40 -08:00
Jason Ekstrand 920bd2c0bc anv: Add a per-instance table of enabled extensions
Nothing needs this yet but we will want it later.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-01-23 00:15:40 -08:00
Jason Ekstrand ff5f3e2b21 anv: Use tables for instance extension wrangling
This lets us move a bunch of stuff out of codegen and back into
anv_device.c which is a bit nicer.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-01-23 00:15:40 -08:00
Jason Ekstrand dd088d4bec anv/extensions: Generate a header file with extension tables
This allows us better introspection into extensions.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-01-23 00:15:40 -08:00
Jason Ekstrand ffb10bfd8e anv/meson: Simplify some dependency and flag tracking
This removes some redundant code between libanv_common, libvulkan_intel,
and libvulkan_intel_test.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-01-23 00:15:40 -08:00
Jason Ekstrand f939940809 anv: Split anv_extensions.py into two files
The new anv_extensions_gen.py is the code generator while the old
anv_extensions.py file is purely declarative.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-01-23 00:15:40 -08:00
Jason Ekstrand 10d1b0be8e anv/meson: Make anv_entrypoints_gen.py depend on anv_extensions.py
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-01-23 00:15:40 -08:00
Kenneth Graunke 60f15477da i965: Drop render_target_start from binding table struct.
We have to start render targets at binding table index 0 in order to use
headerless FB write messages, and in fact already assume this in a bunch
of places in the code.  Let's finish that off, and not bother storing 0
in a struct to pretend to add it in a few places.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2018-01-22 10:03:52 -08:00
Dylan Baker 436ed65d38 autotools: include meson build files in tarball
This adds the meson.build, meson_options.txt, and a few scripts that are
used exclusively by the meson build.

v2: - Remove accidentally included changes needed to test make dist with
      LLVM > 3.9

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Acked-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2018-01-19 16:30:51 -08:00
Samuel Iglesias Gonsálvez 7109a1fe13 anv: avoid segmentation fault due to vk_error()
vk_error() is a macro that calls __vk_errorf() with instance == NULL.

Then, __vk_errorf() passes a pointer to instance->debug_report_callbacks
to vk_debug_error(), which segfaults as this pointer is invalid but not
NULL.

Fixes: e5b1bd6ab8 "vulkan: move anv VK_EXT_debug_report implementation to common code."

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2018-01-19 09:39:05 +01:00
Chris Wilson 34499e8ddc intel: Future-proof ring names for aubinator_error_decode
The kernel is moving to a $class$instance naming scheme in preparation
for accommodating more rings in the future in a consistent manner. It is
already using the naming scheme internally, and now we are looking at
updating some soft-ABI such as the error state to use the new naming
scheme. This of course means we need to teach aubinator_error_decode how
to map both sets of ring names onto its register maps.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
2018-01-18 17:35:21 +00:00
Iago Toral Quiroga 7ec6e4e689 anv/query: implement multiview interactions
From the Vulkan spec with KHX extensions:

  "If queries are used while executing a render pass instance that has
   multiview enabled, the query uses N consecutive query indices
   in the query pool (starting at query) where N is the number of bits
   set in the view mask in the subpass the query is used in.

   How the numerical results of the query are distributed among the
   queries is implementation-dependent. For example, some implementations
   may write each view's results to a distinct query, while other
   implementations may write the total result to the first query and write
   zero to the other queries. However, the sum of the results in all the
   queries must accurately reflect the total result of the query summed
   over all views. Applications can sum the results from all the queries to
   compute the total result."

In our case we only really emit a single query (in the first query index)
that stores the aggregated result for all views, but we still need to manage
availability for all the other query indices involved, even if we don't
actually use them.

This is relevant when clients call vkGetQueryPoolResults and pass all N
queries to retrieve the results. In that scenario, without this patch,
we will never see queries other than the first being available since we
never emit them.

v2: we need the same treatment for timestamp queries.

v3 (Jason):
 - Better an if instead of an early return.
 - We can't write to this memory in the CPU, we should use
   MI_STORE_DATA_IMM and emit_query_availability (Jason).

v4 (Jason):
 - No need to take the value to write as parameter, just hard code it to 0.

Fixes test failures in some work-in-progress CTS multiview+query tests.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-01-18 16:37:06 +01:00
Samuel Iglesias Gonsálvez eac629deb6 anv: return VK_ERROR_OUT_OF_DEVICE_MEMORY when surface size is out of HW limits
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-01-18 06:48:47 +01:00
Francisco Jerez 11674dad8a intel/fs: Optimize and simplify the copy propagation dataflow logic.
Previously the dataflow propagation algorithm would calculate the ACP
live-in and -out sets in a two-pass fixed-point algorithm.  The first
pass would update the live-out sets of all basic blocks of the program
based on their live-in sets, while the second pass would update the
live-in sets based on the live-out sets.  This is incredibly
inefficient in the typical case where the CFG of the program is
approximately acyclic, because it can take up to 2*n passes for an ACP
entry introduced at the top of the program to reach the bottom (where
n is the number of basic blocks in the program), until which point the
algorithm won't be able to reach a fixed point.

The same effect can be achieved in a single pass by computing the
live-in and -out sets in lock-step, because that makes sure that
processing of any basic block will pick up the updated live-out sets
of the lexically preceding blocks.  This gives the dataflow
propagation algorithm effectively O(n) run-time instead of O(n^2) in
the acyclic case.

The time spent in dataflow propagation is reduced by 30x in the
GLES31.functional.ssbo.layout.random.all_shared_buffer.5 dEQP
test-case on my CHV system (the improvement is likely to be of the
same order of magnitude on other platforms).  This more than reverses
an apparent run-time regression in this test-case from my previous
copy-propagation undefined-value handling patch, which was ultimately
caused by the additional work introduced in that commit to account for
undefined values being multiplied by a huge quadratic factor.

According to Chad this test was failing on CHV due to a 30s time-out
imposed by the Android CTS (this was the case regardless of my
undefined-value handling patch, even though my patch substantially
exacerbated the issue).  On my CHV system this patch reduces the
overall run-time of the test by approximately 12x, getting us to
around 13s, well below the time-out.

v2: Initialize live-out set to the universal set to avoid rather
    pessimistic dataflow estimation in shaders with cycles (Addresses
    performance regression reported by Eero in GpuTest Piano).
    Performance numbers given above still apply.  No shader-db changes
    with respect to master.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104271
Reported-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-01-17 11:56:08 -08:00
Bas Nieuwenhuizen e5b1bd6ab8 vulkan: move anv VK_EXT_debug_report implementation to common code.
For also using it in radv. I moved the remaining stubs back to
anv_device.c as they were just trivial.

This does not move the vk_errorf/anv_perf_warn or the object
type macros, as those depend on anv types and logging.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2018-01-17 11:27:52 +01:00
Samuel Iglesias Gonsálvez e63adf8b1e anv: VkDescriptorSetLayoutBinding can have descriptorCount == 0
From Vulkan spec:

"descriptorCount is the number of descriptors contained in the binding,
accessed in a shader as an array. If descriptorCount is zero this
binding entry is reserved and the resource must not be accessed from
any stage via this binding within any pipeline using the set layout."

Fixes:

dEQP-VK.binding_model.descriptor_update.empty_descriptor.uniform_buffer

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
2018-01-12 07:08:51 +01:00
Dylan Baker 2083a14179 meson: Use dependencies for nir
This creates two new internal dependencies, idep_nir_headers and
idep_nir. The former encapsulates the generation of nir_opcodes.h and
nir_builder_opcodes.h and adding src/compiler/nir as an include path.
This ensures that any target that needs nir headers will have the
includes and that the generated headers will be generated before the
target is build. The second, idep_nir, includes the first and
additionally links to libnir.

This is intended to make it easier to avoid race conditions in the build
when using nir, since the number of consumers for libnir and it's
headers are quite high.

Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
2018-01-11 15:40:02 -08:00
Dylan Baker 60856a7b49 meson: don't use intermediate variables that are immediately discarded
For things like:
loop
    x = func()
    list += x
end

just do:
loop
    list += func()
end

Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
2018-01-11 15:40:02 -08:00
Dylan Baker 4ccb981673 meson: Use consistent style for tests
Don't use intermediate variables, use consistent whitespace.

Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
2018-01-11 15:40:02 -08:00
Dylan Baker fbf192a67e meson: Use consistent style
Currently the meosn build has a mix of two styles:
arg : [foo, ...
       bar],

and
arg : [
  foo, ...,
  bar,
]

For consistency let's pick one. I've picked the later style, which I
think is more readable, and is more common in the mesa code base.

v2: - fix commit message

Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
2018-01-11 15:40:02 -08:00
Jason Ekstrand c3d802d68e i965: Use UD types for gl_SampleID setup
We already had to switch all of the W types to UW to prevent issues
with vector immediates on gen10.  We may as well use unsigned types
everywhere.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-01-11 14:31:47 -08:00
Jason Ekstrand 3d2b157e23 i965/fs: Use UW types when using V immediates
Gen 10 has a strange hardware bug involving V immediates with W types.
It appears that a mov(8) g2<1>W 0x76543210V will actually result in g2
getting the value {3, 2, 1, 0, 3, 2, 1, 0}.  In particular, the bottom
four nibbles are repeated instead of the top four being taken.  (A mov
of 0x00003210V yields the same result.)  This bug does not appear in any
hardware documentation as far as we can tell and the simulator does not
implement the bug either.

Commit 6132992cdb was mostly a no-op
except that it changed the type of the subgroup invocation from UW to W
and caused us to tickle this bug with basically every compute shader
that uses any sort of invocation ID (which is most of them).  This is
also potentially an issue for geometry shader input pulls and SampleID
setup.  The easy solution is just to change the few places where we use
a vector integer immediate with a W type to use a UW type.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Fixes: 6132992cdb
2018-01-11 14:31:38 -08:00
Matt Turner c0ef14f5b1 Revert "Revert "i965/fs: Use align1 mode on ternary instructions on Gen10+""
This reverts commit 2d04572038.

Acked-by: Scott D Phillips <scott.d.phillips@intel.com>
2018-01-11 10:11:59 -08:00
Matt Turner 01ebfbb67a i965/fs: Add/use functions to convert to 3src_align1 vstride/hstride
Some cases weren't handled, such as stride 4 which is needed for 64-bit
operations. Presumably fixes the assertion failure mentioned in commit
2d04572038 (Revert "i965/fs: Use align1 mode on ternary instructions
on Gen10+") but who can really say since the commit neglected to list
any of them!

Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2018-01-11 10:11:59 -08:00
Alex Smith 4fd85617c1 anv: Make sure state on primary is correct after CmdExecuteCommands
After executing a secondary command buffer, we need to update certain
state on the primary command buffer to reflect changes by the secondary.
Otherwise subsequent commands may not have the correct state set.

This fixes various issues (rendering errors, GPU hangs) seen after
executing secondary command buffers in some cases.

v2 (Jason Ekstrand):
 - Reset to invalid values instead of pulling from the secondary
 - Change the comment to be more descriptive

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable@lists.freedesktop.org
2018-01-11 18:11:08 +00:00
Andres Gomez a1901d092c anv: Import mako templates only during execution of anv_extensions
anv_extensions usage from anv_icd was bringing the unwanted dependency
of mako templates for the latter. We don't want that since it will
force the dependency even for distributable tarballs which was not
needed until now.

Jason suggested this approach.

v2: Patch simplification (Jason).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104551
Fixes: 0ab04ba979 ("anv: Use python to generate ICD json files")
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-01-11 14:44:03 +02:00
Samuel Iglesias Gonsálvez c0816389c2 anv: fix maxDescriptorSet* limits
"The maxDescriptorSet* limit is n times the corresponding
maxPerStageDescriptor* limit, where n is the number of shader stages
supported by the VkPhysicalDevice. If all shader stages are supported,
n = 6 (vertex, tessellation control, tessellation evaluation,
geometry, fragment, compute)."

Fixes:

dEQP-VK.api.info.device.properties

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-01-11 07:00:42 +01:00
Iago Toral Quiroga 4317c848b9 i965/nir: add a helper to lower gl_PatchVerticesIn to a uniform
v2: do not try to handle it as a system value directly for the SPIR-V
    path. In GL we rather handle it as a uniform like we do for the
    GLSL path (Jason).

v3:
  - Remove the uniform variable, it is alwats -1 now (Jason)
  - Also do the lowering for the TessEval stage (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-01-10 08:21:02 +01:00
Kenneth Graunke 28c2d0d80b genxml: Add missing INSTDONE_1 bits on Gen7.5+.
This will make aubinator_error_decode decode them properly.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-01-09 10:13:53 -08:00
Kenneth Graunke 8eadc2fb8f intel: Apply Geminilake "Barrier Mode" workaround.
Apparently, Geminilake requires you to whack a chicken bit to select
either compute or tessellation mode for barriers.  The recommendation
is to switch between them at PIPELINE_SELECT time.

We may not need to do this all the time, but I don't know that it hurts
either.  PIPELINE_SELECT is already a pretty giant stall.

This appears to fix hangs in tessellation control shaders with barriers
on Geminilake.  Note that this requires a corresponding kernel change,

    drm/i915: Whitelist SLICE_COMMON_ECO_CHICKEN1 on Geminilake.

in order for the register write to actually happen.  Without an updated
kernel, this register write will be noop'd and the fix will not work.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2018-01-09 10:13:33 -08:00
Scott D Phillips 42f421cbbf aubinator: add support for aubinating memtrace aubs
Memtrace aubs are similar to classic aubs, with the major
difference being how command submission is serialized (as register
writes instead of a high-level submit message). Some internal
tools generate or consume only memtrace aubs.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2018-01-08 21:11:11 -08:00
Scott D Phillips 8cdf5bd292 aubinator: extract aubinator_init() out of the header handler function
A later patch will use the aubinator_init() function from the
memtrace aub header handler.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2018-01-08 21:11:11 -08:00
Scott D Phillips 4f0a2ff4c1 aubinator: honor --color option when printing the header
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2018-01-08 21:11:11 -08:00
Alex Smith 0d8b9c529c anv: Allow PMA optimization to be enabled in secondary command buffers
This was never enabled in secondary buffers because hiz_enabled was
never set to true for those.

If the app provides a framebuffer in the inheritance info when beginning
a secondary buffer, we can determine if HiZ is enabled and therefore
allow the PMA optimization to be enabled within the command buffer.

This improves performance by ~13% on an internal benchmark on Skylake.

v2: Use anv_cmd_buffer_get_depth_stencil_view().

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-01-08 09:31:17 +00:00
Alex Smith 12f4e00b69 anv: Take write mask into account in has_color_buffer_write_enabled
If we have a color attachment, but its writes are masked, this would
have still returned true. This is inconsistent with how HasWriteableRT
in 3DSTATE_PS_BLEND is set, which does take the mask into account.

This could lead to PixelShaderHasUAV not being set in 3DSTATE_PS_EXTRA
if the fragment shader does use UAVs, meaning the fragment shader may
not be invoked because HasWriteableRT is false. Specifically, this was
seen to occur when the shader also enables early fragment tests: the
fragment shader was not invoked despite passing depth/stencil.

Fix by taking the color write mask into account in this function. This
is consistent with how things are done on i965.

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-01-05 15:36:22 +00:00
Alex Smith 00a81e9909 anv: Add missing unlock in anv_scratch_pool_alloc
Fixes hangs seen due to the lock not being released here.

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-01-04 14:54:02 +00:00
Kenneth Graunke 74e1d6e20c i965: Drop support for the legacy SNORM -> Float equation.
Older OpenGL defines two equations for converting from signed-normalized
to floating point data.  These are:

    f = (2c + 1)/(2^b - 1)                (equation 2.2)
    f = max{c/2^(b-1) - 1), -1.0}         (equation 2.3)

Both OpenGL 4.2+ and OpenGL ES 3.0+ mandate that equation 2.3 is to be
used in all scenarios, and remove equation 2.2.  DirectX uses equation
2.3 as well.  Intel hardware only supports equation 2.3, so Gen7.5+
systems that use the vertex fetcher hardware to do the conversions
always get formula 2.3.

This can make a big difference for 10-10-10-2 formats - the 2-bit value
can represent 0 with equation 2.3, and cannot with equation 2.2.

Ivybridge and older were using equation 2.2 for OpenGL, and 2.3 for ES.
Now that Ivybridge supports OpenGL 4.2, this is wrong - we need to use
the new rules, at least in core profile.  That would leave Gen4-6 doing
something different than all other hardware, which seems...lame.

With context version promotion, applications that requested a pre-4.2
context may get promoted to 4.2, and thus get the new rules.  Zero cases
have been reported of this being a problem.  However, we've received a
report that following the old rules breaks expectations.  SuperTuxKart
apparently renders the cars red when following equation 2.2, and works
correctly when following equation 2.3:

https://github.com/supertuxkart/stk-code/issues/2885#issuecomment-353858405

So, this patch deletes the legacy equation 2.2 support entirely, making
all hardware and APIs consistently use the new equation 2.3 rules.

If we ever find an application that truly requires the old formula, then
we'd likely want that application to work on modern hardware, too.  We'd
likely restore this support as a driconf option.  Until then, drop it.

This commit will regress Piglit's draw-vertices-2101010 test on
pre-Haswell without the corresponding Piglit patch to accept either
formula (commit 35daaa1695ea01eb85bc02f9be9b6ebd1a7113a1):

    draw-vertices-2101010: Accept either SNORM conversion formula.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
2018-01-02 16:51:42 -08:00
Kenneth Graunke a1afef8de0 i965: Combine {VS,FS}_OPCODE_GET_BUFFER_SIZE opcodes.
These are the same, we don't need a separate opcode enum per backend.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-30 20:30:34 -08:00
Jason Ekstrand 967d238c69 anv/device: Mark all state buffers as needing capture
Previously, we were flagging the instruction state buffer for capture
but not surface state or dynamic state.  We want those captured too.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-28 10:39:04 -08:00
Jason Ekstrand 69fa3fb77f intel/aubinator: Gracefully handle dynamic state not being available
Some older versions of the Vulkan driver didn't properly tag dynamic
state as needing to be captured.  Also, this prevents crashes when
looking at dumps on older kernels.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-28 10:39:04 -08:00
Jason Ekstrand a92d52c3c1 intel/aubinator: Free section data last
We were walking the sections, printing the batches, and then freeing
them in one pass.  If the batch happens to reference any earlier
sections (which it almost certainly will since it's at the end), we will
access freed memory.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-28 10:39:04 -08:00
Anuj Phogat 2d04572038 Revert "i965/fs: Use align1 mode on ternary instructions on Gen10+"
This reverts commit 9cd60fce9c.
Above commit caused 2000+ piglit tests to assert fail. Disabling
the align1 mode on gen10 for now to avoid failures.

Cc: Matt Turner <mattst88@gmail.com>
Cc: Rafael Antognolli <rafael.antognolli@intel.com>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-12-22 16:40:40 -08:00
Francisco Jerez b3e3cb9901 intel/fs: Initialize fs_visitor::grf_used on construction.
This should shut up some Valgrind errors during pre-regalloc
scheduling.  The errors were harmless since they could only have led
to the estimation of the bank conflict penalty of an instruction
pre-regalloc, which is inaccurate at that point of the program
compilation, but no less accurate than the intended "return 0"
fall-back path.  The scheduling pass is normally re-run after regalloc
with a well-defined grf_used value and accurate bank conflict
information.

Fixes: acf98ff933 "intel/fs: Teach instruction scheduler about GRF bank conflict cycles."
Reported-by: Eero Tamminen <eero.t.tamminen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-12-21 15:20:17 -08:00
Francisco Jerez 1aa79d5ed5 intel/fs/bank_conflicts: Use posix_memalign() instead of overaligned new to obtain vector storage.
The weight_vector_type constructor was inadvertently assuming C++17
semantics of the new operator applied on a type with alignment
requirement greater than the largest fundamental alignment.
Unfortunately on earlier C++ dialects the implementation was allowed
to raise an allocation failure when the alignment requirement of the
allocated type was unsupported, in an implementation-defined fashion.
It's expected that a C++ implementation recent enough to implement
P0035R4 would have honored allocation requests for such over-aligned
types even if the C++17 dialect wasn't active, which is likely the
reason why this problem wasn't caught by our CI system.

A more elegant fix would involve wrapping the __SSE2__ block in a
'__cpp_aligned_new >= 201606' preprocessor conditional and continue
taking advantage of the language feature, but that would yield lower
compile-time performance on old compilers not implementing it
(e.g. GCC versions older than 7.0).

Fixes: af2c320190 "intel/fs: Implement GRF bank conflict mitigation pass."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104226
Reported-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-12-21 15:19:59 -08:00
Samuel Iglesias Gonsálvez a31f0c4a36 anv: disallow VK_REMAINING_ARRAY_LAYERS in vkCmdClearAttachments()
Vulkan spec doesn't specify that VK_REMAINING_ARRAY_LAYERS is allowed
in the passed VkClearRect struct.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-20 06:55:41 +01:00
Rafael Antognolli 85789831b4 intel/compiler/gen10: Disable push constants.
We still have gpu hangs on Cannonlake when using push constants, so
disable them for now until we have a proper fix for these hangs.

v2: Add warning message when creating context too.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2017-12-19 12:32:24 -08:00
Bas Nieuwenhuizen 6d9849d63e anv: Remove unused variable.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-17 14:53:46 +01:00
Kenneth Graunke 02720f8d24 isl: Don't require VALIGN_2 for R32G32B32_FLOAT on Haswell.
According to the RENDER_SURFACE_STATE internal documentation, the
R32G32B32_FLOAT restriction is marked "IVB" only.  We choose to apply
it to Ivybridge and Baytrail, but not Haswell.

Apparently fixes KHR-GL46.texture_size_promotion.functional on Haswell.

Changes these tests from crashing to skipping on Haswell:
- KHR-GL46.direct_state_access.textures_storage_multisample_2d_rgb32f
- KHR-GL46.direct_state_access.textures_storage_multisample_3d_rgb32f

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-15 14:00:09 -08:00
Jason Ekstrand 4b8c9ea46b intel/tools: Convert aubinator over to the common framework
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:27:24 -08:00
Jason Ekstrand 35f9c27be3 intel/batch-decoder: Decode registers
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:27:22 -08:00
Jason Ekstrand 81e4ecbc19 intel/batch-decoder: Decode dynamic state
Unfortunately, in aubinator and aubinator_error_decode we don't always
know how many of a given state we have, so we must guess.  One day,
we'll come up with a way to annotate the batch to solve this problem.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:27:20 -08:00
Jason Ekstrand 4ac2ee9001 intel/batch-decoder: Decode constants, binding tables, and samplers
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:27:18 -08:00
Jason Ekstrand d374423eab intel/tools: Switch aubinator_error_decode over to the gen_print_batch
The shared framework can now do everything that aubinator_error_decode
ever did and more.  It's time to make the switch.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:27:16 -08:00
Jason Ekstrand c86671c438 intel/batch-decoder: Decode graphics shaders
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:27:15 -08:00
Jason Ekstrand d4081fb778 intel/batch-decoder: Decode vertex and index buffers
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:27:13 -08:00
Jason Ekstrand e27ec208ed intel/batch-decoder: Decode MEDIA_INTERFACE_DESCRIPTOR_LOAD
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:27:12 -08:00
Jason Ekstrand be20043d00 intel/tools: Add the start of a generic batch decoder
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:27:10 -08:00
Jason Ekstrand 4cb96fbd91 intel/decoder: Expose the raw field value in the iterator
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:27:09 -08:00
Jason Ekstrand 79269e8f4b intel/disasm: Take a devinfo in gen_disasm_create
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:27:06 -08:00
Jason Ekstrand a7ae72032f intel/decoder: Take a bit offset in gen_print_group
Previously, if a group was nested in another group such that it didn't
start on a dword boundary, we would decode it as if it started at the
start of its first dword.  This changes things to work even more in
terms of bits so that we can properly decode these structs.  This
affects MOCS, attribute swizzles, and several other things.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:27:04 -08:00
Jason Ekstrand dca8f466ee intel/decoder: Stop rounding down to the nearest dword
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:27:03 -08:00
Jason Ekstrand f264640693 intel/decoder: Convert the iterator to work entirely in bits
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:27:01 -08:00
Jason Ekstrand ada705b671 intel/decoder: Drop gen_field_decode helper
It's unused

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:26:44 -08:00
Francisco Jerez acab52f520 intel/fs/bank_conflicts: Don't touch Gen7 MRF hack registers.
Fixes: af2c320190 "intel/fs: Implement GRF bank conflict mitigation pass."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104199
Reported-by: Darius Spitznagel <d.spitznagel@goodbytez.de>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-12-12 12:05:45 -08:00
Samuel Iglesias Gonsálvez ba4bb0838b anv: fix bug when using component qualifier in FS outputs
We can write to the same output but in different components, like
in this example:

layout(location = 0, component = 0) out ivec2 dEQP_FragColor_0;
layout(location = 0, component = 2) out ivec2 dEQP_FragColor_1;

Therefore, they are not two different outputs but only one.

Fixes:

dEQP-VK.glsl.440.linkage.varying.component.frag_out.*

v3:
- Remove FRAG_RESULT_MAX.
- Add const and use sizeof (Ian).
- Do three-pass to set properly the locations of fragment
  outputs when having arrays (Jason).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-12 07:24:55 +01:00
Jason Ekstrand 4c7af87fb9 anv: Enable UBO pushing
Push constants on Intel hardware are significantly more performant than
pull constants.  Since most Vulkan applications don't actively use push
constants on Vulkan or at least don't use it heavily, we're pulling way
more than we should be.  By enabling pushing chunks of UBOs we can get
rid of a lot of those pulls.

On my SKL GT4e, this improves the performance of Dota 2 and Talos by
around 2.5% and improves Aztec Ruins by around 2%.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-12-08 15:43:26 -08:00
Jason Ekstrand f1ce0b905a i965/fs: Handle !supports_pull_constants and push UBOs properly
In Vulkan, we don't support classic pull constants and everything the
client asks us to push, we push.  However, for pushed UBOs, we still
want to fall back to conventional pulls if we run out of space.
2017-12-08 15:43:25 -08:00
Jason Ekstrand 8d34077182 anv/device: Increase the UBO alignment requirement to 32
Push constants work in terms of 32-byte chunks so if we want to be able
to push UBOs, every thing needs to be 32-byte aligned.  Currently, we
only require 16-byte which is too small.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-12-08 15:43:25 -08:00
Jason Ekstrand 2f9eb045f3 anv/cmd_buffer: Add support for pushing UBO ranges
In order to do this we have to modify push constant set up to handle
ranges.  We also have to tweak the way we handle dirty bits a bit so
that we re-push whenever a descriptor set changes.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-12-08 15:43:25 -08:00
Jason Ekstrand 0c879b62b0 anv/cmd_buffer: Add some stage asserts
There are several places where we look up opcodes in an array of stages.
Assert that the we don't end up going out-of-bounds.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-12-08 15:43:25 -08:00
Jason Ekstrand 1968cd07a2 anv/cmd_buffer: Add some helpers for working with descriptor sets
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-12-08 15:43:25 -08:00
Jason Ekstrand 1bce04deb8 anv/pipeline: Translate vulkan_resource_index to a constant when possible
We want to call brw_nir_analyze_ubo_ranges immedately after
anv_nir_apply_pipeline_layout and it badly wants constants.  We could
run an optimization step and let constant folding do it but that's way
more expensive than needed.  It's really easy to just handle constants
in apply_pipeline_layout.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-12-08 15:43:25 -08:00
Jason Ekstrand 3b34ed79f1 i965/fs: Rewrite assign_constant_locations
This rewires the logic for assigning uniform locations to work in terms
of "complex alignments".  The basic idea is that, as we walk the list of
instructions, we keep track of the alignment and continuity requirements
of each slot and assert that the alignments all match up.  We then use
those alignments in the compaction stage to ensure that everything gets
placed at a properly aligned register.  The old mechanism handled
alignments by special-casing each of the bit sizes and placing 64-bit
values first followed by 32-bit values.

The old scheme had the advantage of never leaving a hole since all the
64-bit values could be tightly packed and so could the 32-bit values.
However, the new scheme has no type size special cases so it handles not
only 32 and 64-bit types but should gracefully extend to 16 and 8-bit
types as the need arises.

Tested-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-12-08 15:43:25 -08:00
Jason Ekstrand 597c194487 anv: Disable VK_KHR_16bit_storage
The testing for this extension is currently very poor.  The CTS tests
only test accessing UBOs and SSBOs at dynamic offsets so none of our
constant-offset paths get triggered at all.  Also, there's an assertion
in our handling of nir_intrinsic_load_uniform that offset % 4 == 0 which
is never triggered indicating that nothing every gets loaded from an
offset which is not a dword.  Both push constants and the constant
offset pull paths are complex enough, we really don't want to ship
without tests.  We'll turn the extension back on once we have decent
tests.
2017-12-08 15:42:55 -08:00
Francisco Jerez 4d1959e693 intel/cfg: Represent divergent control flow paths caused by non-uniform loop execution.
This addresses a long-standing back-end compiler bug that could lead
to cross-channel data corruption in loops executed non-uniformly.  In
some cases live variables extending through a loop divergence point
(e.g. a non-uniform break) into a convergence point (e.g. the end of
the loop) wouldn't be considered live along all physical control flow
paths the SIMD thread could possibly have taken in between due to some
channels remaining in the loop for additional iterations.

This patch fixes the problem by extending the CFG with physical edges
that don't exist in the idealized non-vectorized program, but
represent valid control flow paths the SIMD EU may take due to the
divergence of logical threads.  This makes sense because the i965 IR
is explicitly SIMD, and it's not uncommon for instructions to have an
influence on neighboring channels (e.g. a force_writemask_all header
setup), so the behavior of the SIMD thread as a whole needs to be
considered.

No changes in shader-db.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-12-07 18:27:05 -08:00
Francisco Jerez 9355116bda intel/fs: Don't let undefined values prevent copy propagation.
This makes the dataflow propagation logic of the copy propagation pass
more intelligent in cases where the destination of a copy is known to
be undefined for some incoming CFG edges, building upon the
definedness information provided by the last patch.  Helps a few
programs, and avoids a handful shader-db regressions from the next
patch.

shader-db results on ILK:

  total instructions in shared programs: 6541547 -> 6541523 (-0.00%)
  instructions in affected programs: 360 -> 336 (-6.67%)
  helped: 8
  HURT: 0

  LOST:   0
  GAINED: 10

shader-db results on BDW:

  total instructions in shared programs: 8174323 -> 8173882 (-0.01%)
  instructions in affected programs: 7730 -> 7289 (-5.71%)
  helped: 5
  HURT: 2

  LOST:   0
  GAINED: 4

shader-db results on SKL:

  total instructions in shared programs: 8185669 -> 8184598 (-0.01%)
  instructions in affected programs: 10364 -> 9293 (-10.33%)
  helped: 5
  HURT: 2

  LOST:   0
  GAINED: 2

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-07 18:27:04 -08:00
Francisco Jerez c3c1aa5aeb intel/fs: Restrict live intervals to the subset possibly reachable from any definition.
Currently the liveness analysis pass would extend a live interval up
to the top of the program when no unconditional and complete
definition of the variable is found that dominates all of its uses.

This can lead to a serious performance problem in shaders containing
many partial writes, like scalar arithmetic, FP64 and soon FP16
operations.  The number of oversize live intervals in such workloads
can cause the compilation time of the shader to explode because of the
worse than quadratic behavior of the register allocator and scheduler
when running out of registers, and it can also cause the running time
of the shader to explode due to the amount of spilling it leads to,
which is orders of magnitude slower than GRF memory.

This patch fixes it by computing the intersection of our current live
intervals with the subset of the program that can possibly be reached
from any definition of the variable.  Extending the storage allocation
of the variable beyond that is pretty useless because its value is
guaranteed to be undefined at a point that cannot be reached from any
definition.

According to Jason, this improves performance of the subgroup Vulkan
CTS tests significantly (e.g. the runtime of the dvec4 broadcast test
improves by nearly 50x).

No significant change in the running time of shader-db (with 5%
statistical significance).

shader-db results on IVB:

  total cycles in shared programs: 61108780 -> 60932856 (-0.29%)
  cycles in affected programs: 16335482 -> 16159558 (-1.08%)
  helped: 5121
  HURT: 4347

  total spills in shared programs: 1309 -> 1288 (-1.60%)
  spills in affected programs: 249 -> 228 (-8.43%)
  helped: 3
  HURT: 0

  total fills in shared programs: 1652 -> 1597 (-3.33%)
  fills in affected programs: 262 -> 207 (-20.99%)
  helped: 4
  HURT: 0

  LOST:   2
  GAINED: 209

shader-db results on BDW:

  total cycles in shared programs: 67617262 -> 67361220 (-0.38%)
  cycles in affected programs: 23397142 -> 23141100 (-1.09%)
  helped: 8045
  HURT: 6488

  total spills in shared programs: 1456 -> 1252 (-14.01%)
  spills in affected programs: 465 -> 261 (-43.87%)
  helped: 3
  HURT: 0

  total fills in shared programs: 1720 -> 1465 (-14.83%)
  fills in affected programs: 471 -> 216 (-54.14%)
  helped: 4
  HURT: 0

  LOST:   2
  GAINED: 162

shader-db results on SKL:

  total cycles in shared programs: 65436248 -> 65245186 (-0.29%)
  cycles in affected programs: 22560936 -> 22369874 (-0.85%)
  helped: 8457
  HURT: 6247

  total spills in shared programs: 437 -> 437 (0.00%)
  spills in affected programs: 0 -> 0
  helped: 0
  HURT: 0

  total fills in shared programs: 870 -> 854 (-1.84%)
  fills in affected programs: 16 -> 0
  helped: 1
  HURT: 0

  LOST:   0
  GAINED: 107

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-07 18:27:04 -08:00
Francisco Jerez acf98ff933 intel/fs: Teach instruction scheduler about GRF bank conflict cycles.
This should allow the post-RA scheduler to do a slightly better job at
hiding latency in presence of instructions incurring bank conflicts.
The main purpuse of this patch is not to improve performance though,
but to get conflict cycles to show up in shader-db statistics in order
to make sure that regressions in the bank conflict mitigation pass
don't go unnoticed.

Acked-by: Matt Turner <mattst88@gmail.com>
2017-12-07 15:56:49 -08:00
Francisco Jerez af2c320190 intel/fs: Implement GRF bank conflict mitigation pass.
Unnecessary GRF bank conflicts increase the issue time of ternary
instructions (the overwhelmingly most common of which is MAD) by
roughly 50%, leading to reduced ALU throughput.  This pass attempts to
minimize the number of bank conflicts by rearranging the layout of the
GRF space post-register allocation.  It's in general not possible to
eliminate all of them without introducing extra copies, which are
typically more expensive than the bank conflict itself.

In a shader-db run on SKL this helps roughly 46k shaders:

   total conflicts in shared programs: 1008981 -> 600461 (-40.49%)
   conflicts in affected programs: 816222 -> 407702 (-50.05%)
   helped: 46234
   HURT: 72

The running time of shader-db itself on SKL seems to be increased by
roughly 2.52%±1.13% with n=20 due to the additional work done by the
compiler back-end.

On earlier generations the pass is somewhat less effective in relative
terms because the hardware incurs a bank conflict anytime the last two
sources of the instruction are duplicate (e.g. while trying to square
a value using MAD), which is impossible to avoid without introducing
copies.  E.g. for a shader-db run on SNB:

   total conflicts in shared programs: 944636 -> 623185 (-34.03%)
   conflicts in affected programs: 853258 -> 531807 (-37.67%)
   helped: 31052
   HURT: 19

And on BDW:

   total conflicts in shared programs: 1418393 -> 987539 (-30.38%)
   conflicts in affected programs: 1179787 -> 748933 (-36.52%)
   helped: 47592
   HURT: 70

On SKL GT4e this improves performance of GpuTest Volplosion by 3.64%
±0.33% with n=16.

NOTE: This patch intentionally disregards some i965 coding conventions
      for the sake of reviewability.  This is addressed by the next
      squash patch which introduces an amount of (for the most part
      boring) boilerplate that might distract reviewers from the
      non-trivial algorithmic details of the pass.

The following patch is squashed in:

SQUASH: intel/fs/bank_conflicts: Roll back to the nineties.

Acked-by: Matt Turner <mattst88@gmail.com>
2017-12-07 15:56:06 -08:00
Eric Engestrom 4cba39331d meson: add dep_thread to every lib that includes threads.h
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104141
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2017-12-07 17:29:42 +00:00
Fredrik Höglund 5e1cb16768 anv: fix a case statement in GetMemoryFdPropertiesKHR
The handle type in the case statement is supposed to be VK_EXTERNAL_-
MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT.

Fixes: ab18e8e59b ("anv: Implement VK_EXT_external_memory_dma_buf")
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-06 20:04:39 +01:00
Jose Maria Casanova Crespo a1e257a5bf i965/fs: Use untyped_surface_read for 16-bit load_ssbo
SSBO loads were using byte_scattered read messages as they allow
reading 16-bit size components. byte_scattered messages can only
operate one component at a time so we needed to emit as many messages
as components.

But for vec2 and vec4 of 16-bit, being multiple of 32-bit we can use the
untyped_surface_read message to read pairs of 16-bit components using only
one message. Once each pair is read it is unshuffled to return the proper
16-bit components. vec3 case is assimilated to vec4 but the 4th component
is ignored.

16-bit scalars are read using one byte_scattered_read message.

v2: Removed use of stride = 2 on sources (Jason Ekstrand)
    Rework optimization using unshuffle 16 reads (Chema Casanova)
v3: Use W and D types insead of HF and F in shuffle to avoid rounding
    erros (Jason Ekstrand)
    Use untyped_surface_read for 16-bit vec3. (Jason Ekstrand)
v4: Use subscript insead of chaging type and stride  (Jason Ekstrand)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-06 08:57:18 +01:00
Jose Maria Casanova Crespo ce2e572c4c i965/fs: Optimize 16-bit SSBO stores by packing two into a 32-bit reg
Currently, we use byte-scattered write messages for storing 16-bit
into an SSBO. This is because untyped surface messages have a fixed
32-bit size.

This patch optimizes these 16-bit writes by combining 2 values (e.g,
two consecutive components aligned with 32-bits) into a 32-bit register,
packing the two 16-bit words.

16-bit single component values will continue to use byte-scattered
write messages. The same will happens when the first consecutive
component is not aligned 32-bits.

This optimization reduces the number of SEND messages used for storing
16-bit values potentially by 2 or 4, which cuts down execution time
significantly because byte-scattered writes are an expensive
operation as they only write a component for message.

v2: Removed use of stride = 2 on sources (Jason Ekstrand)
    Rework optimization using shuffle 16 write and enable writes
    of 16bit vec4 with only one message of 32-bits. (Chema Casanova)
v3: - Fix coding style (Eduardo Lima)
    - Reorganize code to avoid duplication. (Jason Ekstrand)
    - Include new comments to explain the length calculations to
      fix alignment issues of components. (Jason Ekstrand)
    - Fix issues with writemask yz with 16-bit writes. (Jason Ektrand)
v4: (Jason Ekstrand)
    - Reorganize 64-bit ssbo-writes to avoid using slots_per_component.
    - Comment about why suffle is needed when using byte_scattered_write.

Signed-off-by: Eduardo Lima <elima@igalia.com>
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-06 08:57:18 +01:00
Alejandro Piñeiro 66ce6ce78f anv: Enable SPV_KHR_16bit_storage and VK_KHR_16bit_storage for SSBO/UBO
Enables SPV_KHR_16bit_storage on gen 8+.

VK_KHR_16bit_storage is enabled for SSBO/UBO using the
VK_KHR_get_physical_device_properties2 functionality to expose
if the extension is supported or not.

v2: update due rebase against master (Alejandro)
v3: (Jason Ekstrand)
    - Move this patch up in VK_KHR_16bit_storage series enabling only
      storageBuffer16BitAccess and uniformAndStorageBuffer16BitAccess.
    - Only expose VK_KHR_16bit_storage on Gen8+
v4: (Jason Ekstrand)
    - Squash enable SPV_KHR_16bit_storage into VK_KHR_16bit_storage
      enablement for SSBO/UBO.

Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-06 08:57:18 +01:00
Jason Ekstrand 3282309f74 i965/fs: Enables 16-bit load_ubo with sampler
load_ubo is using 32-bit loads as uniforms surfaces have a 32-bit
surface format defined. So when reading 16-bit components with the
sampler we need to unshuffle two 16-bit components from each 32-bit
component.

Using the sampler avoids the use of the byte_scattered_read message
that needs one message for each component and is supposed to be
slower.

v2: (Jason Ekstrand)
    - Simplify component selection and unshuffling for different bitsizes
    - Remove SKL optimization of reading only two 32-bit components when
      reading 16-bits types.

Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
2017-12-06 08:57:18 +01:00
Jose Maria Casanova Crespo 3db31c0b06 i965/fs: Helpers for un/shuffle 16-bit pairs in 32-bit components
This helpers are used to load/store 16-bit types from/to 32-bit
components.

The functions shuffle_32bit_load_result_to_16bit_data and
shuffle_16bit_data_for_32bit_write are implemented in a similar
way than the analogous functions for handling 64-bit types.

v1: Explain need of temporary in shuffle operations. (Jason Ekstrand)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-06 08:57:18 +01:00
Jose Maria Casanova Crespo fa4a9d63bb i965/fs: Use byte scattered read for 16-bit load_ssbo
Used to enable 16-bit reads at do_untyped_vector_read, that is used on
the following intrinsics:

   * nir_intrinsic_load_shared
   * nir_intrinsic_load_ssbo

v2: Removed use of stride = 2 on 16-bit sources (Jason Ekstrand)

v3: - Add bitsize to scattered read operation (Jason Ekstrand)
    - Remove implementation of 16-bit UBO read from this patch.
    - Avoid assertion at opt_algebraic caused by ADD of two IMM with
      offset with BRW_REGISTER_TYPE_UD type found on matrix tests.
      (Jose Maria Casanova)
v4: (Jason Ekstrand)
    - Put if case for 16-bits at the beginning of the if ladder.
    - Use type_sz(dest.type) * 8 as bit_size parameter for scattered read.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-06 08:57:18 +01:00
Jose Maria Casanova Crespo c57a3f200d i965/fs: Add byte scattered read message and fs support
v2: Fix alignment style (Topi Pohjolainen)
    (Jason Ekstrand)
    - Enable bit_size parameter to scattered messages to enable different
      bitsizes byte/word/dword.
    - Remove use of brw_send_indirect_scattered_message in favor of
      brw_send_indirect_surface_message.
    - Move scattered messages to surface messages namespace.
    - Assert align1 for scattered messages and assume Gen8+.
    - Inline brw_set_dp_byte_scattered_read.

v3: (Jason Ekstrand)
    - Use renamed brw_byte_scattered_data_element_from_bit_size method
    - Assert scattered read for Gen8+ and Haswell.
    - Use conditional expresion at components_read.
    - Include comment about params for scattered opcodes.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-06 08:57:18 +01:00
Alejandro Piñeiro a4031bdfa9 i965/fs: Predicate byte scattered writes if needed
While on Untyped Surface messages the bits of the execution mask are
ANDed with the corresponding bits of the Pixel/Sample Mask, that is
not the case for byte scattered writes. That is needed to avoid ssbo
stores writing on helper invocations. So when that can affect, we load
the sample mask, and predicate the send message.

Note: the need for this patch was tested with a custom test. Right now
the 16 bit storage CTS tests doesnt need this path in order to get a
full pass.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-06 08:57:18 +01:00
Alejandro Piñeiro 96f1926aab i965/fs: Use byte_scattered_write on 16-bit store_ssbo
We need to rely on byte scattered writes as untyped writes are 32-bit
size. We could try to keep using 32-bit messages when we have two or
four 16-bit elements, but for simplicity sake, we use the same message
for any component number. We revisit this aproach in the follwing
patches.

v2: Removed use of stride = 2 on 16-bit sources (Jason Ekstrand)

v3: (Jason Ekstrand)
    - Include bit_size to scattered write message and remove namespace
    - specific for scattered messages.
    - Move comment to proper place.
    - Squashed with i965/fs: Adjust type_size/type_slots on store_ssbo.
    (Jose Maria Casanova)
    - Take into account that get_nir_src returns now WORD types for
      16-bit sources instead of DWORD.
v4: (Jason Ekstrand)
    - Rename lenght variable to num_components.
    - Include assertions before emit_untyped_write.
    - Remove type_slot in favor of num_slot and first_slot.

Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-06 08:57:18 +01:00
Jose Maria Casanova Crespo f1a9936ee1 i965/fs: Add byte scattered write message and fs support
v2: (Jason Ekstrand)
    - Enable bit_size parameter to scattered messages to enable different
      bitsizes byte/word/dword.
    - Remove use of brw_send_indirect_scattered_message in favor of
      brw_send_indirect_surface_message.
    - Move scattered messages to surface messages namespace.
    - Assert align1 for scattered messages and assume Gen8+.
    - Inline brw_set_dp_byte_scattered_write.
v3: - Remove leftover newline (Topi Pohjolainen)
    - Rename brw_data_size to brw_scattered_data_element and use
      defines instead of an enum (Jason Ekstrand)
    - Assert scattered write for Gen8+ and Haswell (Jason Ekstrand)

Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-06 08:57:18 +01:00
Alejandro Piñeiro d038deaa40 i965/fs: Add remove_extra_rounding_modes optimization
Although from SPIR-V point of view, rounding modes are attached to the
operation/destination, on i965 it is a status, so we don't need to
explicitly set the rounding mode if the one we want is already set.

Taking into account that the default mode is RTE, one possible
optimization would be optimize out the first RTE set for each
block. For in order to work, we would need to take into account block
interrelationships. At this point, it is not worth to complicate the
optimization for such small gain.

v2: Use a single SHADER_OPCODE_RND_MODE opcode taking an immediate
    with the rounding mode (Curro)
v3: Reset optimization for every block. (Jason Ekstrand)

Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-06 08:57:18 +01:00
Alejandro Piñeiro 82fa4d45e7 i965/fs: Enable rounding mode on f2f16 ops
By default we don't set the rounding mode. We only set
round-to-near-even or round-to-zero mode if explicitly set from nir.

v2: Use a single SHADER_OPCODE_RND_MODE opcode taking an immediate
    with the rounding mode (Curro)

v3: Use new helper brw_rnd_mode_from_nir_op  (Jason Ekstrand)

Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-06 08:57:18 +01:00
Alejandro Piñeiro d6cd14f213 i965/fs: Define new shader opcode to set rounding modes
Although it is possible to emit them directly as AND/OR on brw_fs_nir,
having a specific opcode makes it easier to remove duplicate settings
later.

v2: (Curro)
  - Set thread control to 'switch' when using the control register
  - Use a single SHADER_OPCODE_RND_MODE opcode taking an immediate
    with the rounding mode.
  - Avoid magic numbers setting rounding mode field at control register.
v3: (Curro)
  - Remove redundant and add missing whitespace lines.
  - Match printing instruction to IR opcode "rnd_mode"

v4: (Topi Pohjolainen)
  - Fix code style.

Signed-off-by:  Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by:  Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-06 08:57:18 +01:00
Jose Maria Casanova Crespo ac8d4734f6 i965: Add support for control register
Control register cr0 in i965 can be used to change the rounding modes
in 32-bit to 16-bit floating-point conversions.

From intel Skylake PRM, vol 07, section "Register and Tegister Regions",
 subsection "Control Register" (page 754):

"Subregister cr0.0:ud contains normal operation control fields such as the
 floating-point mode ... "

Floating-point Rounding mode is changed at bits 5:4 of cr0.0:

"Rounding Mode. This field specifies the FPU rounding mode. It is
initialized by Thread Dispatch."
  00b = Round to Nearest or Even (RTNE)
  01b = Round Up, toward +inf (RU)
  10b = Round Down, toward -inf (RD)
  11b = Round Toward Zero (RTZ)"

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-06 08:57:18 +01:00
Alejandro Piñeiro 5d5ee507fb i965/fs: Handle 32-bit to 16-bit conversions
Conversions to 16-bit need having aligment between the 16-bit
and 32-bit types. So the conversion operations unpack 16-bit types
to with an stride=2 and then applies a MOV with the conversion.

v2 (Jason Ekstrand):
  - Avoid the general use of stride=2 for 16-bit register types.

v3 (Topi Pohjolainen)
  - Code style fix
   (Jason Ekstrand)
  - Now nir_op_f2f16 was renamed to nir_op_f2f16_undef
    because conversion to f16 with undefined rounding is explicit

Signed-off-by: Eduardo Lima <elima@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-06 08:57:18 +01:00
Alejandro Piñeiro a05b6f25bf i965/fs: Remove BRW_REGISTER_TYPE_HF assert at get_exec_type
Note that we don't remove the assert at i965/vec4. At this point half
float support is only for the scalar backend.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-06 08:57:18 +01:00
Jose Maria Casanova Crespo 75a88d8567 i965: Support for 16-bit base types in helper functions
v2: Fixed calculation of scalar size for 16-bit types. (Jason Ekstrand)
v3: Fix coding style (Topi Pohjolainen)

Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Eduardo Lima <elima@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-06 08:57:18 +01:00
Alejandro Piñeiro 2d28ca7000 i965/vec4: Handle 16-bit types at type_size_xvec4
These types have similar vec4 sizes as their 32-bit counterparts.

The vec4 backend doesn't support 16-bit types and probably never will,
but this method is called by the scalar backend at
fs_visitor::nir_setup_outputs(), so we still need to provide valid vec4
sizes for 16-bit types. In the future, something different should be
implemented to avoid this dependency.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-06 08:57:18 +01:00
Jason Ekstrand 8761a04d0d anv: Add support for the variablePointers feature
Not to be confused with variablePointersStorageBuffer which is the
subset of VK_KHR_variable_pointers required to enable the extension.
This means we now have "full" support for variable pointers.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2017-12-05 22:01:54 -08:00
Jason Ekstrand 32c859125b anv: Handle nir_intrinsic_vulkan_resource_reindex
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2017-12-05 22:01:54 -08:00
Chad Versace a932aee7a8 intel/isl: Declare private array as static const
It's array isl_drm.c:modifier_info[] .

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2017-12-04 10:16:33 -08:00
Lionel Landwerlin 2ead8f1690 anv: query CS timestamp frequency from the kernel
The reference value in gen_device_info isn't going to be acurate on
Gen10+. We should query it from the kernel, which reads a couple of
register to compute the actual value.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-12-04 18:05:20 +00:00
Jason Ekstrand 0a10e3770f vulkan/wsi: Initialize individual WSI interfaces in wsi_device_init
Now that we have anv_device_init/finish functions, there's no reason to
have the individual driver do any more work than that.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-12-04 10:04:19 -08:00
Jason Ekstrand 2e3e55110b vulkan/wsi: Drop some unneeded cruft from the API
This drops the unneeded callbacks struct as well as the queue_get_family
callback we were using before we'd pulled QueuePresent inside.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-12-04 10:04:19 -08:00
Jason Ekstrand c1b1be5196 vulkan/wsi: Add wrappers for all of the surface queries
This lets us move wsi_interface to wsi_common_private.h

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-12-04 10:04:19 -08:00
Jason Ekstrand 82931dc007 vulkan/wsi: Drop the can_handle_different_gpu parameter from get_support
Both anv and radv can handle prime now.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-12-04 10:04:19 -08:00
Jason Ekstrand 516dfb34e1 vulkan/wsi: Add a helper for AcquireNextImage
Unfortunately, due to the fact that AcquireNextImage does not take a
queue, the ANV trick for triggering the fence won't work in general.  We
leave dealing with the fence up to the caller for now.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-12-04 10:04:19 -08:00
Dave Airlie 8ff49951c3 vulkan/wsi: move swapchain create/destroy to common code
v2 (Jason Ekstrand):
 - Rebase
 - Alter the names of the helpers to better match the vulkan entrypoints
 - Use the helpers in anv

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-12-04 10:04:19 -08:00
Jason Ekstrand 393aa3f6c9 vulkan/wsi: Move get_images into common code
This moves bits out of all four corners (anv, radv, x11, wayland) and
into the wsi common code.  We also switch to using an outarray to ensure
we get our return code right.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-12-04 10:04:19 -08:00
Jason Ekstrand 1117f843fe anv/wsi: Enable prime support
Now that we're using the same common code as radv, we get prime support
for free.  Just enable it.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-12-04 10:04:19 -08:00
Jason Ekstrand ac95335b61 anv/wsi: Use the common QueuePresent code
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-12-04 10:04:19 -08:00
Jason Ekstrand e12688f365 vulkan/wsi: Do image creation in common code
This uses the mock extension created in a previous commit to tell the
driver that the image it's just been asked to create is, in fact, a
window system image with whatever assumptions that implies.  There was a
lot of redundant code between the two drivers to do basically exactly
the same thing.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-12-04 10:04:19 -08:00
Jason Ekstrand d50937f137 vulkan/wsi: Implement prime in a completely generic way
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-12-04 10:04:19 -08:00
Jason Ekstrand 3dabb4011f anv/image: Implement the wsi "extension"
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-12-04 10:04:19 -08:00
Jason Ekstrand a44744e01d anv: Require a dedicated allocation for modified images
This lets us set the BO tiling when we allocate the memory.  This is
required for GL to work properly.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-12-04 10:04:19 -08:00
Jason Ekstrand 7d19e570e1 anv/image: Add a drm_format_mod field
At the moment, this is always initialized to DRM_FORMAT_MOD_INVALID.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-12-04 10:04:19 -08:00
Jason Ekstrand ab18e8e59b anv: Implement VK_EXT_external_memory_dma_buf
This is a modified version of the patch originally sent by Chad Versace.
The primary difference is that this version claims that OPQAUE_FD and
DMA_BUF are compatible handle types.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-12-04 10:04:19 -08:00
Jason Ekstrand 764fc1643c vulkan/wsi: Add a wsi_device_init function
This gives the opportunity to collect some function pointers if we'd
like which will be very useful in future.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-12-04 10:04:19 -08:00
Daniel Stone c1163f7b1c vulkan/wsi: Add a wsi_image structure
This is used to hold information about the allocated image, rather than
an ever-growing function argument list.

v2 (Jason Ekstrand):
 - Rename wsi_image_base to wsi_image

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-12-04 10:04:19 -08:00
Dave Airlie 2cbeb32555 vulkan/wsi: use function ptr definitions from the spec.
This just seems cleaner, and we may expand this in future.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-04 10:04:19 -08:00
Jason Ekstrand e19c623128 spirv: Convert the supported_extensions struct to spirv_options
This is a bit more general and lets us pass additional options into the
spirv_to_nir pass beyond what capabilities we support.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2017-12-02 08:09:11 -08:00
Rafael Antognolli 2919adffe9 intel/compiler: Implement WaClearTDRRegBeforeEOTForNonPS.
The bspec describes:

   "WA: Clear tdr register before send EOT in all non-PS shader kernels

   mov(8) tdr0:ud 0x0:ud {NoMask}"

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-12-01 11:27:27 -08:00
Vadym Shovkoplias cdb3eb7174 intel/blorp: Fix possible NULL pointer dereferencing
Fix incomplete check of input params in blorp_surf_convert_to_uncompressed()
which can lead to NULL pointer dereferencing.

Fixes: 5ae8043fed ("intel/blorp: Add an entrypoint for doing
bit-for-bit copies")
Fixes: f395d0abc8 ("intel/blorp: Internally expose
surf_convert_to_uncompressed")
Reviewed-by: Emil Velikov <emli.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
2017-11-30 16:20:05 +02:00
Vinson Lee 8c1e4b1afc anv: Check if memfd_create is already defined.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103909
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-11-30 01:36:46 -08:00
Iago Toral Quiroga 8620f7ebbc i965/vec4: use a temp register to compute offsets for pull loads
64-bit pull loads are implemented by emitting 2 separate
32-bit pull load messages, where the second message loads from
an offset at +16B.

That addition of 16B to the original offset should not alter the
original offset register used as source for the pull load instruction
though, since the compiler might use that same offset register in other
instructions (for example, for other pull loads in the shader code
that take that same offset as reference).

If the pull load is 32-bit then we only need to emit one message and
we don't need to do offset calculations, but in that case the optimizer
should be able to drop the redundant MOV.

Fixes the following test on Haswell:
KHR-GL45.gpu_shader_fp64.fp64.max_uniform_components

Reviewed-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103007
2017-11-30 07:57:53 +01:00
Kenneth Graunke 3d68329a65 i965: Move perf_debug and WARN_ONCE back to brw_context.h.
These were moved to src/intel/common/gen_debug.h, but they are not
common code.  They assume that brw_context or gl_context variables
exist, named brw or ctx.  That isn't remotely true outside of i965.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-11-28 15:23:16 -08:00
Lionel Landwerlin 349712018b i965: add a debug option to disable oa config loading
This provides a good way to verify we haven't broken using the perf
driver on older kernels (which don't have the oa config loading
mechanism).

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-11-28 13:34:04 +00:00
Jason Ekstrand d7c8c7bd9d intel/blorp: Drop blorp_resolve_ccs_attachment
The only reason why we needed that version was because the Vulkan driver
needed to be able to create the surface states so it could handle
indirect clear colors.  Now that blorp handles them natively, there's no
need for the extra entrypoint.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-11-27 16:22:13 -08:00
Jason Ekstrand 5bc2849af9 anv: Let blorp handle indirect clear colors for CCS resolves
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-11-27 16:22:13 -08:00
Jason Ekstrand 34b95f88e6 anv: Move get_fast_clear_state_address into anv_private.h
While we're at it, we break it into two nicely named functions.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-11-27 16:22:13 -08:00
Jason Ekstrand 8915621882 intel/blorp: Take a range of layers in blorp_ccs_resolve
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-11-27 16:22:13 -08:00
Jason Ekstrand 67b676f0c5 intel/blorp: Add initial support for indirect clear colors
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-11-27 16:22:12 -08:00
Jason Ekstrand 86becfd2de intel/blorp: Add fast-clear to the special case in MSAA resolves
This doesn't go all the way of avoiding the txf_ms if it's fast-cleared,
however it does at least make us only do it once.  This should improve
performance of MSAA resolves in the presence of lots of clear color.
Without the patch, enabling fast-clears in the multisampling Sascha demo
drops the framerate by about 10%.  With this patch, enabling fast-clears
increases the demo's framerate by 25%.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-11-27 16:22:11 -08:00
Jason Ekstrand dc21c3937c intel/blorp/blit: Rename blorp_nir_txf_ms_mcs
That name is already taken by one of the helpers in blorp_nir_builder.h
and, while we haven't moved the guts of blorp_blit.c there yet, we'd
like to start using some things from that header.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-11-27 16:19:38 -08:00
Iago Toral Quiroga f1873956db i965/vec4: fix splitting of interleaved attributes
When we split an instruction that reads an uniform value
(vstride 0) we need to respect the vstride on the second
half of the instruction (that is, the second half should
read the same region as the first).

We were doing this already, but we didn't account for
stages that have interleaved input attributes which also
have a vstride of 0 and need the same treatment.

Fixes the following on Haswell:
KHR-GL45.enhanced_layouts.varying_locations
KHR-GL45.enhanced_layouts.varying_array_locations
KHR-GL45.enhanced_layouts.varying_structure_locations

Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Andres Gomez <agomez@igalia.com>
2017-11-24 09:24:06 +01:00
Eric Engestrom 1d3944aeeb genxml: fix assert guards
This removes a few hundred warnings on debug builds with asserts off.

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-11-23 09:44:16 +00:00
Lionel Landwerlin d4c52c5408 anv: flag batch & instruction BOs for capture
When the kernel support flagging our BO, let's mark batch &
instruction BOs for capture so then can be included in the error
state.

v2: Only add EXEC_CAPTURE if supported (Kristian)

v3: Fix operator precedence issue (Lionel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-11-22 22:53:27 +00:00
Lionel Landwerlin 118a8c7587 anv: setup BO flags at state_pool/block_pool creation
This will allow to set the flags on any anv_bo created/filled from a
state pool or block pool later.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-11-22 22:53:27 +00:00
Kristian H. Kristensen 24609377f9 intel/genxml: Add helpers for determining field type
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-11-21 11:15:06 -08:00
Matt Turner beaea7abfa i965/fs: Check ADD/MAD with immediates in satprop unit test
The gen had to be changed from 4 to 6 so that we could test MAD, which
is new on Gen6.

mad_imm_float_neg_mov_sat tests the case fixed by the previous commit.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-11-21 10:13:07 -08:00
Matt Turner a05af1f7b8 i965/fs: Handle negating immediates on MADs when propagating saturates
MADs don't take immediate sources, but we allow them in the IR since it
simplifies a lot of things. I neglected to consider that case.

Fixes: 4009a9ead4 ("i965/fs: Allow saturate propagation to propagate
                      negations into MADs.")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103616
Reported-and-Tested-by: Ruslan Kabatsayev <b7.10110111@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-11-21 10:13:07 -08:00
Tapani Pälli 6236ffeb83 intel: fix disasm_info memory leaks
Fixes: 4f82b17287 ("i965: Rewrite disassembly annotation code")
Cc: Matt Turner <mattst88@gmail.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-11-21 08:36:43 +02:00
Jason Ekstrand 1eab327ba7 i965: Stop including brw_cfg.h in brw_disasm_info.h
The brw_disasm_info header is included by certain tools in order to get
shader assembly from binaries so it's a semi-external header.  Including
brw_cfg.h also pulls in brw_shader.h so you end up getting quite a bit
of our back-end compiler internals.  Instead, make the couple of forward
declarations we need and make the header more stand-alone.  This fixes
the meson build.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 4f82b17287
2017-11-17 21:51:16 -08:00
Andres Gomez 1866f7aee5 i965: Correct disasm_info usage in eu_validate test
Fixes: 4f82b17287 ("i965: Rewrite disassembly annotation code")

Cc: Matt Turner <mattst88@gmail.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-11-18 03:07:06 +02:00
Matt Turner 821ec473a8 i965: Rename intel_asm_annotation -> brw_disasm_info
It was the only file named intel_* in the compiler.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-11-17 12:14:38 -08:00
Matt Turner 4f82b17287 i965: Rewrite disassembly annotation code
The old code used an array to store each "instruction group" (the new,
better name than the old overloaded "annotation"), and required a
memmove() to shift elements over in the array when we needed to split a
group so that we could add an error message. This was confusing and
difficult to get right, not the least of which was  because the array
has a tail sentinel not included in .ann_count.

Instead use a linked list, a data structure made for efficient
insertion.

Acked-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-11-17 12:14:38 -08:00
Matt Turner f80e97346b i965: Simplify annotation_insert_error()
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-11-17 12:14:38 -08:00
Matt Turner f4276ef7ef i965: Move common code out of #ifdef
I'm going to change the call in a later patch and with the difference in
indentation level it wasn't immediately obvious that the calls were
identical.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-11-17 12:14:38 -08:00
Jason Ekstrand a07f7b2619 anv/cmd_buffer: Take bo_offset into account in fast clear state addresses
Otherwise, if the image is not bound to the start of the buffer, we're
going to be reading and writing its fast clear state in the wrong spot.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
2017-11-17 11:32:21 -08:00
Jason Ekstrand a6cc361e5f anv/cmd_buffer: Advance the address when initializing clear colors
Found by inspection

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: mesa-stable@lists.freedesktop.org
2017-11-17 11:32:21 -08:00
Kenneth Graunke f274687413 genxml: Fix PIPELINE_SELECT on G45/Ironlake.
Original 965 sets bits 28:27 to 0, while G45 and later set it to 1.

Note that the G45 docs are incorrect in this regard - see the DevCTG+
note in the Ironlake PRMs.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-11-16 11:01:50 -08:00
Kenneth Graunke e48cc01be9 intel: Drop mtypes.h include from brw_compiler.h.
This isn't necessary and causes trouble for a project I'm working on.
2017-11-15 09:37:32 -08:00
Kenneth Graunke ff964916dc i965: Use nir_lower_atomics_to_ssbos and delete ABO compiler code.
We use the same hardware mechanism for both atomic counters and SSBO
atomics, so there's really no benefit to maintaining separate code to
handle each case.  Instead, we can just use Rob's shiny new NIR pass to
convert atomic_uints to SSBOs, and delete piles of code.

The ssbo_start section of the binding table becomes a combined ABO and
SSBO section, with ABOs first, then SSBOs.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-11-15 09:37:32 -08:00
Anuj Phogat 5d8164c428 anv/gen10: Enable float blend optimization
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-11-14 13:23:18 -08:00
Anuj Phogat 72a239266b intel/genxml: Add Cache Mode SubSlice Register to gen10.xml
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-11-14 13:23:18 -08:00
Anuj Phogat aacf1943c0 anv/gen10: Implement WaSampleOffsetIZ workaround
We already have this workaround in OpenGL driver.
See Mesa commit 3cf4fe2219.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: Nanley Chery <nanley.g.chery@intel.com>
Cc: Rafael Antognolli <rafael.antognolli@intel.com>
2017-11-14 13:23:18 -08:00
Matt Turner a31d038208 Revert "intel/fs: Use a pure vertical stride for large register strides"
This reverts commit e8c9e65185.

With the actual bug fixed (by commit 6ac2d16901), this is not
necessary. I'm doubtful of its correctness in any case.
2017-11-14 11:24:08 -08:00
Matt Turner 6ac2d16901 i965/fs: Fix extract_i8/u8 to a 64-bit destination
The MOV instruction can extract bytes to words/double words, and
words/double words to quadwords, but not byte to quadwords.

For unsigned byte to quadword, we can read them as words and AND off the
high byte and extract to quadword in one instruction. For signed bytes,
we need to first sign extend to word and the sign extend that word to a
quadword.

Fixes the following test on CHV, BXT, and GLK:
   KHR-GL46.shader_ballot_tests.ShaderBallotBitmasks
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103628
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-11-14 10:56:18 -08:00
Matt Turner cfcfa0b9cd i965/fs: Split all 32->64-bit MOVs on CHV, BXT, GLK
Fixes the following tests on CHV, BXT, and GLK:
    KHR-GL46.shader_ballot_tests.ShaderBallotFunctionBallot
    dEQP-VK.spirv_assembly.instruction.compute.uconvert.uint32_to_int64
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103115
2017-11-14 10:56:18 -08:00
Jason Ekstrand bc933d0e84 intel/blorp: Make the MOCS setting part of blorp_address
This makes our MOCS settings significantly more flexible.

Cc: "17.3" <mesa-stable@lists.freedesktop.org>
Tested-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-11-13 19:40:10 -08:00
Jason Ekstrand deec84fd77 anv/blorp: Add a device parameter to blorp_surf_for_anv_image
Cc: "17.3" <mesa-stable@lists.freedesktop.org>
Tested-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-11-13 19:40:09 -08:00
Jason Ekstrand 4639cc716e intel/blorp: Use mocs.tex for depth stencil
Cc: "17.3" <mesa-stable@lists.freedesktop.org>
Tested-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-11-13 19:39:57 -08:00
Kenneth Graunke 866158b4b6 intel/tools/error: Decode compute shaders.
This is a bit more annoying than your average shader - we need to look
at MEDIA_INTERFACE_DESCRIPTOR_LOAD in the batch buffer, then hop over
to the dynamic state buffer to read the INTERFACE_DESCRIPTOR_DATA, then
hop over to the instruction buffer to decode the program.

Now that we store all the buffers before decoding, we can actually do
this fairly easily.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-11-13 17:11:02 -08:00
Kenneth Graunke 7049c38655 intel/tools/error: Use do-while for field iterator loops.
while loops skip the first field of the instruction/structure, which
is not what the code intended.  It works out because the field we're
looking for doesn't happen to be first, but we ought to do it right
regardless.

Found while writing the next patch, where Kernel Start Pointer is
the first field of INTERFACE_DESCRIPTOR_DATA.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-11-13 17:11:02 -08:00
Kenneth Graunke 8b749ee0ea intel/tools/error: Decode shaders while decoding batch commands.
This makes aubinator_error_decode's shader dumping work like aubinator.
Instead of printing them after the fact, it prints them right inside the
3DSTATE_VS/HS/DS/GS/PS packet that references them.  This saves you the
effort of cross-referencing things and jumping back and forth.

It also reduces a bunch of book-keeping, and eliminates the limitation
that we could only handle 4096 programs.  That code was also broken and
failed to print any shaders if there were under 4096 programs.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-11-13 17:11:02 -08:00
Kenneth Graunke 4979bf2728 intel/tools/error: Save error state sections and decode them later.
This lets us complete parsing and storing of each buffer's data before
we begin decoding the batchbuffer.  This makes it possible to inspect
the state buffer and program buffer, so we can properly decode any
indirect state or shader programs.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-11-13 17:11:02 -08:00
Kenneth Graunke eb8ad56ed2 intel/tools/error: Fix null termination of ring name string.
Ported from intel_error_decode.  We don't want to run off the end.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-11-13 17:11:01 -08:00
Kenneth Graunke ac17b38e79 intel/tools/error: Drop unused MAX_RINGS #define.
Dead code.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-11-13 17:11:01 -08:00
Kenneth Graunke 596e860317 intel/tools/error: Refactor buffer matching, add more buffers.
Based on a similar patch to intel_error_decode by Chris Wilson.

While we're de-duplicating the gtt_offset calculation, we can simplify
it to assume two hex digits are there - the kernel has done this since
v4.6, and we already require error states from v4.10.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-11-13 17:10:51 -08:00
Kenneth Graunke 4bb119f00b intel/tools/error: Only decode a few sections of error states.
These three are the only we can reasonably decode with genxml.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-11-13 17:10:38 -08:00
Kenneth Graunke 00981e7c47 intel/tools/error: Drop unused parameters from decode() helper.
Also change count from a pointer into a value.  We were supposed to
be resetting it to 0 (and failed to), but that's gone since we dropped
the pre-ascii85 handling.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-11-13 17:10:38 -08:00
Kenneth Graunke 1898bf11a8 intel/tools/error: Drop support for non-ascii85 encoded error states.
Error state files used to look like:

   render ring --- gtt_offset = 0x0e8f6000
   00000000 :  69040000
   00000004 :  79090000
   ...
   00007ffc :  00000000
   --- ringbuffer = 0x00001000

There were thousands of lines between sections.  The file format changed
with Kernel 4.10, and now has a single ascii85-encoded line following
each section heading.  This is much easier to parse.

There are a bunch of bugs in our handling of the old style format,
where we'd decode the wrong data, at the wrong time.  Fixing all of
these is going to be a giant pain.  It's also a lot of extra code
complexity.  In order to properly decode indirect state, or compute
shaders, we'll also need to parse data in advance of decoding, which
is going to be a giant pain with this ad-hoc "decode everywhere!"
mentality.  So, let's just drop support for the older file format.

This unfortunately requires an error state generated by Kernel 4.10 or
later.  That's probably not the end of the world, as we encourage users
to upgrade to the latest kernel when encountering GPU hangs anyway.  It
might be a giant pain for people with LTS kernels, though...

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-11-13 17:10:38 -08:00
Kenneth Graunke 53586f88d7 intel/tools/error: Do ascii85 decode first.
The dashes "---" may occur within an ascii85 block, but only an ascii85
block starts with ':' or '~'.

Ported from Chris Wilson's intel-gpu-tools commit:
bceec7e1d8a160226b783c6344eae8cbf4ece144

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-11-13 17:10:38 -08:00
Dylan Baker 49fa074726 meson: Don't build intel shared components by default
It's a neat idea, and still useful in some cases, but the intel common
code is used by i965 and anvil only, this is a little clearer.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-11-13 13:43:15 -08:00
Jason Ekstrand 93200ea26d aubinator: Don't skip the first field in each subgroup
The previous iteration algorithm would advance the field pointer right
after we advance the group.  This meant that you would end up with
skipping the first field of the group.  In the common case, where the
only field is a struct (e.g. 3DSTATE_VERTEX_BUFFERS), it would get
skipped entirely.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-11-13 07:37:23 -08:00
Jason Ekstrand 74a9e51696 intel/genxml: Delete empty groups
They serve no purpose other than to just fill empty space in the packet
so each dword has something.  Just disallowing empty groups is a bit
easier on some of the tools.  This does not change the generated packing
headers in any way.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-11-13 07:37:23 -08:00
Jason Ekstrand 54a6f7eaca anv: Don't crash on invalid heap sizes when the PCI ID is overriden 2017-11-13 07:37:23 -08:00
Kenneth Graunke 9a0465b3a3 intel/tools: Fix detection of enabled shader stages.
We renamed "Function Enable" to "Enable", which broke our detection
of whether shaders are enabled or not.  So, we'd see a bunch of HS/DS
packets with program offsets of 0, and think that was a valid TCS/TES.

Fixes: c032cae9ff (genxml: Rename "Function Enable" to "Enable".)

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-11-12 00:16:40 -08:00
Dylan Baker 854455498c autotools: Set C++ visibility flags on Intel
These flags are set for C sources, but not C++. This causes symbol
visibility leaks from the C++ parts of the Intel compiler.

Fixes: 700bebb958 ("i965: Move the back-end compiler to src/intel/compiler")
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-11-10 09:41:55 -08:00
Chad Versace cd6f79a71d anv/meson: Generate dev_icd.json
I tested this in a setup where the builddir was outside of the srcdir.

Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2017-11-09 16:29:33 -08:00
Chad Versace b7441ef252 anv: Fix architecture in intel_icd.{arch}.json
Use the host arch, not the target arch. In Meson and in recent
Autotools, the host arch is where the binary will be used. The target
arch is useful only when compiling a compiler.

See: http://mesonbuild.com/Cross-compilation.html
See: https://www.gnu.org/software/automake/manual/html_node/Cross_002dCompilation.html
Reported-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2017-11-09 16:29:31 -08:00
Chad Versace 2a4798ad98 anv: Refactor anv_GetImageSubresourceLayout()
Its helper function, anv_surface_get_subresource_layout(), was not very
helpful. So fold it into the main function.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-11-09 16:01:59 -08:00
Chad Versace 69e3f0b02e anv/image: Refactor choice of isl_tiling_flags_t
Instead of choosing the tiling flags inside make_surface(), which is
called once per aspect in a loop, and which chooses the same tiling for
each aspect, choose the tiling flags exactly once before entering the
aspect loop.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-11-09 16:01:59 -08:00
Chad Versace 7bb4387105 anv: Refactor anv_get_format_plane() - explicit unsupported
The same local variable, 'plane_format', was returned on success *and*
failure. Be more explicit in distinguishing the two cases: return
'plane_format' on success and return 'unsupported' on failure.

This simplifies the diff in upcoming patches for
VK_EXT_image_drm_format_modifier.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-11-09 16:01:59 -08:00
Chad Versace 3ee7f4bc2f anv: Remove anv_physical_device_get_format_properties()
Fold its body into its sole caller,
anv_GetPhysicalDeviceFormatProperties().

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-11-09 16:01:59 -08:00