Commit Graph

4023 Commits

Author SHA1 Message Date
Jason Ekstrand 8993e0973f intel/nir: Drop an unneeded lower_constant_initializers call
Even though this is technically a step in the function inlining process
as laid out in nir_inline_functions.c, it's not really needed.  We
already have constant initializers lowered here and no new ones are
added by appending the softfp64 functions.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-06 17:24:57 +00:00
Jason Ekstrand fa4824c1db intel/debug: Add a debug flag to force software fp64
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-06 17:24:57 +00:00
Ian Romanick 55e6454d5e intel/fs: Fix extract_u8 of an odd byte from a 64-bit integer
In the old code, we would generate the exact same instruction for
extract_u8(some_u64, 0) and extract_u8(some_u64, 1).  The mask-a-word
trick only works for even numbered bytes.

This fixes the (new) piglit test
tests/spec/arb_gpu_shader_int64/execution/fs-ushr-and-mask.shader_test.

v2: Use a SHR instead of an AND.  This saves an instruction compared to
using two moves.  Suggested by Jason.

Fixes: 6ac2d16901 ("i965/fs: Fix extract_i8/u8 to a 64-bit destination")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-06 08:35:45 -08:00
Ian Romanick 4aaf139ea4 intel/fs: nir_op_extract_i8 extracts a byte, not a word
Fixes: 6ac2d16901 ("i965/fs: Fix extract_i8/u8 to a 64-bit destination")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-06 08:35:42 -08:00
Ian Romanick bbf20a1ca3 intel/compiler: Silence unused parameter warning in brw_interpolation_map.c
The parameter is never used, and it's not part of a common interface
idiom.  Remove it.

src/intel/compiler/brw_interpolation_map.c: In function ‘brw_setup_vue_interpolation’:
src/intel/compiler/brw_interpolation_map.c:62:59: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
                             const struct gen_device_info *devinfo)
                                                           ^~~~~~~

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-06 08:35:36 -08:00
Ian Romanick dea19138dd intel/compiler: Silence many unused parameter warnings in brw_eu.h
In file included from src/intel/compiler/brw_eu_util.c:34:0:
src/intel/compiler/brw_eu.h: In function ‘brw_message_desc_header_present’:
src/intel/compiler/brw_eu.h:288:63: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
 brw_message_desc_header_present(const struct gen_device_info *devinfo,
                                                               ^~~~~~~
src/intel/compiler/brw_eu.h: In function ‘brw_message_ex_desc’:
src/intel/compiler/brw_eu.h:296:51: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
 brw_message_ex_desc(const struct gen_device_info *devinfo,
                                                   ^~~~~~~
src/intel/compiler/brw_eu.h: In function ‘brw_message_ex_desc_ex_mlen’:
src/intel/compiler/brw_eu.h:303:59: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
 brw_message_ex_desc_ex_mlen(const struct gen_device_info *devinfo,
                                                           ^~~~~~~
src/intel/compiler/brw_eu.h: In function ‘brw_sampler_desc_binding_table_index’:
src/intel/compiler/brw_eu.h:337:68: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
 brw_sampler_desc_binding_table_index(const struct gen_device_info *devinfo,
                                                                    ^~~~~~~
src/intel/compiler/brw_eu.h: In function ‘brw_sampler_desc_sampler’:
src/intel/compiler/brw_eu.h:344:56: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
 brw_sampler_desc_sampler(const struct gen_device_info *devinfo, uint32_t desc)
                                                        ^~~~~~~
src/intel/compiler/brw_eu.h: In function ‘brw_sampler_desc_return_format’:
src/intel/compiler/brw_eu.h:371:62: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
 brw_sampler_desc_return_format(const struct gen_device_info *devinfo,
                                                              ^~~~~~~
src/intel/compiler/brw_eu.h: In function ‘brw_dp_desc_binding_table_index’:
src/intel/compiler/brw_eu.h:405:63: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
 brw_dp_desc_binding_table_index(const struct gen_device_info *devinfo,
                                                               ^~~~~~~
src/intel/compiler/brw_eu.h: In function ‘brw_dp_a64_untyped_atomic_desc’:
src/intel/compiler/brw_eu.h:754:41: warning: unused parameter ‘exec_size’ [-Wunused-parameter]
                                unsigned exec_size, /**< 0 for SIMD4x2 */
                                         ^~~~~~~~~
src/intel/compiler/brw_eu.h: In function ‘brw_dp_a64_untyped_atomic_float_desc’:
src/intel/compiler/brw_eu.h:775:47: warning: unused parameter ‘exec_size’ [-Wunused-parameter]
                                      unsigned exec_size,
                                               ^~~~~~~~~

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-06 08:35:31 -08:00
Timothy Arceri 81ee2cd8ba glsl: rename is_record() -> is_struct()
Replace was done using:
find ./src -type f -exec sed -i -- \
's/is_record(/is_struct(/g' {} \;

Acked-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-06 13:10:02 +11:00
Caio Marcelo de Oliveira Filho 69cc6272fb anv: Implement VK_EXT_external_memory_host
v2: Ignore the import if handleType == 0. (Jason)

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-05 12:59:50 -08:00
Jason Ekstrand 43f40dc7cb anv: Implement VK_EXT_inline_uniform_block
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-05 10:06:50 -06:00
Jason Ekstrand 61e009d2c4 spirv: Use the same types for resource indices as pointers
We need more space than just a 32-bit scalar and we have to burn all
that space anyway so we may as well expose it to the driver.  This also
fixes a subtle bug when UBOs and SSBOs have different pointer types.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-05 10:06:50 -06:00
Jason Ekstrand c520f4dec9 anv: Add a concept of a descriptor buffer
This buffer goes along side the CPU data structure and may contain
pointers, bindless handles, or any other descriptor information.
Currently, all descriptors are size zero and nothing goes in the buffer
but this commit sets up the framework we will need later.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-05 10:06:50 -06:00
Jason Ekstrand 5c30fffeec anv: Take references to push descriptor set layouts
Technically, descriptor set layouts aren't required to survive past the
function they're passed into so we need to reference them.

Cc: "19.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-05 10:06:50 -06:00
Jason Ekstrand 8ab95b849e anv: Refactor descriptor pushing a bit
Pull the common code out of the two entrypoints into the helper which
fetches the push descriptor set for us.  Now that it does more than just
get a thing, call it anv_cmd_buffer_push_descriptor_set.

Cc: "19.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-05 10:06:50 -06:00
Jason Ekstrand cab064bc10 anv: drop add_var_binding from anv_nir_apply_pipeline_layout.c
It has exactly one caller.  Just inline it.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-05 10:06:50 -06:00
Jason Ekstrand 49cf61c6aa anv: Clean up descriptor set layouts
The descriptor set layout code in our driver has undergone many changes
over the years.  Some of the fields which were once essential are now
useless or nearly so.  The has_dynamic_offsets field was completely
unused accept for the code to set and hash it.  The per-stage indices
were only being used to determine if a particular binding had images,
samplers, etc.  The fact that it's per-stage also doesn't matter because
that binding should never be accessed by a shader of the wrong stage.

This commit deletes a pile of cruft and replaces it all with a
descriptive bitfield which states what a particular descriptor contains.
This merely describes the data available and doesn't necessarily dictate
how it will be lowered in anv_nir_apply_pipeline_layout.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-05 10:06:50 -06:00
Jason Ekstrand 4c50b7c92c anv: Count image param entries rather than images
This is what we're actually storing in the descriptor set and consuming
when we bind surface states.  This commit renames image_count to
image_param_count a few places and moves the decision to not count image
params on gen9+ into anv_descriptor_set.c when we build the layout.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-05 10:06:50 -06:00
Jason Ekstrand 3822c7495a anv: Stop allocating buffer views for dynamic buffers
We emit the surface states for those on-the-fly so we don't need the
buffer view.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-05 10:06:50 -06:00
Jason Ekstrand 8c6d410a50 anv: Rework arguments to anv_descriptor_set_write_*
Make them all take a device followed by a set.  This is consistent
with how the actual Vulkan entrypoint parameters are laid out.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-05 10:06:50 -06:00
Jason Ekstrand 5b7a9e7398 anv/descriptor_set: Refactor alloc/free of descriptor sets
This commit just puts the free list code together as part of the pool
instead of having it inlined into the descriptor set create code.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-05 10:06:50 -06:00
Eric Engestrom 3d4238d26c anv: use the platform defines in vk.xml instead of hard-coding them
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-05 11:57:10 +00:00
Lionel Landwerlin e21c201c96 anv: update supported patch version
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-03-05 10:39:17 +00:00
Tapani Pälli 3bb8768b9d anv: toggle on support for VK_EXT_ycbcr_image_arrays
We already propagate coord_components correctly and did not have
layer restrictions for ycbcr formats.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-05 10:39:17 +00:00
Tapani Pälli 33bf3d510c anv: retain the is_array state in create_plane_tex_instr_implicit
This does not seem to fix anything ATM but is the right thing todo.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Fixes: f3e91e78a3 ("anv: add nir lowering pass for ycbcr textures")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-05 10:38:31 +00:00
Jason Ekstrand 0010d0348a anv/pipeline: Drop anv_fill_binding_table
We zero out the prog data anyway and, now that bias is always zero, this
function is accomplishing nothing.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-04 23:56:40 +00:00
Jason Ekstrand 65ee5cc0da anv: Use an actual binding for gl_NumWorkgroups
This commit moves our handling of gl_NumWorkgroups over to work like our
handling of other special bindings in the Vulkan driver.  We give it a
magic descriptor set number and teach emit_binding_tables to handle it.
This is better than the bias mechanism we were using because it allows
us to do proper accounting through the bind map mechanism.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-04 23:56:40 +00:00
Jason Ekstrand 5c96120b5c intel,nir: Lower TXD with min_lod when the sampler index is not < 16
When we have a larger sampler index, we get into the "high sampler"
scenario and need an instruction header.  Even in SIMD8, this pushes the
instruction over the sampler message size maximum of 11 registers.
Instead, we have to lower TXD to TXL.

Fixes: cb98e0755f "intel/fs: Support min_lod parameters on texture..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-04 23:56:39 +00:00
Jason Ekstrand 5049fbddb4 anv: Count surfaces for non-YCbCr images in GetDescriptorSetLayoutSupport
We were accidentally not counting those surfaces

Fixes: ddc4069122 "anv: Implement VK_KHR_maintenance3"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-04 23:56:39 +00:00
Sagar Ghuge e551040c60 nir/glsl: Add another way of doing lower_imul64 for gen8+
On Gen 8 and 9, "mul" instruction supports 64 bit destination type. We
can reduce our 64x64 int multiplication from 4 instructions to 3.

Also instead of emitting two mul instructions, we can emit single mul
instuction and extract low/high 32 bits from 64 bit result for
[i/u]mulExtended

v2: 1) Allow lower_mul_high64 to use new opcode (Jason Ekstrand)
    2) Add lower_mul_2x32_64 flag (Matt Turner)
    3) Remove associative property as bit size is different (Connor
       Abbott)

v3: Fix indentation and variable naming convention (Jason Ekstrand)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-04 15:50:25 -08:00
Mauro Rossi ec0f465bc5 android: anv: fix libexpat shared dependency
Fixes undefined reference building errors for XML_* functions

Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Cc: "19.0" <mesa-stable@lists.freedesktop.org>
2019-03-04 20:53:59 +01:00
Mauro Rossi 14e7e26a09 android: anv: fix generated files depedencies (v2)
Fix anv_extrypoints.{c,h} and anv_extensions.{c,h} missing dependencies
Rename the variable labels according to targets and python scripts
Align the building rules as per Automake for simplification

Fixes building errors during rebuils due to missing dependencies

(v2) Fixed a missing $(VULKAN_API_XML) reference

Fixes: 9a508b7 ("android: anv/extensions: fix generated sources build")
Fixes: dd088d4bec ("anv/extensions: Generate a header file with extension tables")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Cc: "19.0" <mesa-stable@lists.freedesktop.org>
2019-03-04 20:53:51 +01:00
Jordan Justen 10c5579921
intel/compiler: Move int64/doubles lowering options
Instead of calculating the int64 and doubles lowering options each
time a shader is preprocessed, save and use the values in
nir_shader_compiler_options.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-02 14:33:44 -08:00
Ian Romanick d1d56f5f9a intel/fs: Don't assert on b2f with a saturate modifier
This ran afoul of Iris's use of nir_lower_clamp_color_outputs which
applies fsat() before writes to vertex shader color outpus.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes: 7725d60938 ("intel/fs: Emit better code for b2f(inot(a)) and b2i(inot(a))")
2019-03-02 13:58:50 -08:00
Lionel Landwerlin 32ffd90002 anv: add support for INTEL_DEBUG=bat
As requested by Ken ;)

v2: Also decode simple batches (Caio)
    Fix u_vector usage issues (Lionel)

v3: Make binding/instruction/state/surface available (Lionel)

v4: Going through device pools for simple batches (Lionel)
    Centralize search BO callbacks into anv_device.c (Lionel)

v5: Clear decoded batch buffer var after use (Caio)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-02 12:53:21 +00:00
Matt Turner e0148bbcfd
intel/compiler: Add commas on final values of compaction table arrays
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2019-03-01 13:56:25 -08:00
Ian Romanick 1edf67fc3f intel/fs: Generate if instructions with inverted conditions
Per-platform results were all over the place, so I have included all the
results here.  There is an important note at the bottom of the commit
message.

Skylake
total instructions in shared programs: 15184683 -> 15184679 (<.01%)
instructions in affected programs: 2786 -> 2782 (-0.14%)
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.05% max: 0.84% x̄: 0.44% x̃: 0.44%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.96% 0.07%
Inconclusive result (%-change mean confidence interval includes 0).

total cycles in shared programs: 370961367 -> 370961173 (<.01%)
cycles in affected programs: 205867 -> 205673 (-0.09%)
helped: 5
HURT: 1
helped stats (abs) min: 1 max: 149 x̄: 39.60 x̃: 16
helped stats (rel) min: 0.02% max: 1.05% x̄: 0.45% x̃: 0.55%
HURT stats (abs)   min: 4 max: 4 x̄: 4.00 x̃: 4
HURT stats (rel)   min: 0.03% max: 0.03% x̄: 0.03% x̃: 0.03%
95% mean confidence interval for cycles value: -93.01 28.34
95% mean confidence interval for cycles %-change: -0.82% 0.08%
Inconclusive result (value mean confidence interval includes 0).

Broadwell
total instructions in shared programs: 15465366 -> 15465362 (<.01%)
instructions in affected programs: 2799 -> 2795 (-0.14%)
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.04% max: 0.84% x̄: 0.44% x̃: 0.44%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.96% 0.07%
Inconclusive result (%-change mean confidence interval includes 0).

total cycles in shared programs: 410938419 -> 410938531 (<.01%)
cycles in affected programs: 566028 -> 566140 (0.02%)
helped: 18
HURT: 17
helped stats (abs) min: 1 max: 16 x̄: 3.50 x̃: 1
helped stats (rel) min: <.01% max: 1.05% x̄: 0.13% x̃: <.01%
HURT stats (abs)   min: 1 max: 12 x̄: 10.29 x̃: 12
HURT stats (rel)   min: <.01% max: 0.16% x̄: 0.08% x̃: 0.09%
95% mean confidence interval for cycles value: 0.31 6.09
95% mean confidence interval for cycles %-change: -0.10% 0.05%
Inconclusive result (%-change mean confidence interval includes 0).

Haswell
total instructions in shared programs: 13749760 -> 13749759 (<.01%)
instructions in affected programs: 2241 -> 2240 (-0.04%)
helped: 1
HURT: 0

total cycles in shared programs: 385398913 -> 385398363 (<.01%)
cycles in affected programs: 554914 -> 554364 (-0.10%)
helped: 31
HURT: 1
helped stats (abs) min: 1 max: 453 x̄: 18.00 x̃: 6
helped stats (rel) min: <.01% max: 0.25% x̄: 0.03% x̃: 0.05%
HURT stats (abs)   min: 8 max: 8 x̄: 8.00 x̃: 8
HURT stats (rel)   min: 0.06% max: 0.06% x̄: 0.06% x̃: 0.06%
95% mean confidence interval for cycles value: -45.88 11.51
95% mean confidence interval for cycles %-change: -0.05% -0.02%
Inconclusive result (value mean confidence interval includes 0).

Ivy Bridge
total cycles in shared programs: 180663626 -> 180663881 (<.01%)
cycles in affected programs: 472350 -> 472605 (0.05%)
helped: 15
HURT: 30
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01%
HURT stats (abs)   min: 8 max: 10 x̄: 9.00 x̃: 9
HURT stats (rel)   min: 0.06% max: 0.14% x̄: 0.10% x̃: 0.10%
95% mean confidence interval for cycles value: 4.21 7.12
95% mean confidence interval for cycles %-change: 0.05% 0.08%
Cycles are HURT.

Sandy Bridge
total cycles in shared programs: 154568664 -> 154569225 (<.01%)
cycles in affected programs: 356486 -> 357047 (0.16%)
helped: 1
HURT: 31
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.02% max: 0.02% x̄: 0.02% x̃: 0.02%
HURT stats (abs)   min: 4 max: 33 x̄: 18.16 x̃: 8
HURT stats (rel)   min: 0.05% max: 0.23% x̄: 0.14% x̃: 0.10%
95% mean confidence interval for cycles value: 12.19 22.87
95% mean confidence interval for cycles %-change: 0.10% 0.16%
Cycles are HURT.

Iron Lake
total instructions in shared programs: 8206589 -> 8206565 (<.01%)
instructions in affected programs: 3024 -> 3000 (-0.79%)
helped: 12
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.75% max: 0.83% x̄: 0.80% x̃: 0.80%
95% mean confidence interval for instructions value: -2.00 -2.00
95% mean confidence interval for instructions %-change: -0.82% -0.77%
Instructions are helped.

total cycles in shared programs: 187657428 -> 187656228 (<.01%)
cycles in affected programs: 95748 -> 94548 (-1.25%)
helped: 12
HURT: 0
helped stats (abs) min: 80 max: 120 x̄: 100.00 x̃: 100
helped stats (rel) min: 1.00% max: 1.66% x̄: 1.27% x̃: 1.21%
95% mean confidence interval for cycles value: -113.27 -86.73
95% mean confidence interval for cycles %-change: -1.43% -1.11%
Cycles are helped.

GM45
total instructions in shared programs: 5037569 -> 5037557 (<.01%)
instructions in affected programs: 1521 -> 1509 (-0.79%)
helped: 6
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.75% max: 0.83% x̄: 0.79% x̃: 0.79%
95% mean confidence interval for instructions value: -2.00 -2.00
95% mean confidence interval for instructions %-change: -0.83% -0.75%
Instructions are helped.

total cycles in shared programs: 128101478 -> 128100758 (<.01%)
cycles in affected programs: 52746 -> 52026 (-1.37%)
helped: 6
HURT: 0
helped stats (abs) min: 120 max: 120 x̄: 120.00 x̃: 120
helped stats (rel) min: 1.16% max: 1.66% x̄: 1.41% x̃: 1.41%
95% mean confidence interval for cycles value: -120.00 -120.00
95% mean confidence interval for cycles %-change: -1.70% -1.12%
Cycles are helped.

This change has almost no effect right now.  However, removing this
patch (but leaving the patch "nir/algebraic: Replace a bcsel of a b2f
with a b2f(!(a || b))") after adding a patch that removes !(a < b) -> (a
>= b) optimizations (like
https://patchwork.freedesktop.org/patch/264787/) has the following
results on Skylake:

Skylake
total instructions in shared programs: 15071022 -> 15089710 (0.12%)
instructions in affected programs: 1022219 -> 1040907 (1.83%)
helped: 1
HURT: 3937
helped stats (abs) min: 41 max: 41 x̄: 41.00 x̃: 41
helped stats (rel) min: 1.01% max: 1.01% x̄: 1.01% x̃: 1.01%
HURT stats (abs)   min: 1 max: 256 x̄: 4.76 x̃: 4
HURT stats (rel)   min: 0.05% max: 11.18% x̄: 2.59% x̃: 2.60%
95% mean confidence interval for instructions value: 4.56 4.93
95% mean confidence interval for instructions %-change: 2.54% 2.64%
Instructions are HURT.

total cycles in shared programs: 369777134 -> 370092923 (0.09%)
cycles in affected programs: 17516573 -> 17832362 (1.80%)
helped: 115
HURT: 3624
helped stats (abs) min: 1 max: 1721 x̄: 81.18 x̃: 28
helped stats (rel) min: <.01% max: 10.74% x̄: 1.24% x̃: 0.65%
HURT stats (abs)   min: 1 max: 12640 x̄: 89.71 x̃: 54
HURT stats (rel)   min: <.01% max: 28.24% x̄: 4.72% x̃: 4.52%
95% mean confidence interval for cycles value: 75.21 93.71
95% mean confidence interval for cycles %-change: 4.43% 4.64%
Cycles are HURT.

total spills in shared programs: 9450 -> 9442 (-0.08%)
spills in affected programs: 166 -> 158 (-4.82%)
helped: 2
HURT: 0

total fills in shared programs: 21115 -> 21094 (-0.10%)
fills in affected programs: 438 -> 417 (-4.79%)
helped: 2
HURT: 0

LOST:   1
GAINED: 0

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-01 12:42:14 -08:00
Ian Romanick 7725d60938 intel/fs: Emit better code for b2f(inot(a)) and b2i(inot(a))
Since Boolean values are either -1 (true) or 0 (false), b2f(inot(a))
maps -1 => 0.0 and 0 => 1.0.  This is equivalent to 1.0 +
float(boolBitsToInt(a)).  On Intel GPUs, ADD is one of the few
instructions that can type-convert during write to destination, so we
can achieve this in a single instruction:

    add    g47F, g26D, 1D

v2: Fix swizzles.

v3: Fix typos in comments.  Noticed by Ken.

All Gen6+ platforms had similar results. (Skylake shown)
Skylake
total instructions in shared programs: 15185583 -> 15184683 (<.01%)
instructions in affected programs: 239389 -> 238489 (-0.38%)
helped: 899
HURT: 1
helped stats (abs) min: 1 max: 2 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.15% max: 1.85% x̄: 0.49% x̃: 0.44%
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.09% max: 0.09% x̄: 0.09% x̃: 0.09%
95% mean confidence interval for instructions value: -1.01 -0.99
95% mean confidence interval for instructions %-change: -0.51% -0.48%
Instructions are helped.

total cycles in shared programs: 370964249 -> 370961508 (<.01%)
cycles in affected programs: 1487586 -> 1484845 (-0.18%)
helped: 420
HURT: 268
helped stats (abs) min: 1 max: 232 x̄: 22.41 x̃: 6
helped stats (rel) min: 0.05% max: 22.60% x̄: 1.30% x̃: 0.41%
HURT stats (abs)   min: 1 max: 230 x̄: 24.90 x̃: 10
HURT stats (rel)   min: <.01% max: 21.60% x̄: 1.45% x̃: 0.52%
95% mean confidence interval for cycles value: -7.61 -0.36
95% mean confidence interval for cycles %-change: -0.44% -0.02%
Cycles are helped.

No changes on Iron Lake or GM45.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-01 12:42:14 -08:00
Ian Romanick cb3e21cd19 intel/fs: Use De Morgan's laws to avoid logical-not of a logic result on Gen8+
Instead of emitting ~(a & b), emit (~a | ~b) since logical-not of
operands is free on Gen8+.

v2: Fix swizzles.  Fix types for cmod propagation.

v3: Simplify logic for inverting source of inot(ixor(a, b)).  Suggested
by Ken.

Skylake and Broadwell had similar results. (Skylake shown)
Skylake
total instructions in shared programs: 15185593 -> 15185583 (<.01%)
instructions in affected programs: 5673 -> 5663 (-0.18%)
helped: 12
HURT: 1
helped stats (abs) min: 1 max: 2 x̄: 1.17 x̃: 1
helped stats (rel) min: 0.30% max: 5.88% x̄: 1.50% x̃: 0.70%
HURT stats (abs)   min: 4 max: 4 x̄: 4.00 x̃: 4
HURT stats (rel)   min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12%
95% mean confidence interval for instructions value: -1.66 0.13
95% mean confidence interval for instructions %-change: -2.60% -0.15%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 370977726 -> 370964249 (<.01%)
cycles in affected programs: 869987 -> 856510 (-1.55%)
helped: 15
HURT: 2
helped stats (abs) min: 2 max: 6640 x̄: 902.20 x̃: 16
helped stats (rel) min: <.01% max: 4.92% x̄: 1.71% x̃: 1.53%
HURT stats (abs)   min: 14 max: 42 x̄: 28.00 x̃: 28
HURT stats (rel)   min: 1.08% max: 3.18% x̄: 2.13% x̃: 2.13%
95% mean confidence interval for cycles value: -1654.87 69.34
95% mean confidence interval for cycles %-change: -2.29% -0.23%
Inconclusive result (value mean confidence interval includes 0).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-01 12:42:14 -08:00
Ian Romanick 8eb36c9129 intel/fs: Emit logical-not of operands on Gen8+
On Gen8+ specifying negation of a logical operation such as AND actually
performs a logical-not.  Take advantage of this to generate fewer
instructions.

v2: Major rebase.  Use nir_src_as_alu_instr.  Fix swizzle handling.

No changes on any pre-Gen8 platform.

Skylake and Broadwell had similar results. (Broadwell shown)
total instructions in shared programs: 15466902 -> 15466274 (<.01%)
instructions in affected programs: 1262953 -> 1262325 (-0.05%)
helped: 682
HURT: 4
helped stats (abs) min: 1 max: 5 x̄: 1.02 x̃: 1
helped stats (rel) min: 0.03% max: 2.40% x̄: 0.18% x̃: 0.04%
HURT stats (abs)   min: 1 max: 62 x̄: 17.50 x̃: 3
HURT stats (rel)   min: 0.03% max: 1.89% x̄: 0.53% x̃: 0.10%
95% mean confidence interval for instructions value: -1.10 -0.73
95% mean confidence interval for instructions %-change: -0.19% -0.15%
Instructions are helped.

total cycles in shared programs: 410996093 -> 410950440 (-0.01%)
cycles in affected programs: 144389048 -> 144343395 (-0.03%)
helped: 519
HURT: 51
helped stats (abs) min: 1 max: 1060 x̄: 104.46 x̃: 140
helped stats (rel) min: 0.01% max: 10.98% x̄: 0.34% x̃: 0.03%
HURT stats (abs)   min: 1 max: 4060 x̄: 167.90 x̃: 22
HURT stats (rel)   min: <.01% max: 8.20% x̄: 0.96% x̃: 0.25%
95% mean confidence interval for cycles value: -97.16 -63.02
95% mean confidence interval for cycles %-change: -0.32% -0.13%
Cycles are helped.

total spills in shared programs: 95311 -> 95329 (0.02%)
spills in affected programs: 881 -> 899 (2.04%)
helped: 0
HURT: 4

total fills in shared programs: 93629 -> 93634 (<.01%)
fills in affected programs: 794 -> 799 (0.63%)
helped: 1
HURT: 2

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-01 12:42:14 -08:00
Ian Romanick 06eaaf2de9 intel/fs: Refactor ALU source and destination handling to a separate function
Other places will need to do this soon to properly handle source
swizzles.  The patch looks a little odd, but the change is pretty
straight forward.  All of the swizzle and mask handling is moved out,
but the code for handling move instructions and vecN instructions
remains in nir_emit_alu.

I'm not terribly pleased with the "need_dest" parameter, but
get_nir_dest is (somewhat surprisingly) destructive.  I am open to
suggestions of alternatives.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-01 12:42:14 -08:00
Ian Romanick fb3ca9109c intel/fs: Handle OR source modifiers in algebraic optimization
Found by inspection.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-01 12:42:14 -08:00
Ian Romanick c9d5bd050c intel/fs: Relax type matching rules in cmod propagation from MOV instructions
To allow cmod propagation from a MOV in a sequence like:

    and(16)         g31<1>UD       g20<8,8,1>UD   g22<8,8,1>UD
    mov.nz.f0(16)   null<1>F       g31<8,8,1>D

A similar change to the vec4 backend had no effect.

Somewhere between c1ec582059 and 40fc4b5acd (1,094 commits) the
effectiveness of this patch diminished, and as of commit d7e0d47b9d
(nir: Add a bunch of b2[if] optimizations) this optimization no longer
has any effect on any platform.

A later patch "intel/fs: Use De Morgan's laws to avoid logical-not of a
logic result on Gen8+," generates some instruction sequences that
require this change in order for cmod propagation to make progress.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-01 12:42:14 -08:00
Ian Romanick d2056ab993 intel/vec4: Emit constants for some ALU sources as immediate values
In some cases of flow control, the constant propagation is not able to
determine that the source of an instruction must be a constant value.
When we still have NIR SSA values, we can easily determine this.  Emit
the immediate value during code generation to possible avoid spurious
loads of constants into registers.

I wrote this patch to prevent a couple trivial regressions in vec4
shaders caused by "nir/algebraic: Replace i2b used by bcsel or
if-statement with comparison".  The final result was quite a bit better
than that...

No shader-db changes on any Gen8+ platform.

v2: Assert that we never get a negation source modifier on Gen8+.
Suggested by Ken.  This should never happen because we don't normally
use vec4 for Gen8+ (requires and environment variable to force it), and
there's no code to generate these negations.  Still, erring on the side
of caution is better.

Haswell
total instructions in shared programs: 13776218 -> 13764783 (-0.08%)
instructions in affected programs: 663931 -> 652496 (-1.72%)
helped: 3495
HURT: 1
helped stats (abs) min: 1 max: 30 x̄: 3.28 x̃: 2
helped stats (rel) min: 0.21% max: 10.00% x̄: 1.79% x̃: 1.49%
HURT stats (abs)   min: 24 max: 24 x̄: 24.00 x̃: 24
HURT stats (rel)   min: 12.24% max: 12.24% x̄: 12.24% x̃: 12.24%
95% mean confidence interval for instructions value: -3.39 -3.15
95% mean confidence interval for instructions %-change: -1.84% -1.75%
Instructions are helped.

total cycles in shared programs: 386818984 -> 386511910 (-0.08%)
cycles in affected programs: 20379636 -> 20072562 (-1.51%)
helped: 3052
HURT: 476
helped stats (abs) min: 2 max: 12516 x̄: 110.40 x̃: 6
helped stats (rel) min: 0.05% max: 24.68% x̄: 1.58% x̃: 0.69%
HURT stats (abs)   min: 2 max: 416 x̄: 62.76 x̃: 24
HURT stats (rel)   min: 0.10% max: 10.75% x̄: 4.03% x̃: 2.18%
95% mean confidence interval for cycles value: -115.57 -58.51
95% mean confidence interval for cycles %-change: -0.93% -0.73%
Cycles are helped.

total spills in shared programs: 100482 -> 100480 (<.01%)
spills in affected programs: 79 -> 77 (-2.53%)
helped: 3
HURT: 1

total fills in shared programs: 96883 -> 96877 (<.01%)
fills in affected programs: 85 -> 79 (-7.06%)
helped: 4
HURT: 0

Ivy Bridge
total instructions in shared programs: 12000562 -> 11990113 (-0.09%)
instructions in affected programs: 572581 -> 562132 (-1.82%)
helped: 3106
HURT: 0
helped stats (abs) min: 1 max: 30 x̄: 3.36 x̃: 2
helped stats (rel) min: 0.21% max: 10.00% x̄: 1.86% x̃: 1.49%
95% mean confidence interval for instructions value: -3.49 -3.23
95% mean confidence interval for instructions %-change: -1.91% -1.81%
Instructions are helped.

total cycles in shared programs: 180958504 -> 180664500 (-0.16%)
cycles in affected programs: 19991810 -> 19697806 (-1.47%)
helped: 2654
HURT: 486
helped stats (abs) min: 2 max: 12516 x̄: 121.61 x̃: 6
helped stats (rel) min: 0.05% max: 20.66% x̄: 1.48% x̃: 0.68%
HURT stats (abs)   min: 2 max: 396 x̄: 59.18 x̃: 24
HURT stats (rel)   min: 0.05% max: 9.62% x̄: 3.82% x̃: 2.16%
95% mean confidence interval for cycles value: -125.62 -61.64
95% mean confidence interval for cycles %-change: -0.76% -0.56%
Cycles are helped.

Sandy Bridge
total instructions in shared programs: 10842336 -> 10835438 (-0.06%)
instructions in affected programs: 395340 -> 388442 (-1.74%)
helped: 1926
HURT: 0
helped stats (abs) min: 1 max: 22 x̄: 3.58 x̃: 2
helped stats (rel) min: 0.10% max: 9.68% x̄: 1.78% x̃: 1.42%
95% mean confidence interval for instructions value: -3.73 -3.43
95% mean confidence interval for instructions %-change: -1.84% -1.72%
Instructions are helped.

total cycles in shared programs: 154590074 -> 154569050 (-0.01%)
cycles in affected programs: 8159932 -> 8138908 (-0.26%)
helped: 1670
HURT: 228
helped stats (abs) min: 2 max: 260 x̄: 18.13 x̃: 6
helped stats (rel) min: 0.02% max: 8.70% x̄: 0.74% x̃: 0.28%
HURT stats (abs)   min: 2 max: 1798 x̄: 40.58 x̃: 14
HURT stats (rel)   min: 0.03% max: 12.97% x̄: 1.04% x̃: 0.31%
95% mean confidence interval for cycles value: -13.51 -8.64
95% mean confidence interval for cycles %-change: -0.60% -0.46%
Cycles are helped.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8212357 -> 8206587 (-0.07%)
instructions in affected programs: 323664 -> 317894 (-1.78%)
helped: 1457
HURT: 0
helped stats (abs) min: 1 max: 12 x̄: 3.96 x̃: 3
helped stats (rel) min: 0.33% max: 11.49% x̄: 1.86% x̃: 1.44%
95% mean confidence interval for instructions value: -4.14 -3.78
95% mean confidence interval for instructions %-change: -1.93% -1.78%
Instructions are helped.

total cycles in shared programs: 187668016 -> 187657422 (<.01%)
cycles in affected programs: 14856234 -> 14845640 (-0.07%)
helped: 1372
HURT: 83
helped stats (abs) min: 2 max: 24 x̄: 7.92 x̃: 6
helped stats (rel) min: 0.02% max: 1.14% x̄: 0.12% x̃: 0.08%
HURT stats (abs)   min: 2 max: 14 x̄: 3.20 x̃: 2
HURT stats (rel)   min: 0.03% max: 0.60% x̄: 0.12% x̃: 0.12%
95% mean confidence interval for cycles value: -7.65 -6.91
95% mean confidence interval for cycles %-change: -0.11% -0.10%
Cycles are helped.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-01 12:41:46 -08:00
Eric Engestrom 2793417ec6 anv: fix typo
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-01 11:20:28 +00:00
Eric Engestrom 258e463db5 anv: remove spaces around kwargs assignment
pylint complains:
> C0326: No space allowed around keyword argument assignment

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-01 11:20:28 +00:00
Eric Engestrom 7b704fd2fd anv: drop unused parameter
I'm guessing a previous version of this script used an index-based map
of entrypoints, but that's not the case anymore.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-01 11:20:28 +00:00
Eric Engestrom b503d4e458 anv: simplify chained comparison
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-01 11:20:28 +00:00
Jason Ekstrand e8f863e718 intel/compiler: Re-prefix non-logical surface opcodes with VEC4
The scalar back-end uses SHADER_OPCODE_SEND for all surface messages so
we no longer need the non-logical opcodes there.  Prefix them VEC4 so
it's clear that they're only used by the vec4 back-end.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-28 16:58:20 -06:00
Jason Ekstrand 95ae400abc intel/schedule_instructions: Move some comments
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-28 16:58:20 -06:00
Jason Ekstrand aeaba24fcb intel/compiler: Drop unused surface opcodes
The unused typed surface read/write support in the vec4 back-end has
been dropped and the fs back-end now uses SHADER_OPCODE_SEND for all
image and buffer ops.  There's no reason to keep these opcodes around
anymore.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-28 16:58:20 -06:00
Jason Ekstrand a04c737215 intel/fs: Get rid of the IMAGE_SIZE opcode
Since switching to SHADER_OPCODE_SEND for image operations, we no longer
need the non-logical opcode.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-28 16:58:20 -06:00
Jason Ekstrand 10b7d14c31 intel/vec4: Drop dead code for handling typed surface messages
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-28 16:58:20 -06:00
Jason Ekstrand 9d437f9482 intel/fs: Drop the fs_surface_builder
All of the actual abstraction (except possibly setting size_written)
happens as part of the logical opcodes.  The only thing that the surface
builder is providing at this point is extra levels of functions to call
through.  I'm going to be adding bindless image support soon and all the
extra abstraction here is just getting in the way.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-28 16:58:20 -06:00
Jason Ekstrand 494a0543e6 intel/fs: Re-order logical surface arguments
It makes more sense to start at the surface then move on to the address
and then the data.  Also, this is a really good test of whether or not
we got all the places that use the sources by explicit integer number.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-28 16:58:20 -06:00
Jason Ekstrand 94f8fd9a0c intel/fs: Add an enum type for logical sampler inst sources
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-28 16:58:20 -06:00
Lionel Landwerlin 6e184147dd intel/compiler: use correct swizzle for replacement
The optimization in 4cd1a0be76 introduced a replacement of :

cmp(8).z.f0.0 vgrf11.y:D, vgrf10.xxxx:D, vgrf2.xyyy:D
...
cmp(8).nz.f0.0 null.x:D, vgrf11.yyyy:D, 0D

By :

cmp(8).z.f0.0 vgrf15.x:D, vgrf10.xxxx:D, vgrf2.yyyy:D
...
mov(8) vgrf11.y:D, vgrf15.yyyy:D

The first cmp instruction is storing in x while the second mov is
sourcing from y. We need to take into account where the replacement on
the scan_inst destination is going to store thing so that the
replacement mov can source things from the correct location.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 4cd1a0be76 ("i965/vec4: Propagate conditional modifiers from more compares to other compares")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109759
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-02-27 20:06:42 +00:00
Tapani Pälli a3c366c4b2 android: add liblog to libmesa_intel_common build
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2019-02-27 08:53:09 +02:00
Kasireddy, Vivek 7cab8d3661 i965: Add support for sampling from XYUV images
Add support to the i965 DRI driver to sample from XYUV8888 buffers.

Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-26 13:08:52 +00:00
Jason Ekstrand c4fb6b0c81 intel/eu: Add an EOT parameter to send_indirect_[split]_message
For split indirect sends we have to put the EOT parameter in the
extended descriptor as well as the instruction itself so just calling
brw_inst_set_eot is insufficient.  Moving the EOT handling handling into
the send_indirect_[split]_message helper lets us handle it properly.
2019-02-25 11:35:12 -06:00
Lionel Landwerlin 30828f4646 intel/aub_viewer: silence more compiler warnings
format not a string literal and no format arguments.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-25 13:11:16 +00:00
Lionel Landwerlin 91df8b1780 intel/aub_viewer: silence compiler warning
buffer_addr may be used uninitialized.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-25 13:11:13 +00:00
Lionel Landwerlin f1da10e0c5 intel/aub_viewer: printout 48bits addresses
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-25 13:11:05 +00:00
Lionel Landwerlin 239b0d8570 Revert "anv: add support for INTEL_DEBUG=bat"
This reverts commit e4d88396d2.

Apologies, I pushed the wrong commit.
2019-02-24 01:06:39 +00:00
Lionel Landwerlin e4d88396d2 anv: add support for INTEL_DEBUG=bat
As requested by Ken ;)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-02-23 23:29:04 +00:00
Juan A. Suarez Romero 4f917e6a61 anv: advertise 8 subpixel precision bits
On one side, when emitting 3DSTATE_SF, VertexSubPixelPrecisionSelect is
used to select between 8 bit subpixel precision (value 0) or 4 bit
subpixel precision (value 1). As this value is not set, means it is
taking the value 0, so 8 bit are used.

On the other side, in the Vulkan CTS tests, if the reference rasterizer,
which uses 8 bit precision, as it is used to check what should be the
expected value for the tests, is changed to use 4 bit as ANV was
advertising so far, some of the tests will fail.

So it seems ANV is actually using 8 bits.

v2: explicitly set 3DSTATE_SF::VertexSubPixelPrecisionSelect (Jason)

v3: use _8Bit definition as value (Jason)

v4: (by Jason)
anv: Explicitly set 3DSTATE_CLIP::VertexSubPixelPrecisionSelect

This field was added on gen8 even though there's an identically defined
one in 3DSTATE_SF.

CC: Jason Ekstrand <jason@jlekstrand.net>
CC: Kenneth Graunke <kenneth@whitecape.org>
CC: 18.3 19.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-22 17:53:55 +01:00
Juan A. Suarez Romero 3b423eeb2d genxml: add missing field values for 3DSTATE_SF
Fill out "Vertex Sub Pixel Precision Select" possible values.

CC: 18.3 19.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-22 17:53:45 +01:00
Lionel Landwerlin 1d626fc028 intel: fix urb size for CFL GT1
Same 192Kb amount as SKL/KBL GT1 applies.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Fixes: de7ed0ba55 ("i965/CFL: Add PCI Ids for Coffee Lake.")
2019-02-22 11:53:49 +00:00
Samuel Iglesias Gonsálvez bd2c5a8203 isl: the display engine requires 64B alignment for linear surfaces
v2: Add PRM quote (Lionel)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-02-22 11:45:45 +00:00
Mauro Rossi 338dacc341 android: intel/isl: remove redundant building rules
Fixes the following building error:

including ./external/mesa/Android.mk ...
build/core/base_rules.mk:183: *** external/mesa/src/intel:
MODULE.TARGET.STATIC_LIBRARIES.libmesa_isl_tiled_memcpy already defined by external/mesa/src/intel.
make: *** [build/core/ninja.mk:164: out/build-android_x86_64.ninja] Error 1

ISL_TILED_MEMCPY_FILES is isl/isl_tiled_memcpy_normal.c
and that source file includes isl_tiled_memcpy.c source

Fixes: 96bb328 ("iris: add Android build")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-02-22 07:56:11 +02:00
Francisco Jerez 7272fe9c08 intel/fs: Rely on undocumented unrestricted regioning for 32x16-bit integer multiply.
Even though the hardware spec claims that any "integer DWord multiply"
operation is affected by the regioning restrictions of CHV/BXT/GLK,
this is inconsistent with the behavior of the simulator and with
empirical evidence -- Return false from has_dst_aligned_region_restriction()
for such instructions as a micro-optimization.

Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-21 14:07:25 -08:00
Francisco Jerez e03be78252 intel/fs: Implement extended strides greater than 4 for IR source regions.
Strides up to 32B can be implemented for the source regions of most
instructions by leveraging either the vertical or the horizontal
stride of the hardware Align1 region.  The main motivation for this is
that currently the lower_integer_multiplication() pass will happily
double the stride of one of the 32-bit sources, which can blow up if
the stride of the original source was already the maximum value
allowed by the hardware.

An alternative would be to use the regioning legalization pass in
order to lower such strides into the composition of multiple legal
strides, but that would be somewhat less efficient.

This showed up as a regression from my commit cbea91eb57
in Vulkan 1.1 CTS tests on CHV/BXT platforms, however it was really a
pre-existing problem that had affected conformance on other platforms
without native support for integer multiplication.  CHV/BXT were
getting around it because the code I removed in that commit had the
"fortunate" side effect of emitting narrower regions that didn't hit
the hardware stride limit after lowering.  Beyond fixing the
regression this fixes ~90 additional Vulkan 1.1 subgroup CTS tests on
ICL (that's why this patch is marked for inclusion in mesa-stable even
though the original regressing patch was not).

According to Jason, a nearly equivalent change had been committed
previously as e8c9e65185 and then (mistakenly?) reverted as
a31d038208.

Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109328
Reported-by: Mark Janes <mark.a.janes@intel.com>
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-21 14:07:25 -08:00
Francisco Jerez 7f9f6263c1 intel/fs: Cap dst-aligned region stride to maximum representable hstride value.
This is required in combination with the following commit, because
otherwise if a source region with an extended 8+ stride is present in
the instruction (which we're about to declare legal) we'll end up
emitting code that attempts to write to such a region, even though
strides greater than four are still illegal for the destination.

Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-21 14:07:25 -08:00
Francisco Jerez e2f475ddff intel/fs: Lower integer multiply correctly when destination stride equals 4.
Because the "low" temporary needs to be accessed with word type and
twice the original stride, attempting to preserve the alignment of the
original destination can potentially lead to instructions with illegal
destination stride greater than four.  Because the CHV/BXT alignment
restrictions are now being enforced by the regioning lowering pass run
after lower_integer_multiplication(), there is no real need to
preserve the original strides anymore.

Note that this bug can be reproduced on stable branches, but
back-porting would be non-trivial, because the fix relies on the
regioning lowering pass recently introduced.

Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-21 14:07:25 -08:00
Francisco Jerez c3c27762f7 intel/fs: Exclude control sources from execution type and region alignment calculations.
Currently the execution type calculation will return a bogus value in
cases like:

  mov_indirect(8) vgrf0:w, vgrf1:w, vgrf2:ud, 32u

Which will be considered to have a 32-bit integer execution type even
though the actual indirect move operation will be carried out with
16-bit precision.

Similarly there's no need to apply the CHV/BXT double-precision region
alignment restrictions to such control sources, since they aren't
directly involved in the double-precision arithmetic operations
emitted by these virtual instructions.  Applying the CHV/BXT
restrictions to control sources was expected to be harmless if mildly
inefficient, but unfortunately it exposed problems at codegen level
for virtual instructions (namely the SHUFFLE instruction used for the
Vulkan 1.1 subgroup feature) that weren't prepared to accept control
sources with an arbitrary strided region.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109328
Reported-by: Mark Janes <mark.a.janes@intel.com>
Fixes: efa4e4bc5f "intel/fs: Introduce regioning lowering pass."
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-21 14:07:25 -08:00
Jordan Justen cd0ac3a6af
genxml: Remove extra space in gen4/45/5 field name
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-21 13:17:10 -08:00
Jordan Justen a9b0b72a78
genxml/gen_bits_header.py: Use regex to strip no alphanum chars
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-21 13:15:59 -08:00
Kenneth Graunke 286b8b8f99 iris: handle PatchVerticesIn as a system value. 2019-02-21 10:26:10 -08:00
Tapani Pälli 96bb328e9b iris: add Android build
Note that at least following additional libs/components require changes
since they refer to BOARD_GPU_DRIVERS variable which is used to select
the driver:

  - mixins
  - minigbm
  - libdrm
  - drm_gralloc

v2: (feedback by Gustaw Smolarczyk) Fix trailing \ in a few cases

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
2019-02-21 10:26:10 -08:00
Lionel Landwerlin 89f03d1872 imgui: make sure our copy of imgui doesn't clash with others in the same process
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
+1-by: Mike Lothian <mike@fireburn.co.uk>
+1-by: Tapani Pälli <tapani.palli@intel.com>
+1-by: Eric Engestrom <eric.engestrom@intel.com>
+1-by: Yurii Kolesnykov <root@yurikoles.com>
+1-by: myfreeweb <greg@unrelenting.technology>
+1-by: Kenneth Graunke <kenneth@whitecape.org>
2019-02-21 18:06:05 +00:00
Lionel Landwerlin 51047cd2e8 build: move imgui out of src/intel/tools to be reused
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
+1-by: Mike Lothian <mike@fireburn.co.uk>
+1-by: Tapani Pälli <tapani.palli@intel.com>
+1-by: Eric Engestrom <eric.engestrom@intel.com>
+1-by: Yurii Kolesnykov <root@yurikoles.com>
+1-by: myfreeweb <greg@unrelenting.technology>
+1-by: Kenneth Graunke <kenneth@whitecape.org>
2019-02-21 18:06:05 +00:00
Alejandro Piñeiro 0629b2a462 nir, glsl: move pixel_center_integer/origin_upper_left to shader_info.fs
On GLSL that info is set as a layout qualifier when redeclaring
gl_FragCoord, so somehow tied to a specific variable. But in practice,
they behave as a global of the shader. On ARB programs they are set
using a global OPTION (defined at ARB_fragment_coord_conventions), and
on SPIR-V using ExecutionModes, that are also not tied specifically to
the builtin.

This patch moves that info from nir variable and ir variable to nir
shader and gl_program shader_info respectively, so the map is more
similar to SPIR-V, and ARB programs, instead of more similar to GLSL.

FWIW, shader_info.fs already had pixel_center_integer, so this change
also removes some redundancy. Also, as struct gl_program also includes
a shader_info, we removed gl_program::OriginUpperLeft and
PixelCenterInteger, as it would be superfluous.

This change was needed because recently spirv_to_nir changed the order
in which execution modes and variables are handled, so the variables
didn't get the correct values. Now the info is set on the shader
itself, and we don't need to go back to the builtin variable to set
it.

Fixes: e68871f6a ("spirv: Handle constants and types before execution
                   modes")

v2: (Jason)
   * glsl_to_nir: get the info before glsl_to_nir, while all the rest
     of the info gathering is happening
   * prog_to_nir: gather the info on a general info-gathering pass,
     not on variable setup.

v3: (Jason)
   * Squash with the patch that removes that info from ir variable
   * anv: assert that OriginUpperLeft is true. It should be already
     set by spirv_to_nir.
   * blorp: set origin_upper_left on its core "compile fragment
     shader", not just on some specific places (for this we added an
     helper on a previous patch).
   * prog_to_nir: no need to gather specifically this fragcoord modes
     as the full gl_program shader_info is copied.
   * spirv_to_nir: assert that we are a fragment shader when handling
     this execution modes.

v4: (reported by failing gitlab pipeline #18750)
   * state_tracker: update too due changes on ir.h/gl_program

v5:
   * blorp: minor change after change on previous patch
   * radeonsi: update due this change.

v6: (Timothy Arceri)
   * prog_to_nir: remove extra whitespace
   * shader_info: don't use :1 on origin_upper_left
   * glsl: program.fs.origin_upper_left/pixel_center_integer can be
     move out of the shader list loop
2019-02-21 11:47:59 +01:00
Alejandro Piñeiro 675eabb560 blorp: introduce helper method blorp_nir_init_shader
This initializes the nir shader that will be used by blorp. Right now
it doesn't do too much beyond calling nir_builder_init_simple_shader,
and setting a name. More stuff will be added on following patches.

v2: there is a case were it is used a VERTEX_SHADER (Alejandro)
2019-02-21 11:47:51 +01:00
Eric Engestrom a16c398668 anv: use anv_shader_bin_write_to_blob()'s return value
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-20 16:40:13 +00:00
Eric Engestrom d3115f34a6 anv: drop unused imports
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-20 14:28:55 +00:00
Eric Engestrom 8cbfcab425 anv: make sure the extensions stay sorted
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-20 14:28:55 +00:00
Eric Engestrom bc76ce1033 anv: sort vendors extensions after KHR and EXT
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-20 14:28:55 +00:00
Eric Engestrom 427aa9d154 anv: sort extensions alphabetically
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-20 14:28:55 +00:00
Tapani Pälli 886cee1f96 anv: anv: refactor error handling in anv_shader_bin_write_to_blob()
v2: blob manages error state internally, just return
    true if errors did not occur (Jason)

CID: 1442546
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-20 15:39:19 +02:00
Lionel Landwerlin f509213675 anv: implement VK_EXT_depth_clip_enable
A new extension allowing the user to explictly specify the clipping
behavior.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-02-20 09:57:58 +00:00
Samuel Iglesias Gonsálvez 63a919a3ce isl: remove the cache line size alignment requirement
The cacheline size was a requirement for using the BLT engine, which
we don't use anymore except for a few things on old HW, so we drop it.

Fixes CTS's CL#3500 test:

dEQP-VK.api.image_clearing.core.clear_color_image.2d.linear.single_layer.r8g8b8_unorm

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-20 08:28:31 +01:00
Matt Turner ac21dd4aee intel/compiler/test: Add unit test for mismatched signedness comparison
v2 (idr): Move adding the test to after adding the fix.  Reordering the
two commits prevents possible headaches for git-bisect with scripts that
always do 'ninja check'.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109404
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-02-15 11:11:02 -08:00
Matt Turner 2dff9a66b6 intel/compiler: Avoid propagating inequality cmods if types are different
v2: Fix silly bug in logic.  s/||/&&/

All but one of the affected shaders is in an Unreal4 demo.  The other is
in Tomb Raider.  All of the cases that Ian investigated appear to be
sequences like the following

    if (int(uint(some_float)) < 0) /* other relations too */
        ...

At least in Tomb Raider, it's not obvious that this sequence came from
the original shader.

In some of the Unreal demos, the shader contains code like

    if (int(uint(textureLod(...))) > 0)
        ...

which explicitly generates the offending sequence.

All Gen6+ platforms had similar results (Skylake shown):
total instructions in shared programs: 15437170 -> 15437187 (<.01%)
instructions in affected programs: 4492 -> 4509 (0.38%)
helped: 0
HURT: 17
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.05% max: 0.73% x̄: 0.66% x̃: 0.73%
95% mean confidence interval for instructions value: 1.00 1.00
95% mean confidence interval for instructions %-change: 0.57% 0.75%
Instructions are HURT.

total cycles in shared programs: 383007996 -> 383007992 (<.01%)
cycles in affected programs: 20542 -> 20538 (-0.02%)
helped: 6
HURT: 7
helped stats (abs) min: 2 max: 6 x̄: 5.33 x̃: 6
helped stats (rel) min: 0.11% max: 0.36% x̄: 0.32% x̃: 0.36%
HURT stats (abs)   min: 4 max: 4 x̄: 4.00 x̃: 4
HURT stats (rel)   min: 0.27% max: 0.27% x̄: 0.27% x̃: 0.27%
95% mean confidence interval for cycles value: -3.30 2.69
95% mean confidence interval for cycles %-change: -0.19% 0.19%
Inconclusive result (value mean confidence interval includes 0).

No changes on Iron Lake or GM45.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109404
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: nagrigoriadis@gmail.com
Tested-by: Danylo Piliaiev <danylo.piliaiev@gmail.com>
2019-02-15 11:11:02 -08:00
Matt Turner e50db60d16 intel/compiler/test: Set devinfo->gen = 7
We emit an FBL instruction which only exists since Gen7. This prevents
the test from segfaulting when run with TEST_DEBUG=1.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-02-15 11:11:02 -08:00
Jason Ekstrand 367b0ede4d intel/fs: Bail in optimize_extract_to_float if we have modifiers
This fixes a bug in runscape where we were optimizing x >> 16 to an
extract and then negating and converting to float.  The NIR to fs pass
was dropping the negate on the floor breaking a geometry shader and
causing it to render nothing.

Fixes: 1f862e923c "i965/fs: Optimize float conversions of byte/word..."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109601
Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-02-14 23:02:44 -06:00
Jason Ekstrand 5064464931 intel/fs: Silence a compiler warning
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-14 16:04:47 -06:00
Jason Ekstrand 9b202239ba anv: Silence some compiler warnings in release builds
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-14 16:04:45 -06:00
Jason Ekstrand cd60c995a6 anv/blorp: Delete a pointless assert
Just a little higher up in the function we assert that the aspect masks
are actually equal so there's no reason for the weaker check.  Also, the
temporary variables were causing compiler warnings in release builds.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-14 16:04:42 -06:00
Kenneth Graunke 39aee57523 anv: Put MOCS in the correct location
My patch to switch from struct-based MOCS to numeric MOCS accidentally
divided all MOCS entries by 2 in the Vulkan driver.

MOCS on Gen9+ is just an array index into a table.  But in the hardware
packets, the index starts at bit 1.  So we need to shift it.

Fixes: 0b44644ca6 (genxml: Consistently use a numeric "MOCS" field)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-14 11:03:28 -08:00
Eric Engestrom f7c56475d2 anv/tests: compile to something sensible in release builds
assert()-based tests make no sense without asserts, so make sure asserts
are compiled in, even if the rest of the code has asserts turned off.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-02-14 12:52:34 +00:00
Eric Engestrom f1374805a8 drm-uapi: use local files, not system libdrm
There was an issue recently caused by the system header being included
by mistake, so let's just get rid of this include path and always
explicitly #include "drm-uapi/FOO.h"

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-02-14 11:20:00 +00:00
Eric Anholt 8f3694e1ab intel: Use the NIR lowering for isign.
Drops one instruction from fs-sign-int.shader_test.  No change in
shader-db due to it having 0 instances of sign(genIType).  This may hurt
isign64 if algebraic runs before int64 lowering, but I wasn't sure how to
mark the algebraic opt as "every bit size but 64".

v2: Update commit message about shader-db.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
2019-02-14 00:32:30 +00:00
Dylan Baker 279060cd32 meson: Add dependency on genxml to anvil
Currently the Intel "anvil" driver races with the generation of genxml
files, while i965 has an explicit dependency. This patch adds the same
dependency to anvil.

Fixes: d1992255bb
       ("meson: Add build Intel "anv" vulkan driver")
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-13 22:01:00 +00:00
Juan A. Suarez Romero 1ad26f9417 anv/cmd_buffer: check for NULL framebuffer
This can happen when we record a VkCmdDraw in a secondary buffer that
was created inheriting from the primary buffer, but with the framebuffer
set to NULL in the VkCommandBufferInheritanceInfo.

Vulkan 1.1.81 spec says that "the application must ensure (using scissor
if neccesary) that all rendering is contained in the render area [...]
[which] must be contained within the framebuffer dimesions".

While this should be done by the application, commit 465e5a86 added the
clamp to the framebuffer size, in case of application does not do it.
But this requires to know the framebuffer dimensions.

If we do not have a framebuffer at that moment, the best compromise we
can do is to just apply the scissor as it is, and let the application to
ensure the rendering is contained in the render area.

v2: do not clamp to framebuffer if there isn't a framebuffer

v3 (Jason):
- clamp earlier in the conditional
- clamp to render area if command buffer is primary

v4: clamp also x and y to render area (Jason)

v5: rename used variables (Jason)

Fixes: 465e5a86 ("anv: Clamp scissors to the framebuffer boundary")
CC: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-12 19:19:13 +01:00
Tapani Pälli 3da858a6b9 intel/compiler: add scale_factors to sampler_prog_key_data
Patch propagates given scale_factors to lowering options.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-12 08:42:25 +02:00
Francisco Jerez 374eb3cd6f intel/dump_gpu: Disambiguate between BOs from different GEM handle spaces.
This fixes a rather astonishing problem that came up while debugging
an issue in the Vulkan CTS.  Apparently the Vulkan CTS framework has
the tendency to create multiple VkDevices, each one with a separate
DRM device FD and therefore a disjoint GEM buffer object handle space.
Because the intel_dump_gpu tool wasn't making any distinction between
buffers from the different handle spaces, it was confusing the
instruction state pools from both devices, which happened to have the
exact same GEM handle and PPGTT virtual address, but completely
different shader contents.  This was causing the simulator to believe
that the vertex pipeline was executing a fragment shader, which didn't
end up well.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-02-11 12:27:22 -08:00
Jason Ekstrand fd77606b5b intel/fs: Use enumerated array assignments in fb read TXF setup
It's more clear and means we don't have to update the array every time
we add an optional texture instruction argument

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-11 10:57:09 -06:00
Caio Marcelo de Oliveira Filho ee670d09af intel/compiler: use 0 as sampler in emit_mcs_fetch
The sampler will be ignored since the underlying 'ld_mcs' operation
won't use it, so just fill the field with 0 instead of the texture to
make it clearer that's the case.

This will also avoid is_high_sampler() to kick in unnecessarily, in
case we are using the operation for a texture with index >= 16.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-08 14:51:56 -08:00
Ian Romanick 28ef5bb74c intel/compiler: Silence warning about value that may be used uninitialized
For some reason, this warning only occurs for me in release builds.

In file included from src/intel/compiler/brw_nir_lower_mem_access_bit_sizes.c:25:0:
src/intel/compiler/brw_nir_lower_mem_access_bit_sizes.c: In function ‘brw_nir_lower_mem_access_bit_sizes’:
src/compiler/nir/nir_builder.h:501:26: warning: ‘src_swiz[2]’ may be used uninitialized in this function [-Wmaybe-uninitialized]
       alu_src.swizzle[i] = swiz[i];
       ~~~~~~~~~~~~~~~~~~~^~~~~~~~~
src/intel/compiler/brw_nir_lower_mem_access_bit_sizes.c:225:16: note: ‘src_swiz[2]’ was declared here
       unsigned src_swiz[4];
                ^~~~~~~~

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-02-08 10:37:06 -08:00
Lionel Landwerlin a934a3d124 anv: assert that color attachment are valid
This reverts commit d76e777988.

Let's make this obvious that there is an application issue if it tries
to access an attachment that doesn't exist in the current pass.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: d76e777988 ("anv: Handle VK_ATTACHMENT_UNUSED in colorAttachment")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-08 00:18:16 +00:00
Emil Velikov 8943eb8f03 anv: wire up the state_pool_padding test
Cc: Jason Ekstrand <jason@jlekstrand.net>
Fixes: 927ba12b53 ("anv/tests: Adding test for the state_pool padding.")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com><Paste>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-02-05 11:39:36 -08:00
Caio Marcelo de Oliveira Filho 8c7c543936 isl: assert that Gen8+ don't have bit6_swizzling
v2: Rewrite the condition to more clearly match the comment. (Jordan)

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-04 20:44:41 -08:00
Caio Marcelo de Oliveira Filho 5299c9cbcc anv: skip bit6 swizzle detection in Gen8+
It is always false on Gen8+.  Also, move the variable definition near
its use.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-04 20:44:41 -08:00
Danylo Piliaiev 64d3b148fe anv: Fix VK_EXT_transform_feedback working with varyings packed in PSIZ
Transform feedback did not set correct SO_DECL.ComponentMask for
varyings packed in VARYING_SLOT_PSIZ:
 gl_Layer         - VARYING_SLOT_LAYER    in VARYING_SLOT_PSIZ.y
 gl_ViewportIndex - VARYING_SLOT_VIEWPORT in VARYING_SLOT_PSIZ.z
 gl_PointSize     - VARYING_SLOT_PSIZ     in VARYING_SLOT_PSIZ.w

Fixes: 36ee2fd61c "anv: Implement the basic form of VK_EXT_transform_feedback"

Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-04 15:30:43 +00:00
Danylo Piliaiev d76e777988 anv: Handle VK_ATTACHMENT_UNUSED in colorAttachment
From the Vulkan 1.0.98 spec for vkCmdClearAttachments:

"If the aspectMask member of any element of pAttachments contains
VK_IMAGE_ASPECT_COLOR_BIT, then the colorAttachment member of that
element must either refer to a color attachment which is VK_ATTACHMENT_UNUSED,
or must be a valid color attachment."

Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-02-04 14:49:50 +02:00
Jason Ekstrand 48ed2a7bb0 anv: Implement VK_EXT_buffer_device_address
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-02-01 17:09:42 -06:00
Jason Ekstrand e644ed468f intel/fs: Implement nir_intrinsic_global_atomic_*
eviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-02-01 16:11:00 -06:00
Jason Ekstrand a91f392073 intel/fs: Use SENDS for A64 writes on gen9+
eviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-02-01 16:11:00 -06:00
Jason Ekstrand 1c25bf4373 intel/fs: Implement load/store_global with A64 untyped messages
eviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-02-01 16:11:00 -06:00
Jason Ekstrand b4f0d062cd intel/fs: Do the grf127 hack on SIMD8 instructions in SIMD16 mode
Previously, we only applied the fix to shaders with a dispatch mode of
SIMD8 but the code it relies on for SIMD16 mode only applies to SIMD16
instructions.  If you have a SIMD8 instruction in a SIMD16 shader,
neither would trigger and the restriction could still be hit.

Fixes: 232ed89802 "i965/fs: Register allocator shoudn't use grf127..."
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-02-01 16:11:00 -06:00
Jason Ekstrand 79724a0756 intel/fs: Properly handle 64-bit types in LOAD_PAYLOAD
By just assigning dst.type to src[i].type, we ensure that the offset at
the end of the loop actually offsets it by the right number of
registers.  Otherwise, we'll get into a case where we copy with a Q type
and then offset with a D type and things get out of sync.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-02-01 16:10:57 -06:00
Jason Ekstrand f02914a991 intel/fs/cse: Split create_copy_instr into three cases
Previously, we tried to combine all cases where the instruction being
CSE'd writes to more than one MOV worth of registers into one case with
a bit of special casing for LOAD_PAYLOAD.  This commit splits things so
that LOAD_PAYLOAD is entirely it's own case.  This makes tweaking the
LOAD_PAYLOAD case simpler in the next commit.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-02-01 16:10:40 -06:00
Jason Ekstrand f409a08e5f intel/nir: Add global support to lower_mem_access_bit_sizes
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-02-01 16:08:29 -06:00
Oscar Blumberg fea5b8e5ad intel/fs: Fix memory corruption when compiling a CS
Missing check for shader stage in the fs_visitor would corrupt the
cs_prog_data.push information and trigger crashes / corruption later
when uploading the CS state.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-02-01 10:53:33 -08:00
Dylan Baker 4052142de7 meson: remove -std=c++11 from intel/tools
for meson all C++ code is already compiled as C++11, so it's
unnecessary. It's also the wrong way to do this, if we really needed
this the correct way is to set:

```meson
executable(
  ...
  override_options : ['cpp_std=c++11'],
)
```

Which ensures not only that the correct syntax for the current
compiler is used, but also that meson doesn't create arguments like
`-std=c++14 ... -std=c++11`

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-01-31 21:42:16 +00:00
Dylan Baker 8e49b32f63 meson: fix style in intel/tools
The `:` in options should always have one space before and after `foo
: bar`, and lists do not get spaces around the braces: `[foo]` not `[
foo ]`

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-01-31 21:42:16 +00:00
Dylan Baker d93d53fa72 meson: remove build_by_default : true
Which is and has always been the default. This is largely an artifact
of how the building of these tools was controlled when the meson build
was originally created.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-01-31 21:42:16 +00:00
Jason Ekstrand a920979d4f intel/fs: Use split sends for surface writes on gen9+
Surface reads don't need them because they just have the one address
payload.  With surface writes, on the other hand, we can put the address
and the data in the different halves and avoid building the payload all
together.

The decrease in register pressure and added freedom in register
allocation resulting from this change reduces spilling enough to improve
the performance of one customer benchmark by about 2x.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-29 18:43:55 +00:00
Jason Ekstrand 014edff0d2 intel/fs: Add interference between SENDS sources
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-29 18:43:55 +00:00
Jason Ekstrand eab1c55590 intel/fs: Support SENDS in SHADER_OPCODE_SEND
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-29 18:43:55 +00:00
Jason Ekstrand cca199fd85 intel/disasm: Properly disassemble split sends
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-29 18:43:55 +00:00
Jason Ekstrand 8babaa84e8 intel/eu: Add support for the SENDS[C] messages
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-29 18:43:55 +00:00
Jason Ekstrand d6a6e10390 intel/inst: Indent some code
We're about to add some more if cases so let's have the giant re-indent
in it's own patch to make review easier.

Acked-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-29 18:43:55 +00:00
Jason Ekstrand d96969120d intel/inst: Fix the ia16_addr_imm helpers
These have clearly never seen any use.... On gen8, the bottom 4 bits are
missing so we need to shift them off before we call set_bits and shift
again when we get the bits.  Found by inspection.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-29 18:43:55 +00:00
Jason Ekstrand e46fb33143 intel/disasm: Rework SEND decoding to use descriptors
Instead of fetching the information out of the instruction directly,
fetch the descriptor and then pluck the information out of the
descriptor.  The current scheme works ok for SEND but with SENDS, it all
falls to pieces because the descriptor is completely shuffled around.

This commit doesn't actually convert everything.  One notable exception
is URB messages which don't even use descriptors in emit_urb_WRITE yet.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-29 18:43:55 +00:00
Jason Ekstrand 13a6fabc62 intel/eu: Add more message descriptor helpers
We want to be able to extract data from descriptors as well as unify a
bit of the descriptor construction.

One of the unifications we do is to unify the read/write and dataport
descriptors.  On gen4-5, read/write are substantially different and the
read descriptors change between gen4 and gen4.x.  On gen6, they unified
layouts between read, write, and dataport.  Then, on gen8, they added
one bit to the message type field but left it reserved MBZ for
read/write messages.  This commit chooses to treat that as if they
expanded the field everywhere and just didn't have enough enum values
for read/write to bother with the extra bit.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-29 18:43:55 +00:00
Jason Ekstrand c3aa436bfe intel/eu/validate: SEND restrictions also apply to SENDC
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-29 18:43:55 +00:00
Jason Ekstrand fee6bd8d8e intel/eu: Use GET_BITS in brw_inst_set_send_ex_desc
It's a bit more readable

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-29 18:43:55 +00:00
Jason Ekstrand b284d222db intel/fs: Use SHADER_OPCODE_SEND for varying UBO pulls on gen7+
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-29 18:43:55 +00:00
Jason Ekstrand 8514eba693 intel/fs: Use SHADER_OPCODE_SEND for texturing on gen7+
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-29 18:43:55 +00:00
Jason Ekstrand f547cebbe0 intel/fs: Use a logical opcode for IMAGE_SIZE
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-29 18:43:55 +00:00
Jason Ekstrand d2d3e04501 intel/fs: Use SHADER_OPCODE_SEND for surface messages
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-29 18:43:55 +00:00
Jason Ekstrand 7f1cf046cd intel/fs: Add a generic SEND opcode
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-29 18:43:55 +00:00
Jason Ekstrand ba3c5300f9 intel/eu: Rework surface descriptor helpers
This commit pulls the surface descriptor helpers out into brw_eu.h and
makes them no longer depend on the codegen infrastructure.  This should
allow us to use them directly from the IR code instead of the generator.
This change is unfortunately less mechanical than perhaps one would like
but it should be fairly straightforward.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-29 18:43:55 +00:00
Jason Ekstrand 5b17379631 intel/eu: Add has_simd4x2 bools to surface_write functions
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-29 18:43:55 +00:00
Jason Ekstrand 2ce93b88c0 intel/fs: Take an explicit exec size in brw_surface_payload_size()
Instead of magically falling back to SIMD8 for atomics and typed
messages on Ivy Bridge, explicitly figure out the exec size and pass
that into brw_surface_payload_size.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-29 18:43:55 +00:00
Jason Ekstrand cf42b0f9e2 intel/fs: Handle IMAGE_SIZE in size_read() and is_send_from_grf()
Like all the other sends, it's just mlen * REG_SIZE.

Fixes: 3cbc02e469 "intel: Use TXS for image_size when we have..."
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-29 18:43:55 +00:00
Jason Ekstrand 009c0bd840 intel/defines: Explicitly cast to uint32_t in SET_FIELD and SET_BITS
If you pass a bool in as the value to set, the C standard says that it
gets converted to an int prior to shifting.  If you try to set a bool to
bit 31, this lands you in undefined behavior.  It's better just to add
the explicit cast and let the compiler delete it for us.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-29 18:43:55 +00:00
Jason Ekstrand 077b9557a4 intel/fs: Get rid of fs_inst::equals
There are piles of fields that it doesn't check so using it is a lie.
The only reason why it's not causing problem is because it has exactly
one user which only uses it for MOV instructions (which aren't very
interesting) and only on Sandy Bridge and earlier hardware.  Just get
rid of it and inline it in the one place that it's actually used.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-29 18:43:55 +00:00
Matt Turner 18b467c066 intel/compiler: Add a file-level description of brw_eu_validate.c
Acked-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2019-01-26 10:33:22 -08:00
Tapani Pälli 540939ecee android: fix build issues with libmesa_anv_gen* libraries
We need this include path to find nir/nir_xfb_info.h.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-01-25 15:21:06 +02:00
Andrii Simiklit 4759bb2fcf intel/batch-decoder: fix a vb end address calculation
According to the loop implementation (in 'ctx_print_buffer' function),
which advances dword by dword over vertex buffer(vb),
the vb size should be aligned by 4 bytes too.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109449
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-01-25 15:12:30 +02:00