85350 Commits

Author SHA1 Message Date
Dave Airlie 7bf76563e2 glsl: add subpass image type (v2)
SPIR-V/Vulkan have a special image type for input attachments
called the subpass type. It has different characteristics than
other image types.

The main one is that it can only be an input image to fragment
shaders, and loads from it are relative to the frag coord.

This adds support for it to the GLSL types. Unfortunately
we've run out of space in the sampler dim in types, so we
need to use another bit.

v2: Fixup subpass input name (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-09-16 15:16:31 +10:00
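The commit above says the sampler-dimension field ran out of space. A minimal C sketch of why, assuming the enum had eight values (0 to 7) before SUBPASS was appended as the ninth: eight values fit in 3 bits, nine need 4. All names below (`sketch_sampler_dim`, `sketch_type`, `bits_needed`) are hypothetical stand-ins, not Mesa's actual `glsl_types` definitions.

```c
#include <assert.h>

/* Hypothetical sampler-dimension enum, assuming eight pre-existing
 * values (0..7) before SUBPASS is appended as the ninth. */
enum sketch_sampler_dim {
   SKETCH_DIM_1D,
   SKETCH_DIM_2D,
   SKETCH_DIM_3D,
   SKETCH_DIM_CUBE,
   SKETCH_DIM_RECT,
   SKETCH_DIM_BUF,
   SKETCH_DIM_EXTERNAL,
   SKETCH_DIM_MS,
   SKETCH_DIM_SUBPASS, /* == 8, which does not fit in 3 bits */
};

/* Number of bits needed to store every value in [0, max_value]. */
static unsigned bits_needed(unsigned max_value)
{
   unsigned bits = 1;
   while ((max_value >> bits) != 0)
      bits++;
   return bits;
}

/* Hypothetical packed type descriptor: the dimension field is widened
 * from 3 to 4 bits so that SKETCH_DIM_SUBPASS can be stored. */
struct sketch_type {
   unsigned base_type : 5;
   unsigned sampler_dim : 4;
   unsigned sampler_shadow : 1;
   unsigned sampler_array : 1;
};
```

With a 3-bit field, storing `SKETCH_DIM_SUBPASS` would silently wrap to 0 (the 1D value), which is why an extra bit is needed rather than just a new enum entry.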
Kenneth Graunke 081f21f29b isl: Finish tiling filtering for Gen6.
Gen6 only has one additional restriction over Gen7+, so we just add it
to the existing gen7 function (which actually covers later gens too).

This should stop FINISHME spew when running GL on Sandybridge.

v2: Fix bytes per block vs. bits per block confusion (Jason) and
    rename function to gen6_filter_tiling (Jason and Chad).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-09-15 21:21:50 -07:00
Ilia Mirkin 9fec15a7e0 i965: enable ARB_ES3_2_compatibility on gen8+
Note that ASTC support is not actually mandated for this extension to be
exposed.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-15 19:29:41 -04:00
Jason Ekstrand 111f6b250d i965/nir: Roll set_default_interpolation into lower_fs_inputs
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-15 13:31:43 -07:00
Jason Ekstrand 246db0063e i965/fs: Use NIR for handling forced per-sample interpolation
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-15 13:31:43 -07:00
Jason Ekstrand ed65e6ef49 nir: Add a flag to lower_io to force "sample" interpolation
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-15 13:31:43 -07:00
Jason Ekstrand 114874b22b i965/fs: Use sample interpolation for interpolateAtCentroid in persample mode
From the ARB_gpu_shader5 spec:

   The built-in functions interpolateAtCentroid() and interpolateAtSample()
   will sample variables as though they were declared with the "centroid"
   or "sample" qualifiers, respectively.

When running with persample dispatch forced by the API, we interpolate
anything that isn't flat as if it's qualified by "sample".  In order to
keep interpolateAtCentroid() consistent with the "centroid" qualifier, we
need to make interpolateAtCentroid() do sample interpolation instead.
Nothing in the GLSL spec guarantees that the result of
interpolateAtCentroid is uniform across samples in any way, so this is a
perfectly fine thing to do.

Fixes 8 of the new dEQP-VK.pipeline.multisample_interpolation.* Vulkan CTS
tests that specifically validate consistency between the "sample" qualifier
and interpolateAtSample().

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-15 13:31:27 -07:00
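The reasoning in the commit above reduces to a simple rule: interpolateAtCentroid() should resolve to whatever the "centroid" qualifier resolves to, and under forced per-sample dispatch that is "sample". A minimal sketch of that rule for non-flat inputs, using hypothetical enum and function names that are not NIR or i965 identifiers:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical interpolation locations for non-flat inputs; these are
 * illustrative names, not NIR or i965 enums. */
enum sketch_interp_loc {
   SKETCH_INTERP_CENTER,
   SKETCH_INTERP_CENTROID,
   SKETCH_INTERP_SAMPLE,
};

/* Effective location of a non-flat input declared with 'declared'.
 * When the API forces per-sample dispatch, everything is interpolated
 * as if qualified with "sample". */
static enum sketch_interp_loc
qualifier_location(enum sketch_interp_loc declared, bool persample_forced)
{
   return persample_forced ? SKETCH_INTERP_SAMPLE : declared;
}

/* interpolateAtCentroid() must behave like the "centroid" qualifier,
 * so it simply reuses the qualifier rule above. */
static enum sketch_interp_loc
interp_at_centroid_location(bool persample_forced)
{
   return qualifier_location(SKETCH_INTERP_CENTROID, persample_forced);
}
```

Defining the built-in in terms of the qualifier rule, rather than hard-coding "centroid", is what keeps the two consistent in both dispatch modes.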
Brian Paul 0d2eb8c14d mesa: check for no matrix change in _mesa_LoadMatrixf()
Some apps issue redundant glLoadMatrixf() calls with the same matrix.
Try to avoid setting dirty state in that situation.

This reduces the number of constant buffer updates by about half in
ET Quake Wars.

Tested with Piglit, ETQW, Sauerbraten, Google Earth, etc.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-15 12:00:12 -06:00
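The idea in the commit above, skipping the dirty-state update when the incoming matrix equals the current one, can be sketched as follows. This is a minimal sketch assuming a simplified matrix-stack top with a single dirty flag; `sketch_matrix` and `sketch_load_matrixf` are hypothetical and do not reflect Mesa's actual `_mesa_LoadMatrixf` code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical matrix-stack top with a dirty flag. */
struct sketch_matrix {
   float m[16];
   bool dirty;
};

/* Load a new matrix, flagging state dirty only if it actually changed. */
static void sketch_load_matrixf(struct sketch_matrix *top, const float *m)
{
   if (memcmp(top->m, m, sizeof(top->m)) == 0)
      return; /* redundant call: skip the state update */

   memcpy(top->m, m, sizeof(top->m));
   top->dirty = true;
}
```

`memcmp` compares bit patterns, so for example `0.0f` and `-0.0f` count as different. That errs on the side of flagging state dirty, which is safe.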
Jon Turney 533b3530c1 direct-to-native-GL for GLX clients on Cygwin ("Windows-DRI")
Structurally, this is very similar to the existing Apple-DRI code, except I
have chosen to implement this using the __GLXDRIdisplay, etc. vtables (as
suggested originally in [1]), rather than a maze of ifdefs.  This also means
that LIBGL_ALWAYS_SOFTWARE and LIBGL_ALWAYS_INDIRECT work as expected.

[1] https://lists.freedesktop.org/archives/mesa-dev/2010-May/000756.html

This adds:

* the Windows-DRI extension protocol headers and the windowsdriproto.pc
file, for use in building the Windows-DRI extension for the X server

* a Windows-DRI extension helper client library

* a Windows-specific DRI implementation for GLX clients

The server is queried for Windows-DRI extension support on the screen before
using it (to detect the case where WGL is disabled or can't be activated).

The server is queried for the fbconfigID-to-pixelformatindex mapping, which is
used to augment glx_config.

The server is queried for a native handle for the drawable (which is of a
different type for windows, pixmaps and pbuffers), which is used to augment
__GLXDRIdrawable.

Various GLX extensions are enabled depending on whether the equivalent WGL
extension is available.
2016-09-15 13:14:43 +01:00
Emil Velikov 2ac09ac5a5 docs: add news item and link release notes for 12.0.3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-09-15 11:31:06 +01:00
Emil Velikov 219a2f5f9f docs: add sha256 checksums for 12.0.3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 09460b8cf7ddac4abb46eb6439314b29954c76a6)
2016-09-15 11:30:00 +01:00
Emil Velikov 06f83a5548 docs: add release notes for 12.0.3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit d79b2e7bf30ad6d1fa43f30940a64ed9fd0aa9c0)
2016-09-15 11:29:59 +01:00
Kenneth Graunke 3bcdc2e3db mesa: Expose RESET_NOTIFICATION_STRATEGY with KHR_robustness.
This is supposed to be exposed with the GL_KHR_robustness extension,
which we support on ES 2.0 and later.  On desktop GL, it's also exposed
by GL_ARB_robustness, which is supported by all drivers ("dummy_true"),
so we also allow desktop GL.

Fixes:
- ES32-CTS.robust.robustness.noResetNotification
- ES32-CTS.robust.robustness.loseContextOnReset

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-09-15 00:58:47 -07:00
Jason Ekstrand 89a96c8f43 anv/cmd_buffer: Set the L3 atomic disable mask bit in CHICKEN3 on HSW
Without this bit set, the value in "L3 Atomic Disable" won't get applied by
the hardware, so we won't properly get L3 atomic caching.

Fixes dEQP-VK.spirv_assembly.instruction.compute.opatomic.compex and 198 of
the dEQP-VK.image.atomic_operations.* tests on HSW.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-09-14 17:53:16 -07:00
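The commit above hinges on a mask bit. Many Intel GPU configuration registers use a "masked write" convention: bits 31:16 of the written dword are per-bit write enables for bits 15:0, so a data bit only takes effect if its enable bit is also set. A minimal sketch modeling that convention, assuming this layout applies to CHICKEN3. `SKETCH_REG_MASK`, `SKETCH_L3_ATOMIC_DISABLE` and `apply_masked_write` are hypothetical names, and the bit position is a placeholder, not the real L3 atomic disable bit.

```c
#include <assert.h>
#include <stdint.h>

/* Masked-write convention (assumed): bits 31:16 of the written dword
 * are write enables for bits 15:0.  A data bit whose enable is clear
 * is ignored and the old register value is kept. */
#define SKETCH_REG_MASK(bits) ((uint32_t)(bits) << 16)

/* Hypothetical bit position for "L3 Atomic Disable"; placeholder only. */
#define SKETCH_L3_ATOMIC_DISABLE (1u << 6)

/* Model how the hardware applies a masked write to a 16-bit register. */
static uint16_t apply_masked_write(uint16_t old_value, uint32_t dword)
{
   uint16_t enables = (uint16_t)(dword >> 16);
   uint16_t data = (uint16_t)(dword & 0xffff);
   return (uint16_t)((old_value & ~enables) | (data & enables));
}
```

Under this model, writing a cleared "L3 Atomic Disable" bit without its mask bit leaves the register unchanged, which matches the symptom the commit describes.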
Jason Ekstrand a814e18c96 intel/blorp: Stop setting 3DSTATE_DRAWING_RECTANGLE
The Vulkan driver sets 3DSTATE_DRAWING_RECTANGLE once to MAX_INT x MAX_INT
at GPU initialization time and never sets it again.  The GL driver sets
it every time the framebuffer changes.  Originally, blorp set it to the
size of the drawing area, but this meant we had to set it back in the
Vulkan driver.  Instead, we can easily do that in the GL driver's
blorp_exec implementation and not set it in blorp core.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-09-14 17:51:16 -07:00
Jason Ekstrand b56f509ee0 intel/blorp: Emit 3DSTATE_MULTISAMPLE directly
Previously, we relied on a driver hook for 3DSTATE_MULTISAMPLE.  However,
now that Vulkan and GL use the same sample positions, we can set up
3DSTATE_MULTISAMPLE directly in blorp and delete the driver hook.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-09-14 17:51:16 -07:00
Jason Ekstrand c779ad3e66 intel: Move Vulkan sample positions to common code
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-09-14 17:51:16 -07:00
Marek Olšák f019255acf Revert "tgsi/scan: don't set interp flags for inputs only used by INTERP instructions"
This reverts commit 524fd55d2d.

Reason: https://bugs.freedesktop.org/show_bug.cgi?id=97808
2016-09-15 00:47:24 +02:00
Francisco Jerez 6d861968ca i965/vec4: Assert that pull constant load offsets are 16B-aligned.
Non-16B-aligned pull constant loads are unlikely to be particularly
useful given that you can get roughly the same effect by using
swizzles on the result.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:59 -07:00
Francisco Jerez 5ca35c6367 i965/vec4: Assert that ATTR regions are register-aligned.
It might be useful to actually handle this once copy propagation
becomes smarter about register-misaligned offsets.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:59 -07:00
Francisco Jerez f33a8f8fcf i965/vec4: Don't spill non-GRF-aligned register regions.
A better fix would be to do something along the lines of the FS
back-end spilling code and emit a scratch read before any instruction
that overwrites the register to spill partially due to a non-zero
sub-register offset.  In the meantime mark registers used with a
non-zero sub-register offset as no-spill to prevent the spilling code
from miscompiling the program.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:59 -07:00
Francisco Jerez 8531f943d9 i965/vec4: Fix copy propagation for non-register-aligned regions.
This prevents it from trying to propagate a copy through a
register-misaligned region.  MOV instructions with a misaligned
destination shouldn't be treated as a direct GRF copy, because they
only define the destination GRFs partially.  Also fix the interference
check implemented with is_channel_updated() to consider overlapping
regions with different register offset to interfere, since the
writemask check implemented in the function is only valid under the
assumption that the source and destination regions are aligned
component by component.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:59 -07:00
Francisco Jerez 0e657b7b55 i965/vec4: Compare full register offsets in cmod propagation.
Cmod propagation would misoptimize the program if the destination
offset of the generating instruction wasn't exactly the same as the
source region offset of the copy instruction.  This is in preparation for
adding support for sub-GRF offsets to the VEC4 IR.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:58 -07:00
Francisco Jerez 8bed1adfc1 i965/vec4: Assign correct destination offset to rewritten instruction in register coalesce.
Because the pass already checks that the destination offset of each
'scan_inst' that needs to be rewritten matches 'inst->src[0].offset'
exactly, the final offset of the rewritten instruction is just the
original destination offset of the copy.  This is in preparation for
adding support for sub-GRF offsets to the VEC4 IR.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:58 -07:00
Francisco Jerez 3a74e437fd i965/vec4: Don't coalesce registers with overlapping writes not matching the MOV source.
In preparation for adding support for sub-GRF offsets to the VEC4 IR.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:58 -07:00
Francisco Jerez 1bb5074474 i965/vec4: Compare full register offsets in opt_register_coalesce nop move check.
In preparation for adding support for sub-GRF offsets to the VEC4 IR.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:58 -07:00
Francisco Jerez 3be0d6d040 i965/vec4: Check that the write offsets match when setting dependency controls.
For simplicity just assume that two writes to the same GRF with
different sub-GRF offsets will potentially interfere and break the
dependency control chain.  This is in preparation for adding sub-GRF
offset support to the VEC4 IR.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:58 -07:00
Francisco Jerez b52fefc4d5 i965/vec4: Change opt_vector_float to keep track of the last offset seen in bytes.
This simplifies things slightly and makes the pass more correct in the
presence of sub-GRF offsets.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:58 -07:00
Francisco Jerez 230615e228 i965/vec4: Simplify src/dst_reg to brw_reg conversion by using byte_offset().
This should also have the side effect of fixing convert_to_hw_regs()
to handle sub-GRF register offsets.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:58 -07:00
Francisco Jerez eb746a80e5 i965/ir: Update several stale comments.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:58 -07:00
Francisco Jerez 47784e2346 i965/ir: Don't print ARF subnr values twice.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:58 -07:00
Francisco Jerez 5d65d51e78 i965/vec4: Print src/dst_reg::offset field consistently for all register files.
Cf. 'i965/fs: Print fs_reg::offset field consistently for all
register files.'.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:57 -07:00
Francisco Jerez ec259f5307 i965/fs: Print fs_reg::offset field consistently for all register files.
The offset printing code in fs_visitor::dump_instruction() was doing
things differently for sources and destinations and for each register
file -- In some cases it would be added to the base register number
fs_reg::nr, in other cases it would follow the base register separated
with a plus sign, in other cases (uniforms) it would do both (!).  The
sub-register offset was also being printed or not rather
inconsistently.  Fix it.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:57 -07:00
Francisco Jerez 950af5ed40 i965/fs: Misc simplification.
Get rid of some leftover redundant arithmetic introduced during the
conversion to byte offsets and sizes that can be simplified easily.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:57 -07:00
Francisco Jerez 80e1d670b4 i965/fs: Get rid of fs_inst::set_smear().
component() was generally a better alternative because of several
issues set_smear() had:

 - It wouldn't take the original stride and offset of the register
   into account, which means that set_smear() on the result of
   e.g. another set_smear() call or an offset() call would give a
   bogus region as result.

 - It was an inherently destructive operation.  See the
   'nir_intrinsic_shader_clock' hunk below for how this could lead to
   subtle bugs in cases where set_smear() was called multiple times on
   the same register like 'r.set_smear(0), r.set_smear(1)' with the
   expectation that each call would return a separate value instead of
   a reference to the same subsequently mutated object.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:57 -07:00
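The two problems the commit above lists for set_smear() can be reproduced with a simplified model. A minimal sketch, assuming a 32-byte GRF and a register described only by a byte offset, an element stride and an element size; `sketch_reg`, `sketch_set_smear` and `sketch_component` are hypothetical and only approximate the real fs_reg helpers.

```c
#include <assert.h>

#define SKETCH_REG_SIZE 32u /* bytes per GRF (assumed) */

/* Hypothetical, simplified register region; not the real fs_reg. */
struct sketch_reg {
   unsigned offset;    /* bytes */
   unsigned stride;    /* in elements of type_size bytes; 0 = scalar */
   unsigned type_size; /* bytes per element */
};

/* set_smear()-style: mutates the register in place, keeps only the
 * GRF-aligned part of the old offset and ignores the old stride. */
static void sketch_set_smear(struct sketch_reg *r, unsigned subreg)
{
   r->offset = r->offset / SKETCH_REG_SIZE * SKETCH_REG_SIZE +
               subreg * r->type_size;
   r->stride = 0;
}

/* component()-style: returns a new value and steps 'idx' channels
 * along the existing region before making it scalar. */
static struct sketch_reg sketch_component(struct sketch_reg r, unsigned idx)
{
   r.offset += idx * r.stride * r.type_size;
   r.stride = 0;
   return r;
}
```

For a dword region starting 8 bytes into a GRF with stride 2, `sketch_component(base, 1)` lands at byte 16, while `sketch_set_smear(&s, 1)` lands at byte 4 because it drops the sub-GRF offset and stride. Because the component()-style helper returns by value, two calls on the same input yield two independent results instead of aliasing one mutated object.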
Francisco Jerez 8e58e4412f i965/fs: Use region_contained_in() in compute-to-mrf coalescing pass.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:57 -07:00
Francisco Jerez f2d2156ba2 i965/fs: Move region_contained_in to the IR header and fix for non-VGRF files.
Also changed the argument names since 'src' and 'dst' don't make that
much sense outside of the context of copy propagation.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:57 -07:00
Francisco Jerez 645261c4b2 i965/fs: Change region_contained_in() to use byte units.
This makes the function less annoying to use and more accurate -- We
shouldn't propagate a copy into a register region that wasn't fully
contained in the destination of the copy (IOW, a source region that
wasn't fully defined by the copy) just because the number of registers
written and read by each instruction happened to get rounded up to the
same GRF multiple.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:57 -07:00
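The failure mode described in the commit above, two regions that look contained only because both were rounded up to whole GRFs, can be shown with a small model. A minimal sketch, assuming 32-byte GRFs and regions described by a virtual register number plus a byte offset and size; `sketch_region`, `region_contained_in_bytes` and `region_contained_in_grfs` are hypothetical names, not the i965 helpers.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_REG_SIZE 32u /* bytes per GRF (assumed) */

/* Hypothetical region: bytes [offset, offset + size) of register 'nr'. */
struct sketch_region {
   unsigned nr;
   unsigned offset; /* bytes */
   unsigned size;   /* bytes */
};

/* Byte-unit containment: is region 'r' entirely inside region 's'? */
static bool region_contained_in_bytes(struct sketch_region r,
                                      struct sketch_region s)
{
   return r.nr == s.nr &&
          r.offset >= s.offset &&
          r.offset + r.size <= s.offset + s.size;
}

/* GRF-unit containment: both regions are first rounded out to whole
 * GRFs, which can hide a partially defined source. */
static bool region_contained_in_grfs(struct sketch_region r,
                                     struct sketch_region s)
{
   unsigned r_start = r.offset / SKETCH_REG_SIZE;
   unsigned r_end = (r.offset + r.size + SKETCH_REG_SIZE - 1) / SKETCH_REG_SIZE;
   unsigned s_start = s.offset / SKETCH_REG_SIZE;
   unsigned s_end = (s.offset + s.size + SKETCH_REG_SIZE - 1) / SKETCH_REG_SIZE;
   return r.nr == s.nr && r_start >= s_start && r_end <= s_end;
}
```

A copy that writes 16 bytes and a source that reads 32 bytes from the same start both round to one GRF, so the GRF-unit check wrongly reports containment while the byte-unit check does not.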
Francisco Jerez 1c67e27247 i965/fs: Simplify copy propagation LOAD_PAYLOAD ACP setup.
By keeping track of 'offset' in byte units.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:57 -07:00
Francisco Jerez 2d7d4a7910 i965/fs: Simplify a bunch of fs_inst::size_written calculations by using component_size().
Using component_size() is easier and generally more correct because it
takes into account the register type and stride for you.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:56 -07:00
Francisco Jerez 0bc46cc961 i965/fs: Simplify result_live calculation in dead_code_eliminate().
No need to unroll the first iteration of the loop manually.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:56 -07:00
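The simplification in the commit above amounts to computing "is any slot written by this instruction live?" with one uniform loop instead of special-casing the first slot. A minimal sketch, assuming a plain byte-array bitset with one bit per variable slot; `bit_test` and `result_live` are hypothetical helpers, not the i965 code or Mesa's BITSET macros.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bitset lookup: one bit per variable slot. */
static bool bit_test(const unsigned char *set, unsigned i)
{
   return (set[i / 8] >> (i % 8)) & 1;
}

/* A write is live if any of the 'n' slots it defines, starting at
 * 'var', is live.  The first slot is handled by the same loop as the
 * rest instead of being checked separately beforehand. */
static bool result_live(const unsigned char *live, unsigned var, unsigned n)
{
   bool live_any = false;
   for (unsigned i = 0; i < n; i++)
      live_any |= bit_test(live, var + i);
   return live_any;
}
```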
Francisco Jerez 62aaef6c83 i965/fs: Simplify and fix buggy stride/offset calculations using subscript().
These were bashing the 'offset' and 'stride' values of several
registers without taking the previous value into account, which
probably didn't matter in practice for optimize_frontfacing_ternary()
because the 'tmp' register already had a known region, but it would
have given the wrong region as result in the other cases in
lower_integer_multiplication().  subscript(..., i) is a more
straightforward way to take the i-th field of a given type from each
channel of a register which should give the right answer as result
regardless of the original 'offset' and 'stride' parameters of the
register region.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:56 -07:00
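The commit above describes subscript(reg, type, i) as taking the i-th field of a smaller type from each channel while respecting the register's existing offset and stride. A minimal sketch of that arithmetic on a simplified region; `sketch_reg` and `sketch_subscript` are hypothetical, and the real fs_reg helper also handles register files this sketch ignores.

```c
#include <assert.h>

/* Hypothetical, simplified register region; not the real fs_reg. */
struct sketch_reg {
   unsigned offset;    /* bytes */
   unsigned stride;    /* in elements of type_size bytes; 0 = scalar */
   unsigned type_size; /* bytes per element */
};

/* Take the i-th field of 'new_size' bytes from each channel of 'r',
 * building on r's existing offset and stride instead of overwriting
 * them. */
static struct sketch_reg sketch_subscript(struct sketch_reg r,
                                          unsigned new_size, unsigned i)
{
   assert(new_size <= r.type_size && r.type_size % new_size == 0);
   assert((i + 1) * new_size <= r.type_size);
   r.offset += i * new_size;               /* step to field i */
   r.stride *= r.type_size / new_size;     /* same byte stride, smaller type */
   r.type_size = new_size;
   return r;
}
```

For example, the upper dword of a packed qword region `{0, 1, 8}` becomes `{4, 2, 4}`, and for a region that already starts at byte 64 with stride 2 the result starts at byte 68 with stride 4, so prior offsets and strides are preserved rather than clobbered.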
Francisco Jerez 3b7b908787 i965/fs: Simplify get_fpu_lowered_simd_width() by using inequalities instead of rounding.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:56 -07:00
Francisco Jerez ee930c0435 i965/fs: Simplify byte_offset().
In the most common case this can now be implemented as a simple
addition because the offset is already encoded as a single scalar
value in bytes.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:56 -07:00
Francisco Jerez bae3a41171 i965/fs: Fix signedness of the return value of fs_inst::size_read().
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:56 -07:00
Francisco Jerez a384503c15 i965/fs: Switch mask_relative_to() used in compute-to-mrf to byte units.
This makes the helper function less annoying to use and somewhat more
accurate.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:56 -07:00
Francisco Jerez 401fc228fd i965/fs: Fix bogus sub-MRF offset calculation in compute-to-mrf.
The 'scan_inst->dst.offset % REG_SIZE' term in the final
'scan_inst->dst.offset' calculation is obviously bogus.  The offset
from the start of the copy destination register 'inst->dst' where the
destination of the generating instruction 'scan_inst' would be written
to (before compute-to-mrf runs) is just the offset of 'scan_inst->dst'
relative to the source of the copy instruction (AKA rel_offset in the
code below).

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:56 -07:00
Francisco Jerez cd0134072a i965/fs: Take into account copy register offset during compute-to-mrf.
This was dropping 'inst->dst.offset' on the floor.  Nothing in the
code above seems to guarantee that it's zero, and if it isn't, the
offset of the register being coalesced into wouldn't be taken into
account while rewriting the generating instruction.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:56 -07:00
Francisco Jerez fcd9d1badc i965/vec4: Drop backend_reg::in_range() in favor of regions_overlap().
This makes sure that overlap checks are done correctly throughout the
back-end when the '*this' register starts before the register/size
pair provided as argument, and is actually less annoying to use than
in_range() at this point since regions_overlap() takes its size
arguments in bytes.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:55 -07:00
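The bug class the commit above fixes is a one-sided range check: asking only whether one region *starts* inside another misses a region that starts earlier and runs into it. A minimal sketch contrasting that with a symmetric half-open interval test in bytes; `sketch_region`, `starts_in_range` and `regions_overlap_bytes` are hypothetical names, and the real vec4 regions_overlap() additionally deals with register files and uniform numbering.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical region: bytes [offset, offset + size) of register 'nr'. */
struct sketch_region {
   unsigned nr;
   unsigned offset; /* bytes */
   unsigned size;   /* bytes */
};

/* One-sided check in the spirit of the old in_range(): only asks
 * whether 'a' starts inside 'b'. */
static bool starts_in_range(struct sketch_region a, struct sketch_region b)
{
   return a.nr == b.nr &&
          a.offset >= b.offset && a.offset < b.offset + b.size;
}

/* Symmetric half-open interval intersection, in bytes. */
static bool regions_overlap_bytes(struct sketch_region a,
                                  struct sketch_region b)
{
   return a.nr == b.nr &&
          a.offset < b.offset + b.size &&
          b.offset < a.offset + a.size;
}
```

With `a` covering bytes [0, 64) and `b` covering [32, 64) of the same register, the one-sided check reports no conflict while the symmetric test correctly reports an overlap in either argument order.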
Francisco Jerez 56bcb2230f i965/vec4: Port regions_overlap() to the vec4 IR.
This is copy-pasted almost line by line from the FS back-end.  The
only reason it cannot be implemented in terms of backend_reg is that
the backend_reg::nr field doesn't have the same meaning for uniforms
on both back-ends.  It could be easily deduplicated by using a
template function.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:55 -07:00