Commit Graph

86851 Commits

Author SHA1 Message Date
Roland Scheidegger 5ec3a7333f draw: finally optimize bool clip mask generation
lp_build_any_true_range is just what we need, though it will only produce
optimal code with sse41 (ptest + set) - but even without it on 64bit x86
the code is still better (1 unpack, 2 movq + or + set), on 32bit x86 it's
going to be roughly the same as before.
While here also make it a "real" 8bit boolean - cuts one instruction but
more importantly similar to ordinary booleans.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-11-18 01:25:21 +01:00
Roland Scheidegger b16f06fd05 draw: use vectorized calculations for fetch (v2)
Instead of doing all the math with scalars, use vectors. This means the
overflow math needs to be done manually, albeit that's only really
problematic for the stride/index mul, the rest has been pretty much
moved outside the shader loop (albeit the mul could actually be optimized
away too), where things are still scalar.
To eliminate control flow in the main shader loop fetch, provide fake
buffers (so index 0 is always valid to fetch).
Still uses aos fetch though in the end - mostly because some more code
would be needed to handle unaligned fetches in that path, and because for
most formats it won't make a difference anyway (we generate some truly
horrendous code for things like R16G16_something for instance).

Instanced fetch however stays roughly the same as before, except that
no longer the same element is fetched multiple times (I've seen a reduction
of ~3 times in main shader loop size due to llvm not recognizing it's all
the same fetch, since it would have been possible some of the fetches
getting replaced with zeros in case vector size exceeds remaining fetch
count - the values of such fetches don't matter at all though).

Also, for elts gathering, use vectorized code as well.

The generated shaders are smaller and faster to compile (not entirely sure
about execution speed, but generally unless there's just single vertices
to handle I would expect it to be faster - there's more opportunities
for future improvements by using soa fetch).

v3: skip the fake index buffer, not needed due to the jit code never seeing
the real index buffer in the first place.
Fix a bug with mask expansion (needs SExt, not ZExt).
Also, be really really careful to keep the behavior the same, even in cases
where it looks wrong, and add comments why the code is doing the seemingly
wrong stuff... Fortunately it's not actually more complex in the end...
Also change function order slightly just to make the diff more readable.

No piglit change. Passes some internal testing with another api too...

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-11-18 01:25:21 +01:00
Jordan Justen 0cee3fd5c7 i965/gen7: Minify blit size for stencil tree copy
Found by the piglit 'fbo-depth-array stencil-clear' test when
implementing blorp blit splitting for gen7.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-17 14:15:44 -08:00
Kenneth Graunke 9bfee7047b mesa: Drop PATH_MAX usage.
GNU/Hurd does not define PATH_MAX since it doesn't have such arbitrary
limitation, so this failed to compile.  Apparently glibc does not
enforce PATH_MAX restrictions anyway, so it's kind of a hoax:

https://www.gnu.org/software/libc/manual/html_node/Limits-for-Files.html

MSVC uses a different name (_MAX_PATH) as well, which is annoying.

We don't really need it.  We can simply asprintf() the filenames.
If the filename exceeds an OS path limit, presumably fopen() will
fail, and we already check that.  (We actually use ralloc_asprintf
because Mesa provides that everywhere, and it doesn't look like we've
provided an implementation of GNU's asprintf() for all platforms.)

Fixes the build on GNU/Hurd.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98632
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-17 14:14:37 -08:00
Kenneth Graunke ca76e6b521 i965: Fix compute shader crash.
Fixes crashes when starting Deus Ex: Mankind Divided.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-11-17 14:14:06 -08:00
Jason Ekstrand 3da7adc755 anv/TODO: Check off render buffer compression
There's still a tiny bit of work to do for storage images but it's
otherwise pretty much done at this point.
2016-11-17 12:03:24 -08:00
Jason Ekstrand 4e91f158e6 anv: Enable "permanent" compression for immutable format images
This commit extends our support of color compression to surfaces without
the VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT set.  These images will never have
an image view created with a different format then the one set at image
creation time so it's safe to always use compression.  We still bail if the
image is used as a storage image because that sometimes ends up using a
different format.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-17 12:03:24 -08:00
Jason Ekstrand 2b5644e94d intel/blorp: Properly handle color compression in blorp_copy
Previously, blorp copy operations were CCS-unaware so you had to perform
resolves on the source and destination before performing the copy.  This
commit makes blorp_copy capable of handling CCS-compressed images without
any resolves.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-17 12:03:24 -08:00
Jason Ekstrand 89f9c46a74 intel/blorp: Always use UINT formats on SKL+
Many of these UINT formats aren't available prior to Sky Lake so we used
UNORM formats.  Using UINT formats is a bit nicer because it guarantees we
don't run into rounding issues.  Also, we will need it in the next commit
for handling copies with CCS enabled.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-17 12:03:24 -08:00
Jason Ekstrand c8357b5d34 i965/blorp: Rework resolve handling
This commit moves the handling of resolves into blorp_surf_for_miptree().
Instead of each helper doing resolves and checks itself, it simply tells
blorp_surf_for_miptree which aux modes are supported by the given blorp
operation and blorp_surf_for_miptree will resolve as-needed.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-17 12:03:24 -08:00
Jason Ekstrand edb7f67bd9 anv/image: Add an aux_usage field for "default" aux
Initially, the field is set to ISL_AUX_USAGE_NONE so this commit shouldn't
bring any functional changes.  Setting this field to something else will
cause all sampled and storage image views to be created with AUX and blorp
will start trying to respect it so set with care.
2016-11-17 12:03:24 -08:00
Jason Ekstrand 338cdc172a anv: Add initial support for Sky Lake color compression
This commit adds basic support for color compression.  For the moment,
color compression is only enabled within a render pass and a full resolve
is done before the render pass finishes.  All texturing operations still
happen with CCS disabled.
2016-11-17 12:03:24 -08:00
Jason Ekstrand e2f5880839 anv/pass: Precompute some subpass usage information 2016-11-17 12:03:24 -08:00
Jason Ekstrand 9b9fb6d212 util/vk_alloc: Add a vk_zalloc2 helper
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-17 12:03:24 -08:00
Jason Ekstrand a512565b2b anv/image: Memset all aux surfaces (not just HiZ) to 0
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-17 12:03:24 -08:00
Jason Ekstrand c3eb58664e anv/image: Rename hiz_surface to aux_surface 2016-11-17 12:03:24 -08:00
Jason Ekstrand ccdf9af392 anv/blorp: Ignore clears for attachments first used as resolve destinations
Otherwise, we'll try to clear it the first time it's used as a draw so if
you do some multisampled rendering, resolve to an attachment, and then draw
on top of the single-sampled attachment, we might accidentally clear it.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-17 12:03:24 -08:00
Jason Ekstrand 1ba2f05bc0 intel/blorp: Take a fast_clear_op in ccs_resolve
Eventually, we may want to just have a single blorp_ccs_op function that
does both clears and resolves.  For now we'll stick to just making the
ccs_resolve function we have now a bit more configurable.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-17 12:03:24 -08:00
Pohjolainen, Topi 7c560e8ccc intel/blorp: Add plumbing for color resolve slice details
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-17 12:03:24 -08:00
Jason Ekstrand d7bd8c15d6 intel/isl: Allow non-2D CCS surfaces
The CCS calculations in ISL are already correct for 1-D and 3-D CCS
surfaces since they have exactly the same layout as 2-D array surfaces (at
least on Sky Lake).  The only problem was that we weren't passing in the
right dimensionality and we weren't passing in the depth.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-17 12:03:24 -08:00
Jason Ekstrand 26c8bb7bc0 intel/isl: Rework the asserts and fails in isl_surf_get_ccs
There are some invariants such as number of samples on which we should
assert.  However, most other things should silently return false since
they're much easier for isl_surf_get_ccs to check than the caller.  We also
update the checking to be a bit more complete.
2016-11-17 12:03:24 -08:00
Jason Ekstrand 818c7bfb31 anv/cmd_buffer: Refactor surface state relocation handling
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-17 12:03:24 -08:00
Jason Ekstrand 9be9f5f1c7 anv/cmd_buffer: Pull add_surface_state_reloc into genX_cmd_buffer.c
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-17 12:03:24 -08:00
Jason Ekstrand 0c403df310 anv/image: Stop force-disabling AUX
Auxiliary surfaces have to be created manually anyway so force-disabling it
does nothing whatsoever at the moment.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-17 12:03:24 -08:00
Tom Stellard 929fcee47e mesa: Add missing call to _mesa_unlock_debug_state(ctx); v2
cd724208d3 added a call to
_mesa_lock_debug_state(ctx) but wasn't unlocking the debug state.

This fixes a hang in glsl-fs-loop piglit test with MESA_DEBUG=context.

v2:
  - Remove unrelated changes.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-11-17 18:32:35 +00:00
Eric Engestrom 9702f91366 egl: fix helper function name
I introduced this code last month, but didn't follow the naming
convention. Fix this.

Fixes: 0a606a400f ("egl: add eglSwapBuffersWithDamageKHR")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
2016-11-17 09:33:25 +02:00
Eric Engestrom 8b780a543a egl/x11: misc style fixes
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2016-11-17 09:32:48 +02:00
Eric Engestrom 41b5d98b28 egl: fix function name in debug string
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2016-11-17 09:32:11 +02:00
Jason Ekstrand 9557147592 nir/spirv: Fix handling of gl_PrimitiveId
Before, we were always treating it as an output which bogus.  The only
stage in which this it can be an output is the geometry stage.  In all
other stages, it's an input which, in the back-end, we actually want to be
a system value.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-11-16 20:07:23 -08:00
Jason Ekstrand 1c97432ce8 anv/fence: Handle ANV_FENCE_CREATE_SIGNALED_BIT
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-11-16 20:07:23 -08:00
Jason Ekstrand 49f08ad77f anv: Handle null in all destructors
This fixes a bunch of new CTS tests which look for exactly this.  Even in
the cases where we just call vk_free to free a CPU data structure, we still
handle NULL explicitly.  This way we're less likely to forget to handle
NULL later should we actually do something less trivial.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-11-16 20:07:23 -08:00
Jason Ekstrand d0646c8015 util/vk_alloc: Ensure NULL is handled correctly in vk_free
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-11-16 20:07:23 -08:00
Jason Ekstrand 18266247a0 anv/device: Silence a 32-bit warning 2016-11-16 20:07:20 -08:00
Eric Anholt 80786a67cf nir: Avoid an extra NIR op in integer divide lowering.
NIR bools are ~0 for true, so ((unsigned)a >> 31) != 0 -> ((int)a >> 31).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-11-16 19:45:01 -08:00
Eric Anholt 7f27ad5597 vc4: Try compiling our FSes in multithreaded mode on new kernels.
Multithreaded fragment shaders let us hide texturing latency by a
hyperthreading-style switch to another fragment shader.  This gets us up
to 20% framerate improvements on glmark2 tests.
2016-11-16 19:45:01 -08:00
Eric Anholt 45c022f2b0 vc4: Add support for ETC1 textures if the kernel is new enough.
The kernel changes for exposing the param have now been merged, so we can
expose it here.
2016-11-16 19:45:01 -08:00
Eric Anholt 7130260d12 vc4: Fix simulator mode missing-GETPARAM debug info.
The value is 0 since we didn't set it, we wanted to see the param.
2016-11-16 19:45:01 -08:00
Mun Gwan-gyeong 20c1623a11 vc4: Fix resource leak in register allocation failure path.
CID 1394322

Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
2016-11-16 19:45:01 -08:00
Timothy Arceri 686dad657f glsl: stub out _mesa_reference_program() in standalone compiler
The follow patch will call this directly from the linker, the
shader cache will also start calling these from the compiler.
2016-11-17 12:53:12 +11:00
Timothy Arceri c3df65c123 st/mesa/r200/i915/i965: move ARB program fields into a union
It's common for games to compile 2000 programs or more so at

32bits x 2000 programs x 22 fields x 2 (at least) stages

This should give us something like 352 kilobytes in savings
once we add some more glsl only fields.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-17 12:53:12 +11:00
Timothy Arceri d6bdb3a862 st/mesa: stop initialing Instructions and NumInstructions
Since gl_program is now created with rzalloc() they should
already be initialised.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-17 12:52:59 +11:00
Timothy Arceri 0ad69e6b51 mesa: make use of ralloc when creating ARB asm gl_program fields
This will allow us to move the ARB asm fields in gl_program into
a union as we will be able call ralloc_free() on the entire struct
when destroying the context.

In this change we switch over to using ralloc for the Instructions,
String and LocalParams fields of gl_program.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-17 12:52:24 +11:00
Timothy Arceri 9c9589f1e2 mesa: remove unused Comment field in prog_instruction
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-17 12:52:24 +11:00
Timothy Arceri 67b9c26342 i965: get num_abos from shader_info rather than gl_linked_shader
This is a step towards freeing gl_linked_shader after linking.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-17 12:52:24 +11:00
Timothy Arceri 5581f2a8f2 mesa/glsl: copy num_abos to gl_program
We should be able to free gl_linked_shader after linking in order to
do so we need to switch to getting values from gl_program instead.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-17 12:52:24 +11:00
Timothy Arceri ba40c8b03c i965: get num_images from shader_info rather than gl_linked_shader
This is a step towards freeing gl_linked_shader after linking.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-17 12:52:24 +11:00
Timothy Arceri 9c2042f2ce mesa/glsl: copy num_images to gl_program
We should be able to free gl_linked_shader after linking in order to
do so we need to switch to getting values from gl_program instead.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-17 12:52:24 +11:00
Timothy Arceri 6b82e957be nir: add support for counting AoA uniforms in nir_shader_gather_info()
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-17 12:52:24 +11:00
Timothy Arceri c3b8bf9bc9 i965: only try print GLSL IR once when using INTEL_DEBUG to dump ir
Since we started releasing GLSL IR after linking the only time we can
print GLSL IR is during linking. When regenerating variants only NIR
will be available.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-17 12:52:24 +11:00
Jason Ekstrand 2e2160969e anv/descriptor_set: Put the whole state in the state free list
We're not really saving much by just putting the offset in there.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-11-16 17:07:35 -08:00