Commit Graph

98046 Commits

Author SHA1 Message Date
Juan A. Suarez Romero ce221cbbcf mesa/teximage: add TEXTURE_CUBE_MAP_ARRAY target for CompressedTexImage3D
From section 8.7, page 179 of OpenGL ES 3.2 spec:

  An INVALID_OPERATION error is generated by CompressedTexImage3D
  if internalformat is one of the the formats in table 8.17 and target
  is not TEXTURE_2D_ARRAY, TEXTURE_CUBE_MAP_ARRAY or TEXTURE_3D.

  An INVALID_OPERATION error is generated by CompressedTexImage3D if
  internalformat is TEXTURE_CUBE_MAP_ARRAY and the “Cube Map Array”
  column of table 8.17 is not checked, or if internalformat is
  TEXTURE_3D and the “3D Tex.” column of table 8.17 is not checked.

So far it was only considering TEXTURE_2D_ARRAY as valid target. But as
"Cube Map Array" column is checked for all the cases, in practice we can
consider also TEXTURE_CUBE_MAP_ARRAY.

This fixes KHR-GLES32.core.texture_cube_map_array.etc2_texture

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-11-21 13:05:42 +01:00
Tapani Pälli 6236ffeb83 intel: fix disasm_info memory leaks
Fixes: 4f82b17287 ("i965: Rewrite disassembly annotation code")
Cc: Matt Turner <mattst88@gmail.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-11-21 08:36:43 +02:00
Timothy Arceri 04a9558497 st/glsl_to_nir: don't generate nir twice for gs
This was left out of c980a3aa31

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-21 15:57:39 +11:00
Roland Scheidegger b5957cee92 llvmpipe: fix snorm blending
The blend math gets a bit funky due to inverse blend factors being
in range [0,2] rather than [-1,1], our normalized math can't really
cover this.
src_alpha_saturate blend factor has a similar problem too.
(Note that piglit fbo-blending-formats test is mostly useless for
anything but unorm formats, since not just all src/dst values are
between [0,1], but the tests are crafted in a way that the results
are between [0,1] too.)

v2: some formatting fixes, and fix a fairly obscure (to debug)
issue with alpha-only formats (not related to snorm at all), where
blend optimization would think it could simplify the blend equation
if the blend factors were complementary, however was using the
completely unrelated rgb blend factors instead of the alpha ones...

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-11-21 04:06:29 +01:00
Dave Airlie 464c2d8083 r600: add cull distance support
This passes all the tests in piglit.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-11-21 09:00:52 +10:00
Aravindan Muthukumar 971b3c019b i965: Optimize bucket index calculation
Reducing Bucket index calculation to O(1).

This algorithm calculates the index using matrix method.  Assuming
PAGE_SIZE is 4096, matrix arrangement is as below:

          1*4096   2*4096    3*4096    4*4096
          5*4096   6*4096    7*4096    8*4096
          10*4096  12*4096   14*4096   16*4096
          20*4096  24*4096   28*4096   32*4096
           ...      ...       ...       ...
           ...      ...       ...       ...
           ...      ...       ...   max_cache_size

From this matrix its clearly seen that every row follows the below way:

          ...       ...       ...        n
        n+(1/4)n  n+(1/2)n  n+(3/4)n    2n

Row is calculated as log2(size/PAGE_SIZE) Column is calculated as
converting the difference between the elements to fit into power size of
two and indexing it.

Final Index is (row*4)+(col-1)

Tested with Intel Mesa CI.

Improves performance of 3DMark on BXT by 0.705966% +/- 0.229767% (n=20)

v4: Review comments on style and code comments implemented (Ian).
v3: Review comments implemented (Ian).
v2: Review comments implemented (Jason).

Signed-off-by: Aravindan Muthukumar <aravindan.muthukumar@intel.com>
Signed-off-by: Kedar Karanje <kedar.j.karanje@intel.com>
Reviewed-by: Yogesh Marathe <yogesh.marathe@intel.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2017-11-20 14:52:42 -08:00
Dylan Baker c8417c8d25 meson: Guard the gallium dri componenet
Currently the target has a redundant guard, and the state tracker isn't
properly guarded.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-11-20 14:28:31 -08:00
Dylan Baker 689fb74716 meson: don't build gallium subdir unless we're building gallium
This will allow us to simplify some guards within the gallium directory.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-11-20 14:28:31 -08:00
Eric Anholt 494effd242 broadcom/vc5: Align 1D texture miplevels to 64b.
Fixes tex-miplevel-selection GL2:texture() 1D
2017-11-20 13:54:45 -08:00
Eric Anholt 9d5972da80 broadcom/vc5: Clamp min lod to the last level.
Otherwise, the simulator would complain in tex-miplevel-selection that the
min/max clamp was out of order.  The actual HW seems to have clamped to
the max anyway.
2017-11-20 13:52:33 -08:00
Eric Anholt 2c8913e224 broadcom/vc5: Increase simulator memory for tex-miplevel-selection.
We were overflowing, because of all the little 4k allocations for CLs that
were getting expanded to 128kb in the simulator due to the GMP alignment.
2017-11-20 13:52:33 -08:00
Tim Rowley 34838c2212 swr/rast: Repair simd8 frontend code rot
Keep non-default simd8 frontend code running for comparison purposes.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-11-20 13:51:10 -06:00
Tim Rowley 005d937e15 swr/rast: Implement AVX-512 GATHERPS in SIMD16 fetch shader
Disabled for now.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-11-20 13:51:06 -06:00
Tim Rowley 2e244c7168 swr/rast: Simplify GATHER* jit builder api
General cleanup, and prep work for possibly moving to llvm masked
gather intrinsic.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-11-20 13:51:01 -06:00
Tim Rowley 44025def06 swr/rast: Add alignment to transpose targets
Needed to ensure alignment for avx512.

Fixes address sanitizer crash.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-11-20 13:50:56 -06:00
Tim Rowley bc356b0fc0 swr/rast: Cache eventmanager
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-11-20 13:50:51 -06:00
Tim Rowley 395a298fa5 swr/rast: Enable AVX-512 targets in the jitter
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-11-20 13:50:45 -06:00
Tim Rowley 37bb69fb88 swr/rast: Points with clipdistance can't go through simplepoints path
Fixes piglit glsl-1.20:vs-clip-vertex-primitives and
glsl-1.30:vs-clip-distance-primitives.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-11-20 13:50:38 -06:00
Tim Rowley d9de8f3122 swr/rast: Code style change (NFC)
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-11-20 13:50:29 -06:00
Tim Rowley 08512c52de swr/rast: Widen fetch shader to SIMD16
Widen fetch shader to SIMD16, enable SIMD16 types in the jitter,
and provide utility EXTRACT/INSERT SIMD8 <-> SIMD16 utility functions.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-11-20 13:50:23 -06:00
Tim Rowley e612231f20 swr/rast: Support flexible vertex layout for DS output
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-11-20 13:49:59 -06:00
Nicolai Hähnle 3f17d3c017 gallium/u_threaded: avoid syncing in threaded_context_flush
We could always do the flush asynchronously, but if we're going to wait
for a fence anyway and the driver thread is currently idle, the additional
communication overhead isn't worth it.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-20 18:16:15 +01:00
Nicolai Hähnle bc65dcab3b radeonsi: avoid syncing the driver thread in si_fence_finish
It is really only required when we need to flush for deferred fences.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-20 18:16:11 +01:00
Nicolai Hähnle 3db1ce01b1 radeonsi: recompute the relative timeout after waiting for ready fence
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-20 18:16:06 +01:00
Nicolai Hähnle f5ea8d18ff ddebug: fix the hang detection timeout calculation
Fixes: c9fefa062b ("ddebug: rewrite to always use a threaded approach")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-20 18:16:03 +01:00
Nicolai Hähnle 16f8da2997 ddebug: fix use-after-free of streamout targets
Fixes: b47727a83a ("ddebug: implement pipelined hang detection mode")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-20 18:16:00 +01:00
Nicolai Hähnle aaebf49eba gallium/u_threaded: properly initialize fence unflushed tokens
This got lost in a rebase but never hurt anything because we happened
to always sync in fence_finish anyway...

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-20 18:15:56 +01:00
Nicolai Hähnle 81aabb20f3 util/u_queue: really use futex-based fences
The relevant define changed in the final revision of the simple mutex
patch.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-20 18:15:53 +01:00
Nicolai Hähnle a6e8311723 util/u_queue: fix timeout handling in util_queue_fence_wait_timeout
Fixes: e3a8013de8 ("util/u_queue: add util_queue_fence_wait_timeout")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-20 18:15:49 +01:00
Nicolai Hähnle 764bd6ef96 st/mesa: use asynchronous flushes in st_finish
With threaded gallium, the driver may currently be running in another
thread. In that case, we will execute all remaining commands in that
thread instead of syncing, which should be better for cache locality.

Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-20 18:15:07 +01:00
Nicolai Hähnle 2d8b82baaa st/mesa: implement st_server_wait_sync properly
Asynchronous flushes require a proper implementation of
st_server_wait_sync, because we could have the following with
threaded Gallium:

 Context 1 app     Context 1 driver         Context 2
 -------------     ----------------         ---------
 f = glFenceSync
 glFlush
 <-- app sync -->                           <-- app sync -->
                                            glWaitSync(f)
                                            .. draw calls ..
                   pipe_context::flush
                     for glFenceSync
                   pipe_context::flush
                     for glFlush

Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-20 18:15:07 +01:00
Nicolai Hähnle ce470af0b1 u_threaded_gallium: remove synchronization in fence_server_sync
The whole point of fence_server_sync is that it can be used to
avoid waiting in the application thread.

Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-20 18:15:06 +01:00
Nicolai Hähnle abeded1cac amd: build addrlib with C++11
It is required for LLVM anyway.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103658
Fixes: 7f33e94e43 ("amd/addrlib: update to latest version")
Tested-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-20 16:26:28 +01:00
Nicolai Hähnle df5ebe0c26 radeonsi/gfx9: fix VM fault with fetched instance divisors
We need to account for SGPR locations in merged shaders.

This case is exercised by KHR-GL45.enhanced_layouts.vertex_attrib_locations

Fixes: 79c2e7388c ("radeonsi/gfx9: use SPI_SHADER_USER_DATA_COMMON")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-20 16:26:10 +01:00
Samuel Pitoiset 3a32858fc3 radv: use a 16 bytes array for the sampled/storage image descriptors
This allows to update them with only one memcpy().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-11-20 11:18:22 +01:00
Samuel Pitoiset bc92ed04ac radv: do not add the query pool BO to the list in vkCmdEndQuery()
As per the spec, the query identified by queryPool and query
must currently be active. Applications have to call vkCmdBeginQuery()
before, and thus the query pool BO will already be in the list.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-11-20 11:18:20 +01:00
Samuel Pitoiset cf54ea155e radv: only load needed depth clear regs for fast depth clears
Similar to how the driver sets the depth clear regs after a
fast depth clear. Most of the time, this will copy a 32-bit reg
instead of a 64-bit reg.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-11-20 10:45:27 +01:00
Samuel Pitoiset e55b7609fa radv: do not add the image BO in radv_set_depth_clear_regs()
For the fast path, radv_fill_buffer() ensures that the BO is
already in the list. For the slow path, the depth surface is
part of the framebuffer which means the BO is added to the list
when the framebuffer is emitted.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-11-20 10:45:23 +01:00
Samuel Pitoiset 3c6bba83f0 radv: remove useless assertion in emit_depthstencil_clear()
Already checked in emit_clear().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-11-20 10:45:21 +01:00
Samuel Pitoiset 403a3d8061 radv: remove useless check in radv_set_depth_clear_regs()
aspects can't be zero and there is an assertion that ensures
it's not in emit_clear().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-11-20 10:45:19 +01:00
Dave Airlie 59ca0c4b44 docs/features: mark some r600 extensions supported
These just looked to be missed when this file was updated.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-11-20 10:22:25 +10:00
George Barrett f09c2cefdd glsl: Catch subscripted calls to undeclared subroutines
generate_array_index fails to check whether the target of a subroutine
call exists in the AST, potentially passing around null ir_rvalue
pointers eventuating in abort/segfault.

Fixes: fd01840c0b ("glsl: add AoA support to subroutines")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100438
2017-11-20 11:04:04 +11:00
Eric Anholt 514db90448 broadcom/vc5: Fix up integer texture handling.
The original spec I had didn't expose integer textures and suggested that
you use unfiltered floats.  Now there are proper formats for them.

Fixes 16- and 32-bit texwrap integer tests in piglit, and
dEQP-GLES3.functional.fbo.completeness.renderable.renderbuffer.color0.rgb10_a2ui.
2017-11-19 10:12:30 -08:00
Eric Anholt 65ae4527d9 broadcom/vc5: Fix simulator assertion failures about color RT clears.
When we tried to clear color while storing depth, it assertion failed
about basically not having enough information to decide which color RT to
clear.  It turns out the STORE_GENERAL picks the buffer according to the
color buffer being stored, or all of them if NONE.  If you're doing depth,
it doesn't know which to pick.
2017-11-19 10:12:30 -08:00
Rob Clark ae44845aff freedreno/ir3: add texture gather support
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-11-18 13:39:39 -05:00
Lucas Stach f5d477f447 etnaviv: enable full overwrite when no color buffer is present
The OVERWRITE bit disables destination fetches, which is exactly what
we want when there is no valid color buffer bound.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
2017-11-18 12:33:49 +01:00
Jason Ekstrand 1eab327ba7 i965: Stop including brw_cfg.h in brw_disasm_info.h
The brw_disasm_info header is included by certain tools in order to get
shader assembly from binaries so it's a semi-external header.  Including
brw_cfg.h also pulls in brw_shader.h so you end up getting quite a bit
of our back-end compiler internals.  Instead, make the couple of forward
declarations we need and make the header more stand-alone.  This fixes
the meson build.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 4f82b17287
2017-11-17 21:51:16 -08:00
Jason Ekstrand 0a6a137eb2 i965: Mark BOs as external when we export their handle
Almost all of our BO export paths were already properly marked the BO as
external and added it to the handle table.  Most export use-cases go
through a prime fd or flink where we have a brw_bo export helper that
does the right thing.  The one missing one happens when you call
queryImage and ask for __DRI_IMAGE_ATTRIB_HANDLE.  We just grabbed the
gem handle out of the BO (because it's really easy to do that) and
handed it off to the client; what could go wrong?  As it turns out, this
path is used by basically every compositor that wants to turn around and
call drmModeAddFB2 on it so it can hand it off to display.  The result,
as of 4b1e70cc57, is that we no longer set
MOCS_PTE on those surfaces and the kernel's attempts to disable caching
fail and we scanout gets corruption.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103759
Fixes: 4b1e70cc57
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
2017-11-17 17:16:44 -08:00
Jason Ekstrand 344252a27f i965/bufmgr: Add a helper to mark a BO as external
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
2017-11-17 17:16:44 -08:00
Andres Gomez 1866f7aee5 i965: Correct disasm_info usage in eu_validate test
Fixes: 4f82b17287 ("i965: Rewrite disassembly annotation code")

Cc: Matt Turner <mattst88@gmail.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-11-18 03:07:06 +02:00