Commit Graph

111105 Commits

Author SHA1 Message Date
Marek Olšák 9624855f13 radeonsi: add threadgroups_per_cu param into si_get_compute_resource_limits
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:06:54 -04:00
Marek Olšák 6e38af0631 radeonsi: move si_*_descriptors_idx functions into si_state.h
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:06:53 -04:00
Marek Olšák 49a016ec5d radeonsi: make si_initialize_compute reusable
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:06:51 -04:00
Marek Olšák c44c6951d4 radeonsi: extract COMPUTE_RESOURCE_LIMITS code into a helper
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:06:49 -04:00
Marek Olšák c7ceeea093 radeonsi: return the last part's return value from @wrapper
The primitive discard compute shader will get the position output this way.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:06:40 -04:00
Marek Olšák d569b7cb31 winsys/amdgpu: always set NO_CPU_ACCESS and NO_SUBALLOC on GDS resources
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:06:18 -04:00
Jan Zielinski d65b160e6a swr: clean up supported OGL4.0/4.1 extensions list
This commit adjusts the capabilities returned
by the SWR driver and the documentation to correctly
report the following extensions:

GL_ARB_texture_query_lod, GL_ARB_texture_cube_map_array,
GL_ARB_gpu_shader_fp64, GL_ARB_texture_gather,
GL_ARB_vertex_attrib_64bit.

Reviewed-by: Alok Hota <alok.hota@intel.com>
2019-05-16 17:41:14 +02:00
Leo Liu aa040d3b3c vl/dri3: set back buffer from output to NULL with front buffer case
Since the using output optimization is only for back buffer case

Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2019-05-16 10:28:38 -04:00
Alejandro Piñeiro 16a1ef7860 docs: advice to resolve discussion on gitlab MR doc
For newcomers to gitlab, it is not evident that it is better to press
the "Resolve Discussion" button when you update your branch handling
feedback.

v2:
   * Fix several grammar nits, reorder, use new corrected text (Connor
     Abbot)
   * Use "reviewers", instead of reviewer (Eric Engestrom)

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-16 16:16:32 +02:00
Roland Scheidegger 4171a26193 auxiliary/draw: fix crash with zero-stride draw auto
transform feedback draws get the number of vertices from the transform
feedback object. In draw, we'll figure this out with the number of bytes
written divided by the stride. However, it is apparently possible we end
up with a stride of 0 there (not entirely sure it could happen with GL).
Probably when nothing was actually ever written (so we don't actually
have a stride set). Just avoid the division by zero by setting the count
to 0.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2019-05-16 14:01:33 +02:00
Eric Engestrom 22c1657d05 util/os_file: always use the 'grow' mechanism
Use fstat() only to pre-allocate a big enough buffer.

This fixes a race where if the file grows between fstat() and read()
we would be missing the end of the file, and if the file slims down
read() would just fail.

Fixes: 316964709e "util: add os_read_file() helper"
Reported-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-16 12:56:25 +01:00
Lionel Landwerlin e04cf0b612 nir: lower_non_uniform_access: iterate over instructions safely
This pass moves instructions around and adds control-flow in the
middle of blocks. We need to use nir_foreach_instr_safe to ensure that
we iterate over instructions correctly anyway.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 3bd5457641 ("nir: Add a lowering pass for non-uniform resource access")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-16 10:22:01 +01:00
Kenneth Graunke 752367b766 iris: Dodge more GLSL IR lowering
This avoids some lower_instructions bits in st.
2019-05-15 19:44:21 -07:00
Jason Ekstrand fce0214e94 intel/fs/live_variables: Do compute_start_end in BITSET_WORD chunks
For a block with a contiguous chunk of 32 vars that don't need updating,
this lets us skip 32 vars at a time. Also, by using bitscan, we only
iterate for each set bit rather than testing them all one at a time.
Looking at perf (with -O0 which is unfortunately necessary to get
reasonable back-traces), this seems to cuts about 50-60% of the time
spent in compute_start_end() which is, itself about 4-6% of the
run-time. In the real world, with a release driver build, this cuts
1.34% off a full shader-db run. (I ran shader-db 5 times in each
configuration).

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-16 02:14:40 +00:00
Jason Ekstrand b2d274c677 intel/fs/ra: Choose a spill reg before throwing away the graph
Otherwise, we get an effectively random spill reg because we no longer
have the information from RA to guide us.  Also, a completely clean
graph has undefined data in in_stack which is used for choosing the
spill reg so it really is non-deterministic.

Fixes: e99081e76d "intel/fs/ra: Spill without destroying the..."
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-16 02:13:09 +00:00
Jason Ekstrand c19acf321c intel/fs/ra: Add spill costs to the graph on-demand
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-16 02:13:09 +00:00
Jason Ekstrand 2c14e2b5bf intel/fs/ra: Add a helper for discarding the interference graph
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-16 02:13:09 +00:00
Alyssa Rosenzweig 46494c3dc1 nir/algebraic: Remove problematic "optimization"
This line is no longer relevant now that booleans are 1-bit, and in fact
causes issues (infinite progress loop between algebraic optimizations
and copy prop) with constant vector masks.

No shader-db changes on Intel platforms (Jason).

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2019-05-16 02:08:37 +00:00
Alyssa Rosenzweig 74ab80b92d panfrost/midgard: Add load/store opcodes
This commit adds a bunch of new load/store opcodes, largely related to
OpenCL, as well as adjusting the name of existing opcodes to be more
uniform. The immediate effect is compute shaders are substantially
easier to interpret now.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-16 01:25:25 +00:00
Alyssa Rosenzweig f73c0b73ec panfrost/midgard: Enable integer constant inlining
Midgard ALU features two types of constants: embedded constants (128-bit
chunk, zero/one per schedule bundle) and inline constants (16-bit
splattered into the op, second source if present). Inline constants are
much more efficient from a space and scheduling freedom standpoint, so
it's desirable to inline when possible. Now that integer ops are well
understood and in use, we enable inlining of integers constants in
addition to floats (which have been inlined since forever).

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-16 01:20:41 +00:00
Alyssa Rosenzweig 8214aaa3c8 panfrost/midgard: Remove imov workaround
The previous commit fixes the issue this patched around.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-16 01:20:41 +00:00
Alyssa Rosenzweig 0a13babdd8 panfrost/midgard: Set int outmod for ops writing integers
By default, the "normal" output modifier is set on ALU ops. This is the
correct default for float outputs -- for floats, it preserves the semantic
value. Unfortunately, when used with integers, it does not preserve the
bitstream encoding, causing misbehaviour. (It's an open question what
happens when `normal` is used with integers -- does it apply some other
transformation? or does it do floating point normalization/etc on the
ints as if they were floats?).

Instead, we default to the "clamp to integer" output modifier for
ops writing integers. Semantically, this makes sense (clamping an
integer to the nearest integer is the identity function). In the
hardware with an integer opcode, this is the actual "normal".

This fixes numerous sporadic and sometimes bizarre bugs relating to
integers, especially integer moves. With this in place, we no longer
care about the types involved; it's just bits on the wire again.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-16 01:20:30 +00:00
Alyssa Rosenzweig 81b1053d9b panfrost: Set custom stride for textures when necessary
From Gallium (and our) perspective, the stride of a BO is arbitrary. For
internal buffers, we can make it something nice, but for imported linear
buffers (e.g. EGL clients), we don't always have that luxury. To cope,
we calculate the expected stride of a texture, compare it to the BO's
actual reported stride, and if they differ, set the latter as a custom
stride.

Fixes rendering of windows not on tile boundaries (noticeable in Weston
with es2gears_wayland, for instance). Also, this should fix stride
issues with bufer reloading.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-16 01:16:36 +00:00
Alyssa Rosenzweig cea9352059 panfrost/decode: Stride decoding
With a special flag, texture descriptors can include custom stride(s).
We haven't seen a case of this used for mipmaps/cubemaps, so it's not
clear how that will be encoded, but this dumps correctly for single
one-level 2D textures.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-16 01:15:37 +00:00
Alyssa Rosenzweig d699ffbf0e panfrost/decode: Futureproof texture dumping
One field was not dumped for some reason. It's observed to be 0, but
it's still good to have it available.

Also, extra fields might be snuck in the bitmaps array (it's
variable-lengthed at the end), and we want to guard against that
possibility, so we dump a little more.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-16 01:15:37 +00:00
Marek Olšák ccfcb9d818 ac: rename SI-CIK-VI to GFX6-GFX7-GFX8
Acked-by: Dave Airlie <airlied@redhat.com>

We already use GFX9 and I don't want us to have confusing naming
in the driver. GFXn naming is better from the driver perspective,
because it's the real version of the gfx portion of the hw. Also,
CIK means Bonaire-Kaveri-Kabini, it doesn't mean CI.

It shouldn't confuse our SDMA, UVD, VCE etc. code much. Those have
nothing to do with GFXn and they have their own version numbers.
2019-05-15 20:54:10 -04:00
Marek Olšák e5cc363f43 ac: add comments to chip enums
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> (except GFX2 changes)
Reviewed-by: Dave Airlie <airlied@redhat.com> (except <= GFX5 changes)
2019-05-15 20:54:10 -04:00
Anuj Phogat a42163cbbc compiler: Add lowering support for 64-bit saturate operations to software
Fixes 7 Khronos GL CTS tests:
KHR-GL45.gpu_shader_fp64.builtin.smoothstep_dvec{double, 2, 3, 4}
KHR-GL45.gpu_shader_fp64.builtin.smoothstep_against_scalar_dvec{2, 3, 4}

Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-15 23:30:30 +00:00
Kenneth Graunke d305409db5 st/dri: Minor style fixes
Trivial.
2019-05-15 14:49:14 -07:00
Chia-I Wu 659c5800e5 virgl: handle DONT_BLOCK and MAP_DIRECTLY
Handle PIPE_TRANSFER_DONT_BLOCK and PIPE_TRANSFER_MAP_DIRECTLY.
Make virgl_resource_transfer_prepare return an enum instead of a
bool for extensibility (e.g., instruct the callers to map
differently).

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-05-15 20:51:28 +00:00
Chia-I Wu e87186fc67 virgl: add virgl_resource_transfer_prepare
virgl_resource_transfer_prepare should be called before mapping to
prepare the resource.  It does flush, readback, and wait as needed.
virgl_res_needs_flush and virgl_res_needs_readback become internal
helpers to the new function.

There should be no externally visible change.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-05-15 20:51:28 +00:00
Chia-I Wu cdcf38b98a virgl: honor DISCARD_WHOLE_RESOURCE in virgl_res_needs_readback
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-05-15 20:51:28 +00:00
Chia-I Wu a62ab178ce virgl: clean up virgl_res_needs_readback
Add comments and follow the coding style of virgl_res_needs_flush.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-05-15 20:51:28 +00:00
Lionel Landwerlin 391a836e8f nir: fix lower_non_uniform_access pass
Obviously missing the instruction insertion into the SSA list.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 3bd5457641 ("nir: Add a lowering pass for non-uniform resource access")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-15 18:15:20 +00:00
Alex Villacís Lasso b2200514af gbm: gbm_bo_get_handle_for_plane fallback to nonplanar handle
Commit f9567ab435 (gbm: Export a getter for per
plane handles) contains an API version check that fails on i915 (API version 7
vs. check for minimum API version 13). Any client that migrates to the planar
API will start failing on i915 (see https://gitlab.gnome.org/GNOME/mutter/issues/127
for mutter, and https://bugs.freedesktop.org/show_bug.cgi?id=108487 for weston).

This commit adds a fallback for plane 0 when the API check fails and returns the
non-planar handle in this scenario, making the call equivalent to
gbm_bo_get_handle(). This is enough for weston 6.0.0 to start working again on an
i915 system.

Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=108487

Signed-off-by: Alex Villacís Lasso <a_villacis@palosanto.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
2019-05-15 18:27:30 +01:00
Alyssa Rosenzweig a9cef4f0e5 gallium: Add default check for PIPE_CAP_FRAGMENT_SHADER_INTERLOCK
Fixes: c704c0226 ("gallium: Add a PIPE_CAP_FRAGMENT_SHADER_INTERLOCK")

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 21:34:49 -07:00
Andrii Kryvytskyi eca53f00aa iris: Check if resource has stencil before returning it
Signed-off-by: Andrii Kryvytskyi <andrii.o.kryvytskyi@globallogic.com>
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 21:16:11 -07:00
Jordan Justen 49958c4b5d
i965/blorp: Set MOCS for gen11 in blorp_alloc_vertex_buffer
v2:
 * Add build error for gen > 6 if MOCS is not set. (Lionel)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2019-05-14 19:57:01 -07:00
Kenneth Graunke bb5db02bab iris: Enable fragment shader interlock on Gen9+.
There's some debate about whether we should support this on older
hardware as well.  Currently i965 turns it off on Gen8- though, so
we follow suit.  If this changes, we can update this as well.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-05-14 19:34:33 -07:00
Kenneth Graunke c704c0226c gallium: Add a PIPE_CAP_FRAGMENT_SHADER_INTERLOCK.
Corresponding to GL_ARB_fragment_shader_interlock and
GL_NV_fragment_shader_interlock.  Currently, only the NIR paths
support this functionality, but someone could conceivably add it
to TGSI too.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-05-14 19:34:29 -07:00
Dave Airlie 4efd04ab18 intel/compiler: use bitset instead of opencoding a 32-bit bitset. (v2)
In the future I want to expand this to 128-bits, for vec16 support, so
lets just put the code in place to use bitset ranges now.

v2: just declare the bitset to be the max of what we should ever see
and change assert to reflect it.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-15 07:10:34 +10:00
Dave Airlie 3b2c433167 intel/compiler: remove repeated bit_size / 8 in brw mem lowering pass.
Just use a variable already.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-15 07:10:30 +10:00
Kenneth Graunke 646924cfa1 intel/compiler: Implement TCS 8_PATCH mode and INTEL_DEBUG=tcs8
Our tessellation control shaders can be dispatched in several modes.

- SINGLE_PATCH (Gen7+) processes a single patch per thread, with each
  channel corresponding to a different patch vertex.  PATCHLIST_N will
  launch (N / 8) threads.  If N is less than 8, some channels will be
  disabled, leaving some untapped hardware capabilities.  Conditionals
  based on gl_InvocationID are non-uniform, which means that they'll
  often have to execute both paths.  However, if there are fewer than
  8 vertices, all invocations will happen within a single thread, so
  barriers can become no-ops, which is nice.  We also burn a maximum
  of 4 registers for ICP handles, so we can compile without regard for
  the value of N.  It also works in all cases.

- DUAL_PATCH mode processes up to two patches at a time, where the first
  four channels come from patch 1, and the second group of four come
  from patch 2.  This tries to provide better EU utilization for small
  patches (N <= 4).  It cannot be used in all cases.

- 8_PATCH mode processes 8 patches at a time, with a thread launched per
  vertex in the patch.  Each channel corresponds to the same vertex, but
  in each of the 8 patches.  This utilizes all channels even for small
  patches.  It also makes conditions on gl_InvocationID uniform, leading
  to proper jumps.  Barriers, unfortunately, become real.  Worse, for
  PATCHLIST_N, the thread payload burns N registers for ICP handles.
  This can burn up to 32 registers, or 1/4 of our register file, for
  URB handles.  For Vulkan (and DX), we know the number of vertices at
  compile time, so we can limit the amount of waste.  In GL, the patch
  dimension is dynamic state, so we either would have to waste all 32
  (not reasonable) or guess (badly) and recompile.  This is unfortunate.
  Because we can only spawn 16 thread instances, we can only use this
  mode for PATCHLIST_16 and smaller.  The rest must use SINGLE_PATCH.

This patch implements the new 8_PATCH TCS mode, but leaves us using
SINGLE_PATCH by default.  A new INTEL_DEBUG=tcs8 flag will switch to
using 8_PATCH mode for testing and benchmarking purposes.  We may
want to consider using 8_PATCH mode in Vulkan in some cases.

The data I've seen shows that 8_PATCH mode can be more efficient in
some cases, but SINGLE_PATCH mode (the one we use today) is faster
in other cases.  Ultimately, the TES matters much more than the TCS
for performance, so the decision may not matter much.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-14 13:16:30 -07:00
Kenneth Graunke 076159b40b intel/compiler: Move ICP handle fetching into a helper function.
This will be significantly different in 8_PATCH mode.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-14 13:16:28 -07:00
Kenneth Graunke 3d84fd29e8 intel/compiler: Don't repeat dispatch max fixing condition
Having a single flag will keep both places in sync if the condition
gets more complicated.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-14 13:16:27 -07:00
Kenneth Graunke f0d52cf2b0 intel/compiler: Rename invocation_id_mask to instance_id_mask
The payload field is actually "instance" (thread number), which is used
to calculate the invocation ID.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-14 13:16:25 -07:00
Kenneth Graunke d86260719e intel/compiler: Refactor TCS invocation ID setup into a helper
When we add 8_PATCH mode, this will get a bit more complex, so we may
as well start by putting it in a helper function.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-14 13:16:24 -07:00
Kenneth Graunke 381c2aded2 i965: Pass compiler to default key populators
This lets us get devinfo and other misc. compiler settings.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-14 13:16:21 -07:00
Marek Olšák 6b0b8f132a ac: use 1D GEPs for descriptors and constants
just a cleanup

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-14 15:15:11 -04:00
Marek Olšák 67b4785958 mesa: fix _mesa_max_texture_levels for GL_TEXTURE_EXTERNAL_OES
This helps fix:
    piglit/bin/ext_image_dma_buf_import-sample_yuv -fmt=NV12 -auto

Fixes: d88f3392ff

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-14 15:15:11 -04:00