Commit Graph

117683 Commits

Author SHA1 Message Date
Jonathan Marek e1a86bd634 etnaviv: blt: use only for tiling, and add missing formats
* Removes the incorrect usage of translate_rs_format
* Disables use of BLT engine for different src/dst format

We only really need the BLT engine for tiling/detiling right now, but it
would be nice to support as many blit cases as possible to avoid using PE
for that.

To deal with different formats we need to:
 * Have a translate_blt_format which has all supported formats
 * Fix the swizzle translation from gallium (current version was wrong)
 * Set the src/dst sRGB bits as needed
 * Find which type conversions the BLT engine can actually do

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-11-18 20:57:40 +01:00
Brian Paul 02c3dad0f3 Call shmget() with permission 0600 instead of 0777
A security advisory (TALOS-2019-0857/CVE-2019-5068) found that
creating shared memory regions with permission mode 0777 could allow
any user to access that memory.  Several Mesa drivers use shared-
memory XImages to implement back buffers for improved performance.

This path changes the shmget() calls to use 0600 (user r/w).

Tested with legacy Xlib driver and llvmpipe.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-11-18 12:28:59 -07:00
Jason Ekstrand fdaf8144a8 anv: Emit a NULL vertex for zero base_vertex/instance
If both are zero (the common case), we can emit a null vertex buffer
rather than emitting a vertex buffer with zeros in it.  The packing of
the VERTEX_BUFFER_STATE is faster because no relocation is emitted and
we can avoid creating the vertex buffer which means one less
anv_state_stream_alloc.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-11-18 18:35:14 +00:00
Jason Ekstrand bc9d7836bc anv: Use an anv_state for the next binding table
This is a bit more natural because we're already getting an anv_state
most places in the pipeline.  The important part here, however, is that
we're no longer calling anv_block_pool_map on every alloc_binding_table
call.  While it's probably pretty cheap, it is potentially a linear walk
over the list of BOs and it was showing up in profiles.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-11-18 18:35:14 +00:00
Jason Ekstrand 98dc179c1e anv: More carefully dirty state in BindPipeline
Instead of blindly dirtying descriptors and push constants the moment we
see a pipeline change, check to see if it actually changes the bind
layout or push constant layout.  This doubles the runtime performance of
one CPU-limited example running with the Dawn WebGPU implementation when
running on my laptop.

NOTE: This effectively reverts beca63c6c0.  While it was a nice
optimization, it was based on prog_data and we can't do that anymore
once we start allowing the same binding table to be used with multiple
different pipelines.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-11-18 18:35:14 +00:00
Jason Ekstrand 22f16ff54a anv: More carefully dirty state in BindDescriptorSets
Instead of dirtying all graphics or all compute based on binding point,
we're now much more careful.  We first check to see if the actual
descriptor set changed and then only dirty the stages used by that
descriptor set.  For dynamic offsets, we keep a bitfield per-stage of
which offsets are actually used in that stage and we only dirty push
constants and descriptors if that stage has dynamic offsets AND those
offsets actually change.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-11-18 18:35:14 +00:00
Jason Ekstrand ca8117b5d5 anv: Use a switch statement for binding table setup
It theoretically could be more efficient but the real point here is that
it's no longer really a matter of dealing with special cases and then
the "real" thing.  The way we're handling binding tables, it's more of a
multi-step process and a switch is more natural.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-11-18 18:35:14 +00:00
Jason Ekstrand 9baa33cef0 anv: Rework push constant handling
This substantially reworks both the state setup side of push constant
handling and the pipeline compile side.  The fundamental change here is
that we're no longer respecting the prog_data::param array and instead
are just instructing the back-end compiler to leave the array alone.
This makes the state setup side substantially simpler because we can now
just memcpy the whole block of push constants and don't have to
upload one DWORD at a time.

This also means that we can compute the full push constant layout
up-front and just trust the back-end compiler to not mess with it.
Maybe one day we'll decide that the back-end compiler can do useful
things there again but for now, this is functionally no different from
what we had before this commit and makes the NIR handling cleaner.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-11-18 18:35:14 +00:00
Jason Ekstrand ca91ab8015 anv: Re-arrange push constant data a bit
This moves the compute stuff into a anv_push_constants::cs sub-struct.
It also moves dynamic offsets into the push constants.  This means we
have to duplicate the data per-stage but that doesn't seem like the end
of the world and one day we may wish to make dynamic offsets per-stage
anyway.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-11-18 18:35:14 +00:00
Jason Ekstrand d1c4e64a69 intel/compiler: Add a flag to avoid compacting push constants
In vec4, we can just not run the pass.  In fs, things are a bit more
deeply intertwined.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-11-18 18:35:14 +00:00
Jason Ekstrand aecde23519 anv: Pre-compute push ranges for graphics pipelines
It turns off that emitting push constants is one of the hottest paths in
the driver and ANY work we do there costs us.  By pre-computing things a
bit ahead of time, we shave 5% off the runtime of a CPU-limited example
running with the Dawn WebGPU implementation.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-11-18 18:35:14 +00:00
Jason Ekstrand 4b392ced2d anv: Stop bounds-checking pushed UBOs
The bounds checking is actually less safe than just pushing the data.
If the bounds checking actually ever kicks in and it's not on the last
UBO push range, then the shrinking will cause all subsequent ranges to
be pushed to the wrong place in the GRF.  One of the behaviors we
definitely don't want is for OOB UBO access to result in completely
unrelated UBOs returning garbage values.  It's safer to just push the
UBOs as-requested.  If we're really concerned about robustness, we can
emit shader code to do bounds checking which should be stupid cheap (a
CMP followed by SEL).

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-11-18 18:35:14 +00:00
Jason Ekstrand ebad00d9e7 anv: Delete dead shader constant pushing code
As of 2d78e55a8c, nir_intrinsic_load_constant with a constant offset
is constant-folded so we should never end up with any that trigger
brw_nir_analyze_ubo_ranges.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-11-18 18:35:14 +00:00
Jason Ekstrand 0709c0f6b4 anv: Flatten descriptor bindings in anv_nir_apply_pipeline_layout
This lets us stop tracking the pipeline layout.  It also means less
indirection on a very hot path.  As an extra bonus, we can make some of
our data structures smaller.  No measurable CPU overhead improvement.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-11-18 18:35:14 +00:00
Jason Ekstrand fa120cb31c anv: Input attachments are always single-plane
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-11-18 18:35:14 +00:00
Jason Ekstrand 0a02f2a278 genxml: Mark everything in genX_pack.h always_inline
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-11-18 18:35:14 +00:00
Jason Ekstrand abfd4651ed anv/pipeline: Assume layout != NULL
In the early days of the driver we allowed layout to be VK_NULL_HANDLE
and used that for some internal pipelines when we wanted to be lazy.
Vulkan doesn't actually allow NULL layouts, however, so there's no
reason to have this check.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-11-18 18:35:14 +00:00
Italo Nicola 59623f211b intel/compiler: remove old comment
This comment was correct some time ago, but since commit
d3c10ad427, it isn't true anymore.

Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
2019-11-18 10:20:34 -08:00
Alyssa Rosenzweig 3663340049 pan/midgard: Use shader stage in mir_op_computes_derivative
A 'normal' texture op may be emitted in a vertex shader on T720 but it
still doesn't take any derivatives.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-11-18 08:48:54 -05:00
Danylo Piliaiev 6f17fe0606 i965: Unify CC_STATE and BLEND_STATE atoms on Haswell as a workaround
Re-emitting 3DSTATE_CC_STATE_POINTERS after emitting
3DSTATE_BLEND_STATE_POINTERS fixes the shadow flickering in
SuperTuxCart and Tropico 6 which was seen only on Haswell.
The reason for this is unknown and fix was found empirically.

The closest mention in PRM is that it should improve performance.
From the HSW PRM, volume 2b, page 823 (3DSTATE_BLEND_STATE_POINTERS):
 "When the BLEND_STATE pointer changes but not the CC_STATE pointer,
  driver needs to force a CC_STATE pointer change to improve
  blend performance in pixel backend."

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1834
Fixes: eca4a654 ("i965: Disable dual source blending when shader doesn't support it on gen8+")
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-11-18 11:00:23 +02:00
Samuel Pitoiset 1ebd9459e7 radv: implement VK_AMD_device_coherent_memory
This extension adds the device coherent and device uncached memory
types. It's known to be slower than non-device coherent memory but
it might be useful for debugging.

This is only exposed for chips that support L2 uncached.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-18 08:20:19 +00:00
Samuel Pitoiset 2af7511ed2 ac: add radeon_info::has_l2_uncached
For chips that have uncached device memory (ie. MTYPE_UC).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-18 08:20:19 +00:00
Pierre-Eric Pelloux-Prayer 3c9ea6bdfd radeonsi: enable mesa_glthread for GfxBench
It improves offscreen tests performance.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-11-18 09:16:18 +01:00
Alyssa Rosenzweig bc9a7d0699 pan/midgard: Represent ld/st offset unpacked
This simplifies manipulation of the offsets dramatically, fixing some
UBO access related bugs.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-11-17 22:19:31 -05:00
Alyssa Rosenzweig 1798f6bfc3 pan/midgard: Fix masks/alignment for 64-bit loads
These need to be handled with special care.

Oh, Midgard, you're *extra* special.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-11-17 22:19:31 -05:00
Alyssa Rosenzweig 34a860b9e3 pan/midgard: Expose more typesize helpers
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-11-17 21:30:14 -05:00
Alyssa Rosenzweig 2236904f72 pan/midgard: Implement non-aligned UBOs
The field is more fine-grained than we had assumed.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-11-17 21:18:45 -05:00
Christian Gmeiner ee3ad0fad2 etnaviv: rs: upsampling is not supported
This change makes it possible to support different downsample cases
like 4 -> 2 or 4 -> 1.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
2019-11-17 18:42:31 +00:00
Jonathan Marek 75e58d1fae freedreno/registers: fix a6xx_2d_blit_cntl ROTATE
A change from b7093882 got overwritten by 610c8c93

Fixes: 610c8c93 ("freedreno/registers: Update with GS, HS and DS registers")

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-11-17 17:40:53 +00:00
Jonathan Marek 0f5743429c freedreno/ir3: disable texture prefetch for 1d array textures
Prefetch only supports the basic 2D texture case, checking is_array is
needed because 1d array textures pass the coord num_components==2 test.

Fixes: 2a0d45ae ("freedreno/ir3: Add a NIR pass to select tex instructions eligible for pre-fetch")

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-11-17 17:01:18 +00:00
Andreas Baierl ef9635d0bc lima: Parse VS and PLBU command stream while making a dump
This makes the streams more readable and comparable with the blob's parser
as it parses the VS and PLBU stream and shows the currently known values.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
2019-11-17 05:39:17 +00:00
Andreas Baierl c76eb7ea84 lima: Beautify stream dumps
Change the dump, that the output looks more like the output of
mali-syscall-tracker [1].
This is a preparation for a more detailed stream analysis.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>

[1]: https://gitlab.freedesktop.org/lima/mali-syscall-tracker
2019-11-17 05:39:17 +00:00
Aaron Watry 3b3494174d clover/llvm: fix build after llvm 10 commit 1dfede3122ee
CodeGenFileType moved from ::llvm::TargetMachine in
llvm/Target/TargetMachine.h to ::llvm:: in llvm/Support/CodeGen.h

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2019-11-15 22:54:31 -06:00
Mauro Rossi 09ab297e9f android: util/format: fix include path list
To avoid following building error:

out/target/product/x86_64/obj_x86/STATIC_LIBRARIES/libmesa_util_intermediates/format/u_format_table.c:30:10:
fatal error: 'u_format.h' file not found
         ^~~~~~~~~~~~
1 error generated.

Fixes: 882ca6d ("util: Move gallium's PIPE_FORMAT utils to /util/format/")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
2019-11-16 00:06:31 +01:00
Mauro Rossi 3cd522c70a android: radeonsi: fix build error due to wrong u_format.csv file path
GEN10_FORMAT_TABLE_INPUTS requires correction of u_format.csv file path
in order to avoid following build error:

ninja: error: 'external/mesa/util/format/u_format.csv',
needed by 'out/target/product/x86_64/gen/STATIC_LIBRARIES/libmesa_pipe_radeonsi_intermediates/radeonsi/gfx10_format_table.h',
missing and no known rule to make it

Fixes: 882ca6d ("util: Move gallium's PIPE_FORMAT utils to /util/format/")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
2019-11-15 23:20:03 +01:00
Eric Anholt b30589cbd3 mesa/st: Reuse st_choose_matching_format from st_choose_format().
We had this ad-hoc exact size matching for unsized internalformats,
but st_choose_matching_format() can do exactly what we want.  This
means, that, for example, we'll now prefer the matching ordering for
565/565_REV if the driver supports both orders.  We also pass
Unpack.SwapBytes through from ChooseTextureFormat so that we can hit
the memcpy path for 8888 formats when that flag is set.

Some interesting format choice changes from this (on softpipe):
intf/form/type        before            after
----------------------------------------------------
RGBA/RGBA/USHORT:     R8G8B8A8_UNORM -> RGBA_UNORM16
RGB/RGBA/8888:        X8B8G8R8_UNORM -> R8G8B8X8_UNORM
RGB/ABGR/8888_REV:    X8B8G8R8_UNORM -> R8G8B8X8_UNORM
RGBA/RGBA/5551:       B5G5R5A1_UNORM -> A1B5G5R5_UNORM
RGBA/RGBA/4444:       R8G8B8A8_UNORM -> A4B4G4R4_UNORM
RGBA/GL_RGBA/1010102: R8G8B8A8_UNORM -> A2B10G10R10_UNORM
DEPTH/DEPTH/UINT:     Z24X8          -> Z_UNORM32
DEPTH/DEPTH/USHORT:   Z24X8          -> Z_UNORM16

v2: Make sure that the baseformat still matches.  v1 would pick
    MESA_FORMAT_L16_UNORM for RED/LUMINANCE/SHORT, when we clearly
    want a red format.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-11-15 20:32:17 +00:00
Eric Anholt bc2b14a4a3 mesa: Don't put sRGB formats in the array format table.
sRGB vs unorm was the only conflict case being guarded against in this
function.  Before the PIPE_FORMAT conversion, we always listed the
unorm before the sRGB in the enums, but PIPE_FORMAT_A8B8G8R8_SRGB
happens to be before _UNORM.  We always want the unorm result here.

Fixes: 807a800d8c ("mesa: Redefine MESA_FORMAT_* in terms of PIPE_FORMAT_*.")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-11-15 20:32:17 +00:00
Eric Anholt e5b06008f1 mesa/st: Simplify st_choose_matching_format().
We now have a nice helper function for finding those memcpy formats,
without needing to go through each entry of the mesa format table to
see if it happens to match.

While looking at sysprof of a softpipe GLES2 CTS run, we were spending
~8% of the CPU on ChooseTextureFormat.  With this, roughly the same
region of the testsuite was .4%.

v2: Add Ken's fix for canonicalizing array formats.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-11-15 20:32:17 +00:00
Kenneth Graunke 69f109cc37 mesa: Handle GL_COLOR_INDEX in _mesa_format_from_format_and_type().
Just return MESA_FORMAT_NONE to avoid triggering unreachable; there's
really no sensible thing to return for this case anyway.

This prevents regressions in the next commit, which makes st/mesa
start using this function to find a reasonable format from GL format
and type enums.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-11-15 20:32:17 +00:00
Alyssa Rosenzweig ea232c7cfd pan/midgard: Use generic constant packing for 8/64-bit
Eventually, we will want to combine constants across types, but for now
let's not break the world.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-11-15 20:08:46 +00:00
Alyssa Rosenzweig 4c182a6d11 pan/midgard: Pack 64-bit swizzles
64-bit ops have their own funky swizzles. Let's pack them, both for
native 64-bit sources as well as extended 32-bit sources.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-11-15 20:08:46 +00:00
Alyssa Rosenzweig ba2fb98d36 pan/midgard: Fix mir_round_bytemask_down for !32b
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-11-15 20:08:46 +00:00
Alyssa Rosenzweig 2655a300a3 pan/midgard: Implement i2i64 and u2u64
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-11-15 20:08:46 +00:00
Alyssa Rosenzweig 855eec93b1 pan/midgard: Expand 64-bit writemasks
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-11-15 20:08:46 +00:00
Marek Olšák bda3ec5d55 radeonsi/nir: don't lower fma, instead, fuse fma
We want fma. This decreases compile times by 4% for Borderlands 2.

48505 shaders in 30515 tests
Totals:
SGPRS: 2206584 -> 2204784 (-0.08 %)
VGPRS: 1647892 -> 1648964 (0.07 %)
Spilled SGPRs: 6256 -> 6078 (-2.85 %)
Spilled VGPRs: 72 -> 72 (0.00 %)
Private memory VGPRs: 2176 -> 2176 (0.00 %)
Scratch size: 2240 -> 2240 (0.00 %) dwords per thread
Code Size: 49680804 -> 49837988 (0.32 %) bytes
LDS: 74 -> 74 (0.00 %) blocks
Max Waves: 371387 -> 371352 (-0.01 %)

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-11-15 14:34:49 -05:00
Marek Olšák dec34e880d radeonsi/nir: call nir_lower_flrp only once per shader
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-11-15 14:34:49 -05:00
Marek Olšák 0714b3d57e radeonsi/nir: remove dead function temps
glxgears has dead temps after lowering color inputs to load intrinsics.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-11-15 14:34:49 -05:00
Marek Olšák bc5097a7d9 gallium/noop: call finalize_nir
For measuring st/mesa compile time.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-11-15 14:34:49 -05:00
Tomeu Vizoso 27801b90fa panfrost: Make sure the shader descriptor is in sync with the GL state
State was leaking from previous frames as we weren't updating the
descriptor in all cases.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Tested-by: Andre Heider <a.heider@gmail.com>
2019-11-15 18:37:34 +00:00
Alyssa Rosenzweig 095654e3c2 pan/midgard: Prioritize texture registers
On newer GPUs, this is a no-op. On older GPUs, this prevents needless
spilling since texture registers are shared with a subset of work
registers.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Tested-by: Andre Heider <a.heider@gmail.com>
2019-11-15 18:37:34 +00:00