Commit Graph

111757 Commits

Author SHA1 Message Date
Nicolai Hähnle dc99a8cd9b radeonsi: cleanup some #includes
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Nicolai Hähnle 1ff2440eee amd/common: use ARRAY_SIZE for the LLVM command line options
This is more convenient for changing it around during debug.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Nicolai Hähnle ca21ba2a08 radeonsi: inline si_shader_binary_read_config into its only caller
Since it can only be used for reading the config of an individual,
non-combined shader, it is not very reusable anyway.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Nicolai Hähnle bf8a1ca902 radeonsi: use the new run-time linker for shaders
v2:
- fix a memory leak

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Nicolai Hähnle 16bee0e5f6 radeonsi: don't declare pointers to static strings
The compiler should be able to optimize them away, but still. There's
no point in declaring those as pointers, and if the compiler *doesn't*
optimize them away, they add unnecessary load-time relocations.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Nicolai Hähnle 3c958d924a amd/common: add ac_compile_module_to_elf
A new variant of ac_compile_module_to_binary that allows us to
keep the entire ELF around.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Nicolai Hähnle 66da60f4da radeonsi: dump shader binary buffer contents
Help identify bugs related to corruption of shaders in memory,
or errors in shader upload / rtld.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Nicolai Hähnle bf11c594dd radeonsi: return bool from si_shader_binary_upload
We didn't really use error codes anyway.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Nicolai Hähnle 8b1343ca79 radeonsi: let si_shader_create return a boolean
We didn't really use error codes anyway.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Nicolai Hähnle 77b05cc42d radeonsi: use ac_shader_config
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Nicolai Hähnle b3be346c68 amd/common: add a more powerful runtime linker
Using an explicit linker instead of just concatenating .text
sections will allow us to start using .rodata sections and
explicit descriptions of data on LDS that is shared between
stages.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Caio Marcelo de Oliveira Filho 608257cf82 i965: Fix INTEL_DEBUG=bat
Use hash_table_u64 instead of hash_table directly, since the former
will also handle the special keys (deleted and freed) and allow use
the whole u64 space.

Fixes crash in INTEL_DEBUG=bat when using a key with value 0 -- the
current value for a freed key.

Fixes: b38dab101c "util/hash_table: Assert that keys are not reserved pointers"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-12 15:57:16 -07:00
Caio Marcelo de Oliveira Filho eb41ce1b01 util/hash_table: Properly handle the NULL key in hash_table_u64
The hash_table_u64 should support any uint64_t as input.  It does
special handling for the "deleted" key, storing the data in the table
itself; do the same for the "freed" key.

Fixes: b38dab101c "util/hash_table: Assert that keys are not reserved pointers"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-12 15:57:16 -07:00
Nicolai Hähnle c129cb3861 amd/common: clarify ac_shader_binary::lds_size
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 18:33:21 -04:00
Nicolai Hähnle 2e96c01073 amd/common: extract ac_parse_shader_binary_config
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 18:33:08 -04:00
Nicolai Hähnle de8a919702 u_dynarray: turn util_dynarray_{grow, resize} into element-oriented macros
The main motivation for this change is API ergonomics: most operations
on dynarrays are really on elements, not on bytes, so it's weird to have
grow and resize as the odd operations out.

The secondary motivation is memory safety. Users of the old byte-oriented
functions would often multiply a number of elements with the element size,
which could overflow, and checking for overflow is tedious.

With this change, we only need to implement the overflow checks once.
The checks are cheap: since eltsize is a compile-time constant and the
functions should be inlined, they only add a single comparison and an
unlikely branch.

v2:
- ensure operations are no-op when allocation fails
- in util_dynarray_clone, call resize_bytes with a compile-time constant element size

v3:
- fix iris, lima, panfrost

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 18:30:25 -04:00
Nicolai Hähnle 71b45bae14 u_dynarray: return 0 on realloc failure and ensure no-op
We're not very good at handling out-of-memory conditions in general, but
this change at least gives the caller the option of handling it gracefully
and without memory leaks.

This happens to fix an error in out-of-memory handling in i965, which has
the following code in brw_bufmgr.c:

      node = util_dynarray_grow(vma_list, sizeof(struct vma_bucket_node));
      if (unlikely(!node))
         return 0ull;

Previously, allocation failure for util_dynarray_grow wouldn't actually
return NULL when the dynarray was previously non-empty.

v2:
- make util_dynarray_ensure_cap a no-op on failure, add MUST_CHECK attribute
- simplify the new capacity calculation: aside from avoiding a useless loop
  when newcap is very large, this also avoids an infinite loop when newcap
  is larger than 1 << 31

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 18:30:25 -04:00
Nicolai Hähnle dc75362511 freedreno: use util_dynarray_clear instead of util_dynarray_resize(_, 0)
This is more expressive and simplifies a subsequent change.

v2:
- fix one more call-site after rebase

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 18:30:25 -04:00
Alyssa Rosenzweig 1ee2366693 panfrost/midgard: Differentiate vertex/fragment texture tags
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-12 14:32:12 -07:00
Alyssa Rosenzweig 5062b612be panfrost/midgard: Assert on unknown texture source
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-12 14:32:07 -07:00
Alyssa Rosenzweig 4ea512844c panfrost/midgard: Set minimal swizzle on texture input
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-12 14:32:01 -07:00
Alyssa Rosenzweig 6ae4f9c523 panfrost/midgard: Lower texture projectors
We do have native support for perspective division on the load/store
unit, but this is for the future, something ideally we would select
generally, not just for textures. Meanwhile, flipping on projector
lowering works now.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-12 14:31:53 -07:00
Alyssa Rosenzweig 4012e06788 panfrost/midgard: Implement txl
This follows the txb implementation, but requires an adjustment to how
the cont/last flags are set.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-12 14:31:45 -07:00
Alyssa Rosenzweig a19ca344ab panfrost/midgard: Implement txb op
We refactor the main tex handling to fit a bias argument in as well.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-12 14:31:37 -07:00
Alyssa Rosenzweig 1acffb5671 panfrost: Unify bind_vs/fs_state
This replaces bind_vs/fs_state calls to a unified bind_shader_state
call, removing a great deal of duplicated logic related to variant
selection.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-12 14:29:05 -07:00
Alyssa Rosenzweig f8a4090f80 panfrost: Add panfrost_job_type_for_pipe helper
This logic is repeated in a bunch of places and will only grow worse as
we support more job types; collect it.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-12 14:27:47 -07:00
Alyssa Rosenzweig 15fae1e38c panfrost/midgard: Extract emit_varying_read
Paralleling emit_uniform_read, this allows varying reads to be emitted
independent of an honest-to-goodness load vary instruction in the NIR.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-12 14:25:17 -07:00
Alyssa Rosenzweig 8c88bd0253 panfrost: Remove "vertex/tiler render target" silliness
I don't think these are actual structures, just figments over
cargoculting dumped memory without making any sense of it. Nothing seems
to break if the region is zeroed out, anyway.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-12 14:21:56 -07:00
Alyssa Rosenzweig b96df80069 panfrost/decode: Print line number of bad memory access
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-12 14:14:53 -07:00
Alyssa Rosenzweig fc7bcee865 panfrost: Replace pantrace with direct decoding
History lesson! In the early days of a Panfrost, we had a library
independent of the driver called `panwrap` which would be LD_PRELOAD'ed
into a driver to decode its cmdstream in real-time. When upstreaming
Panfrost, we realized that we would much rather have this decode
functionality maintained in-tree to avoid divergence, but that we could
not upstream panwrap because of its use with the legacy API. So we
instead dumped GPU memory to the filesystem with an out-of-tree panwrap,
and decoded that with the in-tree pandecode module. When we migrated to
the new kernel, we just added support for doing this memory dump
directly from the driver (via a module "pantrace").

This works, but dumping memory every frame is sloooooooooooooow and
error-prone. I figured if we have pandecode in-tree, we might as well
link to it directly in the driver, allowing us to decode Panfrost's
command streams without dumping memory to the filesystem first. This
cleans up the code *substantially* and improves dumping performance by a
HUGE margin. I'm talking "several seconds per frame" to "dumping in
real-time" kind of jump.

Note to users: this removes the environmental option "PANTRACE_BASE".
Instead, for equivalent functionality set "PAN_MESA_DEBUG=trace" and
redirect stdout to the file of your choosing.

This should be debugging Panfrost much more pleasant.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-12 14:07:09 -07:00
Kevin Strasser 845ec8576a st/mesa: Add rgbx handling for fp formats
Add missing cases for fp32 and fp16 formats.

Fixes: c68334ffc0 "st/mesa: add floating point formats in st_new_renderbuffer_fb()"
Signed-off-by: Kevin Strasser <kevin.strasser@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-12 19:03:47 +00:00
Kevin Strasser ec0a68e50d gallium/winsys/kms: Fix dumb buffer bpp
The bpp in the dumb buffer creation request is hardcoded to 32, which is an
incorrect assumption as the caller is free to pick any pipe format. Use the
bpp supplied to us through util_format_get_blocksizebits().

Fixes: 3b176c441b "gallium: Add a dumb drm/kms winsys backed swrast provider"
Signed-off-by: Kevin Strasser <kevin.strasser@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-12 11:44:10 -07:00
Eric Engestrom 9996ddbb27 util/futex: fix dangling pointer use
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110901
Fixes: 7dc2f47882 "util: emulate futex on FreeBSD using umtx"
Cc: Greg V <greg@unrelenting.technology>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-12 17:27:44 +01:00
Samuel Pitoiset d378151246 radv: fix VK_EXT_memory_budget if one heap isn't available
When the visible VRAM size is equal to the VRAM size only two
heaps are exposed.

This fixes dEQP-VK.api.info.device.memory_budget.

Cc: 19.0 19.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-12 15:52:48 +02:00
Samuel Pitoiset 2ef9d2738c radv: fix occlusion queries on VegaM
The number of render backends is 16 but the enabled mask is 0xaaaa.

As noticed by Bas, allowing disabled render backends might break
the OCCLUSION_QUERY packet. We don't use it yet but keep this in
mind.

This fixes dEQP-VK.query_pool.* and dEQP-VK.multiview.*.

Cc: 19.0 19.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-12 15:51:12 +02:00
Lionel Landwerlin 93b93e5a9d anv: do not parse genxml data without INTEL_DEBUG=bat
This significantly slows down the CTS runs.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 32ffd90002 ("anv: add support for INTEL_DEBUG=bat")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2019-06-12 12:53:35 +03:00
Lionel Landwerlin f80679c8e8 intel/dump: fix segfault when the app hasn't accessed the device
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-12 09:49:55 +03:00
Caio Marcelo de Oliveira Filho 48d7e7a9b8 iris: Only upload surface state for grid info when needed
Special care is needed to ensure that when we have two consecutive
calls with the same grid size, we only bail in the second one if it
either don't need the surface state or the surface state was already
uploaded.

v2: Instead of having a new bool in ice->state to know whether we had
    a surface, check whether we have state->ref.  (Ken)
    Clean up the logic a little bit by adding 'grid_updated' local. (Ken)

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> [v1]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-11 17:57:37 -07:00
Caio Marcelo de Oliveira Filho f346b277d1 iris: Create binding table slot for num_work_groups only when needed
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-11 17:57:37 -07:00
Rui Salvaterra 7b43362f29 r300g: implement GLSL disk shader caching
This implements GLSL disk shader caching for the R300-R500 series of AMD GPUs.

Signed-off-by: Rui Salvaterra <rsalvaterra@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-06-11 20:49:34 -04:00
Richard Thier ffd2f948fe r300g: restore performance after RADEON_FLAG_NO_INTERPROCESS_SHARING was added
v1: Fix skipped slab allocators and the buffer cache.

v2: Use only 1 domain for texture allocation

v3: Added flag for the create_fence call too

Based on Marek v1 and v2 proposed fixes.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=1107812.patch

Cc: 19.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-06-11 20:45:27 -04:00
Marek Olšák ec0956a194 radeonsi: don't test SDMA perf if SDMA is disabled/unsupported 2019-06-11 20:05:21 -04:00
Marek Olšák 993bf52977 radeonsi: always interpolate PrimID as flat 2019-06-11 20:05:21 -04:00
Marek Olšák 7f7ffa0883 radeonsi: move color clamping to si_llvm_export_vs to unify the code 2019-06-11 20:05:21 -04:00
Marek Olšák 4773f5a293 radeonsi: use the ac helper for index buffer stores in the culling shader 2019-06-11 20:05:21 -04:00
Marek Olšák 579003e7bd radeonsi: use the ac helper for image stores 2019-06-11 20:05:21 -04:00
Marek Olšák deef3833f8 radeonsi: use the ac helper for SSBO stores 2019-06-11 20:05:21 -04:00
Marek Olšák e5fe38484a radeonsi: fixes for vec3 buffer stores in LLVM 9 2019-06-11 20:05:21 -04:00
Caio Marcelo de Oliveira Filho 9c81db8adb iris: Enable PIPE_CAP_CS_DERIVED_SYSTEM_VALUES_SUPPORTED
This avoids lowering of CS system values by GLSL (configured by state
tracker).  In i965 we don't use that lowering, and we also shouldn't
need that in Iris.

Using it cause some unnecessary round trip between values, e.g.:
shader uses gl_LocalInvocationIndex, GLSL rewrites it in terms of
gl_LocalInvocationID, then driver rewrites those in terms of
gl_LocalInvocationIndex again.  Copy propagation can make some of
those go away, but not all as seen below.

Intel SKL shader-db results:

    total instructions in shared programs: 15595189 -> 15594556 (<.01%)
    instructions in affected programs: 74880 -> 74247 (-0.85%)
    helped: 81
    HURT: 4
    helped stats (abs) min: 2 max: 172 x̄: 7.88 x̃: 4
    helped stats (rel) min: 0.19% max: 5.66% x̄: 1.71% x̃: 1.23%
    HURT stats (abs)   min: 1 max: 2 x̄: 1.25 x̃: 1
    HURT stats (rel)   min: 0.45% max: 1.65% x̄: 0.76% x̃: 0.46%
    95% mean confidence interval for instructions value: -11.56 -3.34
    95% mean confidence interval for instructions %-change: -1.91% -1.28%
    Instructions are helped.

    total loops in shared programs: 4831 -> 4831 (0.00%)
    loops in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total cycles in shared programs: 372136618 -> 372145628 (<.01%)
    cycles in affected programs: 9218230 -> 9227240 (0.10%)
    helped: 131
    HURT: 86
    helped stats (abs) min: 1 max: 798 x̄: 39.79 x̃: 12
    helped stats (rel) min: <.01% max: 6.75% x̄: 0.42% x̃: 0.13%
    HURT stats (abs)   min: 2 max: 2442 x̄: 165.38 x̃: 6
    HURT stats (rel)   min: <.01% max: 20.83% x̄: 0.74% x̃: 0.12%
    95% mean confidence interval for cycles value: -2.07 85.11
    95% mean confidence interval for cycles %-change: -0.22% 0.30%
    Inconclusive result (value mean confidence interval includes 0).

    total spills in shared programs: 11956 -> 11950 (-0.05%)
    spills in affected programs: 77 -> 71 (-7.79%)
    helped: 3
    HURT: 0

    total fills in shared programs: 25619 -> 25549 (-0.27%)
    fills in affected programs: 593 -> 523 (-11.80%)
    helped: 4
    HURT: 0

    LOST:   0
    GAINED: 0

    Total CPU time (seconds): 1695.69 -> 1706.03 (0.61%)

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-11 15:12:17 -07:00
Caio Marcelo de Oliveira Filho 46de3beab1 gallium: Add PIPE_CAP_CS_DERIVED_SYSTEM_VALUES_SUPPORTED
Tells whether or not the driver can handle gl_LocalInvocationIndex and
gl_GlobalInvocationID.  If not supported (the default), state tracker
will lower those on behalf of the driver.

v2: Add case to u_screen.c.  (Anholt)

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-11 15:12:17 -07:00