Commit Graph

152359 Commits

Author SHA1 Message Date
Mike Blumenkrantz dea65ae590 zink: finish up radv piglit baseline updates
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15900>
2022-04-12 14:00:47 -04:00
Konstantin Seurer 521492e8b1 radv: Refactor ray tracing support checks
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15860>
2022-04-12 16:13:38 +00:00
Konstantin Seurer a9fce44dd6 radv: Refactor radv_tex_aniso_filter
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15860>
2022-04-12 16:13:38 +00:00
Mike Blumenkrantz 6b65d4234c radv: set read/write without format flags for supported texel buffers
if the storage case is supported, this should be supported too

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15826>
2022-04-12 15:52:03 +00:00
Samuel Pitoiset 2b688942c1 Revert "radv: Disable NGG for GS with suboptimal output vertex count."
It breaks too many things and shouldn't have been merged. The fix isn't
trivial and it will probably not be backported because it's intrusive.

It will be re-applied later when everything will work.

This reverts commit 94706601fa.

Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15882>
2022-04-12 12:26:32 +00:00
Gert Wollny e466d73368 r600: make r600_load_ar available to driver code
This is needed for the new NIR assembler

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15714>
2022-04-12 12:10:19 +00:00
Gert Wollny 050e05db22 r600: Set the last bit if an alu group is split by kcache allocation
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15714>
2022-04-12 12:10:19 +00:00
Gert Wollny d920200ad6 r600: Force last instruction of group when starting a new CF
When emitting the AR forces splitting an ALU group, and at the same time
a new CF instruction is started, then the last instrcution in the finished
CF block might not have the "last" bit set, which results in an invalid
shader that might hang, or crash SB.
So when a new CF is started, force the last bit in the last ALU instruction.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15714>
2022-04-12 12:10:19 +00:00
Gert Wollny 04fd9a6488 r600: don't reschedule INTERP_LOAD_P0
With the NIR code, we have instructions groups that use
INTERP_LOAD_P0 that don't fill all slots. Just make sure
the backend scheduler doesn't fill in INTERP_LOAD_P0
instructions with a different LDS location.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15714>
2022-04-12 12:10:19 +00:00
Gert Wollny 3c4644afb0 r600: ignore dest sel for non-write targets when counting registers
Since the value is not written, there is no need to allocate
a register for it, so don't take it into account.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15714>
2022-04-12 12:10:19 +00:00
Gert Wollny 67d145d9ab r600: Don't limit scheduling of PARAM_SRC values
ALU_SRC_PARAM_BASE is an inline constant that defines the
address for pulling data from LDS memory for interpolation
and not a value from the kcache, so there is no need to
take these values into account when allocating kcache
load slots.

v2: Fix the constant range check to not exclude the translated
    ranges for kcache banks 2 and 3.
v3: limit range check to only include kcache values and and
    rename relevant function (Emma).

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15714>
2022-04-12 12:10:19 +00:00
Rhys Perry f6262804af radv: increase inline push constant limit if we can inline all constants
fossil-db (Sienna Cichlid):
Totals from 665 (0.49% of 134627) affected shaders:
CodeSize: 4519620 -> 4491724 (-0.62%); split: -0.62%, +0.01%
Instrs: 842745 -> 837313 (-0.64%); split: -0.66%, +0.01%
Latency: 7289925 -> 7279661 (-0.14%); split: -0.30%, +0.16%
InvThroughput: 1240770 -> 1240639 (-0.01%); split: -0.01%, +0.00%
VClause: 15799 -> 15772 (-0.17%)
SClause: 33773 -> 32604 (-3.46%); split: -3.66%, +0.20%
Copies: 67695 -> 64992 (-3.99%); split: -4.49%, +0.50%
PreSGPRs: 38597 -> 38640 (+0.11%); split: -0.14%, +0.25%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12145>
2022-04-12 11:44:30 +00:00
Rhys Perry 773c7cbcbc radv,aco: implement 64-bit inline push constants
fossil-db (Sienna Cichlid):
Totals from 21 (0.02% of 134621) affected shaders:
CodeSize: 1932 -> 1560 (-19.25%)
Instrs: 357 -> 303 (-15.13%)
Latency: 6576 -> 5883 (-10.54%)
InvThroughput: 26304 -> 23532 (-10.54%)
SClause: 42 -> 24 (-42.86%)
Copies: 90 -> 105 (+16.67%); split: -10.00%, +26.67%
PreSGPRs: 144 -> 201 (+39.58%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12145>
2022-04-12 11:44:30 +00:00
Rhys Perry 7f6262bb85 radv: allow holes in inline push constants
Use a dword mask instead of a range to track which push constants to
inline.

fossil-db (Sienna Cichlid):
Totals from 5724 (4.25% of 134621) affected shaders:
CodeSize: 20894044 -> 20815748 (-0.37%); split: -0.39%, +0.02%
Instrs: 4002568 -> 3988385 (-0.35%); split: -0.38%, +0.02%
Latency: 29285060 -> 29224414 (-0.21%); split: -0.22%, +0.01%
InvThroughput: 5529700 -> 5526893 (-0.05%); split: -0.05%, +0.00%
VClause: 78093 -> 78240 (+0.19%); split: -0.23%, +0.41%
SClause: 135495 -> 131027 (-3.30%); split: -3.30%, +0.00%
Copies: 330856 -> 324552 (-1.91%); split: -2.37%, +0.46%
PreSGPRs: 226031 -> 224778 (-0.55%); split: -0.61%, +0.05%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12145>
2022-04-12 11:44:30 +00:00
Rhys Perry 72cf6cca91 radv: allow inline push constants in more situations
We don't need to disable this path if there are indirect or 8/16/64-bit
push constant loads. We can just use the default path for them.

fossil-db (Sienna Cichlid):
Totals from 21 (0.02% of 134621) affected shaders:
CodeSize: 2028 -> 1884 (-7.10%)
Instrs: 366 -> 363 (-0.82%); split: -2.46%, +1.64%
Latency: 6630 -> 6579 (-0.77%)
InvThroughput: 26520 -> 26316 (-0.77%)
Copies: 84 -> 102 (+21.43%)
PreSGPRs: 141 -> 222 (+57.45%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12145>
2022-04-12 11:44:30 +00:00
Mykhailo Skorokhodov 9c7e750ffe intel/fs: Enable b2f(inot(a)) and b2i(inot(a)) optimization for Gfx12+
The commit enables the optimization for Intel Gfx12+ graphics.

Tigerlake
```
total instructions in shared programs: 1289326 -> 1289015 (-0.02%)
instructions in affected programs: 37841 -> 37530 (-0.82%)
helped: 78
HURT: 9
helped stats (abs) min: 1 max: 26 x̄: 4.69 x̃: 3
helped stats (rel) min: 0.10% max: 12.50% x̄: 2.07% x̃: 1.21%
HURT stats (abs)   min: 1 max: 18 x̄: 6.11 x̃: 4
HURT stats (rel)   min: 0.16% max: 1.95% x̄: 0.94% x̃: 0.61%
95% mean confidence interval for instructions value: -4.95 -2.20
95% mean confidence interval for instructions %-change: -2.34% -1.18%
Instructions are helped.

total cycles in shared programs: 105606388 -> 105606442 (<.01%)
cycles in affected programs: 620119 -> 620173 (<.01%)
helped: 49
HURT: 28
helped stats (abs) min: 2 max: 3618 x̄: 228.63 x̃: 12
helped stats (rel) min: 0.02% max: 23.31% x̄: 4.60% x̃: 1.11%
HURT stats (abs)   min: 1 max: 2142 x̄: 402.04 x̃: 29
HURT stats (rel)   min: 0.01% max: 36.42% x̄: 5.01% x̃: 0.46%
95% mean confidence interval for cycles value: -151.80 153.20
95% mean confidence interval for cycles %-change: -3.00% 0.79%
Inconclusive result (value mean confidence interval includes 0).
```

Related-to: 7725d60938
Signed-off-by: Mykhailo Skorokhodov <mykhailo.skorokhodov@globallogic.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14017>
2022-04-12 10:55:05 +00:00
Gert Wollny d1c7a7b131 virgl: Add an extra mov for int outputs from constant and immediate inputs
virglrenderer doesn't properly emit the conversion code when the source
is a integer value and the output is also integer.

Fixes on NTT:
  dEQP-GLES31.functional.shaders.sample_variables.sample_mask.inverse_per_*

v2: fix typo (Emma)

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15836>
2022-04-12 10:44:17 +00:00
Gert Wollny a083ae818a virgl: Always make some extra temps available for transformations
The host driver will optimize unused variables away, and checking thoroughly whether we
may need an extra temp is just uselessly costly.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15836>
2022-04-12 10:44:17 +00:00
Gert Wollny a4a34cd323 virgl: Propagate precice flag through moves
NIR doesn't propagate precise through moves, and with NTT the
last output is usually preceded by a move, so that we no longer
see that the evaluation of some value is supposed to be exact,
and, hence we can't decorate the outputs accordingly.

Fixes with NTT:
 dEQP-GLES31.functional.tessellation.common_edge.
     triangles_equal_spacing_precise
     triangles_fractional_odd_spacing_precise
     triangles_fractional_even_spacing_precise
     quads_equal_spacing_precise
     quads_fractional_odd_spacing_precise
     quads_fractional_even_spacing_precise

v2: Don't clear the precise flag when we hit a mov, because we may
    hit a if/else construct like below and we don't track branches

    IF X
       TEMP[0] = OP_PRECICE ...
    ELSE
       TEMP[0] = MOV CONST[]
    ENDIF

    Thanks Emma for pointing out the problem.

v2: allocate precise handling flags to transform_prolog (Emma)

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15836>
2022-04-12 10:44:17 +00:00
Juan A. Suarez Romero 0439f0e9fc ci: add Broadcom CI maintainer
Include in the CODEOWNERS file who to ping in case of issues with the
Broadcom (V3D/V3DV/VC4) CI.

v2:
 - Add Chema (Chema)

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
Acked-by: Alejandro Piñeiro <apinheiro@igalia.com>
Acked-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15858>
2022-04-12 10:42:31 +00:00
Juan A. Suarez Romero 18c4ad6e3b CODEOWNERS: add Broadcom maintainers
v2:
 - Add more maintainers (Iago)

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
Acked-by: Alejandro Piñeiro <apinheiro@igalia.com>
Acked-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15858>
2022-04-12 10:42:31 +00:00
Gert Wollny c63424b2eb r600: Only emit the NOP group triggered by dest.rel after a full group
In addition really fill all slots, because otherwise the alu-group merger
might move a read from the indirectly written register into the 't' slot.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15848>
2022-04-12 10:33:58 +00:00
Icecream95 fc6f141304 drm-shim: Implement a shim function for close
Remove the fd from the fd_map, so that if the fd is later reused for
another file then mmap won't be intercepted.

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12203>
2022-04-12 10:01:39 +00:00
Icecream95 c9eec12be7 drm-shim: Explicitly use off64_t for the offset to drm_shim_mmap
drm_shim.c undefines the _FILE_OFFSET_BITS macro, so plain off_t might
be 32 bits, while it's 64 bits in device.c. To avoid this mismatch,
use off64_t which will always be 64 bits in both source files.

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12203>
2022-04-12 10:01:39 +00:00
Icecream95 11ab86d581 drm-shim: Return fake render nodes in /dev/dri first
loader_open_render_node returns the first device in /dev/dri that it
can use. To make sure the drm-shim device always gets chosen, return
the fake entries in readdir before returning the real ones.

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12203>
2022-04-12 10:01:39 +00:00
Icecream95 dfd30035b9 drm-shim: Add a function for mmap64 rather than using an alias
Fixes build on 32-bit systems.

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12203>
2022-04-12 10:01:39 +00:00
Marcin Ślusarz 9b23aaf3cf nir: remove gl_PrimitiveID output from MS when it's not used in FS
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15340>
2022-04-12 09:35:26 +00:00
Marcin Ślusarz 65600a34c2 anv: initialize 3DMESH_1D.ExtendedParameter0 when ExtendedParameter0Present
When IndirectParameterEnable==true it's not actually used by the hardware,
but if it's not initialized and INTEL_DEBUG=bat is set, then Valgrind complains.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15850>
2022-04-12 09:10:31 +00:00
Marcin Ślusarz f844ce66c8 anv: fix push constant lowering for task/mesh
Fixes: a6031cd9bd ("anv: fix push constant lowering with bindless shaders")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15850>
2022-04-12 09:10:31 +00:00
Timothy Arceri 20ab7046c0 glsl/st: use nir pass to lower indirect rather than GLSL IR
Will allow us to drop more GLSL IR code in future once we switch
all drivers to NIR. Also stops the need for all drivers to call
this pass to remove indirect temps that may have been added during
the NIR varying linking lowering/optimisations.

This patch fixes some tests on i915, d3d12, lima and vc4.

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15871>
2022-04-12 06:51:20 +00:00
Samuel Pitoiset 619e6d44eb radv: add few helpers to deal with pipeline layout
With VK_EXT_graphics_pipeline_library, we will have to support
independent sets and also to merge sets from different libraries.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15849>
2022-04-12 06:31:33 +00:00
Samuel Pitoiset c338bd2957 radv: remove unused radv_pipeline_layout::size field
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15849>
2022-04-12 06:31:33 +00:00
Samuel Pitoiset dca28a6355 radv: drop the remaining uses of shader modules
With VK_EXT_graphics_pipeline_library, shader modules can be NULL and
be passed via the pNext of VkPipelineShaderStageCreateInfo. To prepare
for this, just store everything we need to radv_pipeline_stage.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15847>
2022-04-12 06:13:24 +00:00
Samuel Pitoiset b48231cb90 radv: store the shader sha1 to radv_pipeline_stage
To remove use of shader modules completely in the next commit.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15847>
2022-04-12 06:13:24 +00:00
Samuel Pitoiset c1b9c1269d radv: replace convert_rt_stage() by vk_to_mesa_shader_stage()
Mesa shader stages are correctly sorted.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15847>
2022-04-12 06:13:24 +00:00
Pavel Ondračka f1202a92cf nine: check hardware support before using vertex texture
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Axel Davy <davyaxel0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15864>
2022-04-12 07:38:47 +02:00
Mike Blumenkrantz d637eee212 zink: create pipeline layout if only bindless descriptor set is used
bindless descriptors are descriptors too.

cc: mesa-stable

Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15853>
2022-04-12 04:49:17 +00:00
Mike Blumenkrantz 23c758807e zink: handle 0 ubos and 0 ssbos in pipeline layout
this is the number of types needed, and it can be zero

cc: mesa-stable

Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15853>
2022-04-12 04:49:17 +00:00
Mike Blumenkrantz c7ae22e4b8 zink: prune unused st-injected pointsize exports
only the last vertex stage needs to keep these, so prune any that aren't
being weirdly passed through

Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15853>
2022-04-12 04:49:17 +00:00
Mike Blumenkrantz cf3f3791e3 zink: try copy region first for non-resolve blits
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15853>
2022-04-12 04:49:17 +00:00
Mike Blumenkrantz 327ca3e5ef zink: refactor copy_region path in zink_blit to util function
no functional changes

Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15853>
2022-04-12 04:49:17 +00:00
Dave Airlie 60c61d7b68 draw: handle tess eval shader when getting num outputs
This tripped up some pointsize/prim id interactions with zink.

Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15872>
2022-04-12 14:21:41 +10:00
Emma Anholt 835704e669 turnip: Move autotune buffers to suballoc.
Now the ANGLE trex_200 trace replay does a single BO allocation at startup
for autotune results instead of one per frame (~350 for the whole replay).

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15038>
2022-04-12 01:01:56 +00:00
Emma Anholt 7c636acd53 turnip: Get autotune off of ralloc destructors.
We've wanted to remove destructors from ralloc's API for a long time (it's
an extra storage cost per ralloc for a rarely-used feature), and for the
suballoc change we'd need to spend more storage on storing the tu_device
pointer per result since destructors don't get anything else but the
pointer passed into them.

Fixes use-after-frees:

=================================================================
==2383==ERROR: AddressSanitizer: heap-use-after-free on address 0xffff88fe1940 at pc 0xffff934f427c bp 0xfffff5481e90 sp 0xfffff5481ea8
WRITE of size 8 at 0xffff88fe1940 thread T0
    #0 0xffff934f4278 in list_del ../src/util/list.h:108
    #1 0xffff934f4278 in result_destructor ../src/freedreno/vulkan/tu_autotune.c:237
    #2 0xffff9377793c in unsafe_free ../src/util/ralloc.c:300
    #3 0xffff9377793c in ralloc_free ../src/util/ralloc.c:265
    #4 0xffff934f4368 in history_destructor ../src/freedreno/vulkan/tu_autotune.c:229
    #5 0xffff9377793c in unsafe_free ../src/util/ralloc.c:300
    #6 0xffff9377793c in ralloc_free ../src/util/ralloc.c:265
    #7 0xffff934f5990 in tu_autotune_on_submit ../src/freedreno/vulkan/tu_autotune.c:442
[...]

0xffff88fe1940 is located 80 bytes inside of 112-byte region [0xffff88fe18f0,0xffff88fe1960)
freed by thread T0 here:
    #0 0xffff9c1c90d8 in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:127
    #1 0xffff934f4368 in history_destructor ../src/freedreno/vulkan/tu_autotune.c:229
    #2 0xffff9377793c in unsafe_free ../src/util/ralloc.c:300
    #3 0xffff9377793c in ralloc_free ../src/util/ralloc.c:265
    #4 0xffff934f5990 in tu_autotune_on_submit ../src/freedreno/vulkan/tu_autotune.c:442
    #5 0xffff935cf2ac in tu_queue_submit_locked ../src/freedreno/vulkan/tu_drm.c:997
[...]

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15038>
2022-04-12 01:01:56 +00:00
Emma Anholt 435d4f08b2 turnip: Reduce the pipeline's CS allocation a bit.
We don't return unused space to the suballocator, so it's a little useful
to limit how much we overallocate to reduce memory footprint.  I took a
look through the tu_cs_emit_array() calls and accounted for a couple of
them in the variant-specific space calculation, then dropped the base
allocation by factors of 2 until we started throwing asserts.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15038>
2022-04-12 01:01:56 +00:00
Emma Anholt 58f6331eec turnip: Skip telling the kernel the BO list when we don't need any.
In fencing, we sometimes do a dummy submit with no nr_cmds.  If we don't
have commands to execute, we don't need to pin or fence any BOs either.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15038>
2022-04-12 01:01:56 +00:00
Emma Anholt dc3203b087 turnip: Sub-allocate pipelines out of a device-global BO pool.
Allocating a BO for each pipeline meant that for apps with many pipelines
(such as Asphalt9 under ANGLE), we would end up spending too much time in
the kernel tracking the BO references.

Looking at CS:Source on zink, before we had 85 BOs for the pipelines for a
total of 1036 kb, and now we have 7 BOs for a total of 896 kb.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15038>
2022-04-12 01:01:56 +00:00
Emma Anholt e0fbdd3eda turnip: Stop allocating unused pvtmem space in the pipeline CS.
The pvtmem was split off to a separate read/write BO.

Fixes: 931ad19a18 ("turnip: make cmdstream bo's read-only to GPU")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15038>
2022-04-12 01:01:56 +00:00
Emma Anholt 80c44a6626 turnip: Track refcounts on BOs in kgsl as well.
I'm going to be using the BO refcount for the pipeline and autotune buffer
suballocation.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15038>
2022-04-12 01:01:56 +00:00
Francisco Jerez e858da39e5 intel/perf: Fix OA report accumulation on Gfx12+.
The intel_perf_query path used for performance queries on GL was
passing a bogus "end" pointer to intel_perf_query_result_accumulate(),
causing it to accumulate garbage values.  This was causing the values
of many performance counters to be corrupted.

The "end" pointer was incorrect because the current code was assuming
that different OA reports were located TOTAL_QUERY_DATA_SIZE bytes
apart, which is a hard-coded preprocessor define.  However recent
(Gfx12+) hardware generations use a variable query size determined by
the query layout.  Use the size derived from it instead, and remove
the stale define.

Fixes: 3c51325025 ("intel/perf: switch query code to use query layout")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15783>
2022-04-12 00:11:47 +00:00