In the second (scalar pass) use the information about # of registers
used in the first pass as the target max, and round-robin within that
range. This generally gives the post-RA sched pass more opportunities
to re-order instructions to remove nop's.
Also, we can be a bit clever when assigning dest registers for SFU
instructions, by picking the register used for it's src (if available
and already assigned). This avoids some (ss) syncs caused by write
after read hazards. (Ie. the SFU instruction will read it's own src
before writing dest.)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4071>
Doesn't take into account stalls that result from a register written in
a different block, etc. But this should be more useful than just using
number of (ss)'s by trying to estimate how costly a given sync is.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4071>
They were inserting a nop between back to back SFU instrucions. But
that doesn't actually appear to be required. And they get stripped out
later anyways before legalize.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4071>
The old code looks the same as GL driver, but we get things like
pipe_count = {32, 1}, which seems bad.
This uses similar logic as for tiles which produces a balanced pipe_count
width/height.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3142>
Previously, we only added the secondary command buffer's draw and
draw epilogue command streams to the primary command buffer on
vkCmdExecuteCommands. However, we also need to merge the primary cs
for non-draw operations like vkCmdCopyBuffer and vkCmdBeginQuery.
Fixes dEQP-VK.memory.pipeline_barrier.host_write_transfer_src.*
and various other tests in dEQP-VK.api.command_buffers.*.
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3988>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3988>
Need to list_delinit() before we clone the instruction to split it into
individual samgpN instructions, otherwise we get list corruption.
Tested-by: Eduardo Lima Mitev <elima@igalia.com>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3989>
1) emperically, 10 seems like a more accurate # than 4
2) push "soft" delay handling into ir3_delayslots(), as
we should also be using it to calculate the costs
that the schedulers use
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3989>
In schedule live value tracking, differentiate between half vs full
precision. Half-precision live values are less costly than full
precision.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3989>
Current scheduler thresholds try to ensure there are warps available to
switch to when hiding texture fetch latency. But if there is none to
hide, we should allow scheduler to use more registers to reduce nops.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3989>
To avoid spurious sync flags, we want to, for a6xx+, operate in terms of
half-regs, with a full precision register testing the corresponding two
half-regs that it conflicts with.
And while we are at it, stop open-coding BITSET
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3989>
Newer a6xx no longer has programmable GMEM base, so we can't rely on the
kernel driver setting it to 0x100000 (GMEM base is 0 on such GPUs).
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3979>
Register packing macros makes this only set the first bit. Set to whole
dword to fix srgb for color attachments >0.
Fixes: 59f29fc8 ("turnip: Convert the rest of tu_cmd_buffer.c over to the new pack macros.")
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3979>
I had some trouble because I assumed this was right, tested that the
alignment requirement is actually 16.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3979>
These formats are an exception that can't be modeled in the current format
table. Switch to a table with only a single a6xx_format per vk format,
and deal with the exceptions separately (currently the only exception is
10_10_10_2_UNORM which has a different color format).
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3806>
There's still a TODO related to key->sample_shading, but it doesn't look
like it changes anything in ir3, so it works without that.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3923>
Use robclark's new crashdec/devcoredump thing instead.
Note: not sure this ever really worked because it didn't WFI.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3925>
This way we can also use ir3_print from computerator, which mostly
bypasses the ir3_block construct (since it doesn't need to do
scheduling, etc)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3926>
Updates for differences between fdre-a3xx's early version of ir3, and
what we have now in mesa. And updates for instruction name and syntax
changes.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3926>
Now that we have access to the interior switch statement not going through
the txs special case for coord_components, we can just use it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3728>
This lowers mediump FS outputs to fp16 in the ir3 backend. For now
this is a modest improvement, which mostly helps us whittle down the
full mediump work. Once the GLSL level support lands, then right hand
side of the store output intrinsics will be fp16 expressions and we'll
cancel out the fp16 -> fp32 -> fp 16 round trip here.
We've had different attempts at implementing this: rewriting stores in
the GLSL IR, lowering GLSL IR outputs to temporaries and inserting
conversions when writing the temporaries to the outputs. In the end,
GLSL ends up getting in the way a lot and doing it at the nir level is
easier and still possible since we have the output var precisions.
This part of the fp16 work is more of a step on the way towards full
fp16 support and will add a few extra conversion instructions:
total instructions in shared programs: 8151 -> 8163 (0.15%)
instructions in affected programs: 1187 -> 1199 (1.01%)
helped: 4
HURT: 10
total nops in shared programs: 3146 -> 3152 (0.19%)
nops in affected programs: 563 -> 569 (1.07%)
helped: 5
HURT: 10
total non-nops in shared programs: 5005 -> 5011 (0.12%)
non-nops in affected programs: 92 -> 98 (6.52%)
helped: 0
HURT: 3
total dwords in shared programs: 12832 -> 12800 (-0.25%)
dwords in affected programs: 96 -> 64 (-33.33%)
helped: 1
HURT: 0
total last-baryf in shared programs: 118 -> 115 (-2.54%)
last-baryf in affected programs: 21 -> 18 (-14.29%)
helped: 1
HURT: 0
total full in shared programs: 424 -> 417 (-1.65%)
full in affected programs: 15 -> 8 (-46.67%)
helped: 7
HURT: 0
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3822>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3822>
So far we only handle full regs of arrays during pre-allocation.
This patch is to handle half regs of arrays and also consider the size
of half regs when finding out conflicts.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3822>
This pass tries to fold f2f16 conversion into alu instructions.
This will be useful to help reduce the number of instructions once
mesa starts supporting precision lowering. For example:
add.f r0.w, r0.w, c0.x
cov.f32f16 hr2.x, r0.w
to
add.f hr2.x, r0.w, c0.x
Additionally this pass also tries to fold f2f16 conversion into load_input
instruction:
bary.f r0.x, 3, r0.w
cov.f32f16 hr0.x, r0.x
to
bary.f hr1.x, 3, r0.x
v2: Edit to not fold OPC_MAX_F and OPC_MIN_F, since that's not valid.
v3: Add OPC_ABSNEG_F to the blacklist as well.
v4: Don't remove dead cov instructions, DCE will do that later; don't
iterate through sources when a cov only has one; remove special
handling of IR3_REG_ARRAY and IR3_REG_RELATIV.
v5: Handle folding into u32.u32 movs of floats correctly, don't bail
out on IR3_REG_RELATIV or IR3_REG_ARRAY movs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3822>
Nothing used by mesa, but crashdec tool uses a few of these. And since
the practice is these days to sync mesa->envytools, adding these on the
mesa side first.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3833>
This is a builtin type (treated as uint, but with special type-aware
decoding) in envytools/cffdump. Lets teach gen_header.py about it and
drop the enum hack in the xml so I don't have to keep deleting the enum
when I sync the xml back to the freedreno envytools tree.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3833>
Similar to vkCmdClearAttachments(), we use CP_COND_REG_EXEC to
conditionally execute both the gmem and sysmem paths, except for after
the last subpass where it's known whether we're using sysmem rendering
or not.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3713>
This has only lightly been tested. It passes dEQP-VK.api.smoke.triangle,
so at least we're able to show a triangle. For now, it's just enabled
under a debug flag. In the future we'll probably want some heuristics
like what freedreno has and another debug flag to disable it except when
it's forced.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3713>
We may need shader workarounds for some formats, but for now this seems
to work at least as well as the gmem path for clearing multisample
attachments. And soon we'll start calling this even on the gmem path,
since we leave the final decision of whether to use sysmem or not up
till the end, so we can't have it assert or otherwise working tests
would assert.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3713>
I merely ported a freedreno patch to turnip which
updates some magic regsiter values.
commit ff6e148a3d
Author: Rob Clark <robdclark@chromium.org>
CommitDate: Tue Oct 29 09:19:34 2019 -0700
Subject: freedreno/a6xx: add a618 support
That's all that Rob did for gallium for a618, so I assume that's we need
for turnip also.
Tested manually with:
dEQP-VK.api.image_clearing.core.clear_color_image.2d.linear.single_layer.*
pass 300/555
fail 0/555
skip 255/555
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3743>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3743>
The value of some magic regsiters differ across chipsets. fd6_context
manages the differences by initializing them at runtime. Let's do the
same.
Add to tu_physical_device a subset of those found in fd6_context:
RB_UNKNOWN_8E04_blit
RB_CCU_CNTL_gmem
PC_UNKNOWN_9805
SP_UNKNOWN_A0F8
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3743>
Loses some information about which formats can be used in which cases, but
we encode that information in the format table anyway.
Important notes:
* RB6_R10G10B10A2_UNORM becomes FMT6_R10G10B10A2_UNORM_DEST
* TFMT6_8_8_8_UNORM becomes FMT6_8_8_8_X8_UNORM (not FMT6_8_8_8_UNORM)
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3798>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3798>
The previous commit leads to match immed values unexpectedly.
This makes constlen for each shader including bvert wrong.
Also fixes atan2 for mediump deqp tests.
Fixes: cbd1f47433 ("freedreno/ir3: convert back to 32-bit values for half constant registers.")
v2: Move conversion up above fabs/fneg modifier handling as well.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3737>
This lets is_same_type_reg() recognize that the dst and src of the
immediate MOV are the same and unblocks fp16 constant propagation.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3737>
The freedreno gen_header.py script now only works under python3.
It contains a "print()" call which prints a blank line under python3
but prints "()" under python2.7.
However the Android build currently uses python2.
This leads to incorrect code generation and a later build error.
.../STATIC_LIBRARIES/libfreedreno_registers_intermediates/registers/adreno_common.xml.h:163:2: error: expected identifier or '('
()
Fix this by adding MESA_PYTHON3 and using it for the freedreno scripts.
Signed-off-by: Martin Fuzzey <martin.fuzzey@flowbird.group>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3736>
This means you can directly use format utils on it without having to have
your own GL enum to number-of-components switch statement (or whatever) in
your vulkan backend.
Thanks to imirkin for fixing up the nouveau driver (and a couple of core
details).
This fixes the computed qualifiers for EXT_shader_image_load_store's
non-integer sizeNxM qualifiers, which we don't have tests for.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> (v3d)
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3355>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3355>
In addition to preparing us for dynamically resizing them, which has to
be controlled by the device, this greatly reduces the memory usage when
allocating large numbers of command buffers, making
dEQP-VK.api.object_management.max_concurrent.command_buffer_primary go
from crash -> pass.
Reviewed-by: Kristian H. Kristensen <hoegsberg@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3621>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3621>
Document the first DWORD, which at least for the Vulkan blob on a640
isn't always 2.
Reviewed-by: Kristian H. Kristensen <hoegsberg@gmail.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3600>
In some cases we need to split components out from what was already a
collect. That was making it hard to DCE unused components of the
collect. (Ie. unused components of fragcoord, etc)
So just detect this case and skip the chained collect+split.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
This was somehow working to create the instructions in a random block,
and use the value in other blocks, by dumb luck. But two-pass-RA's
better choice of register assignment causes a couple dEQPs to start
failing without this fix:
dEQP-GLES3.functional.shaders.metamorphic.bubblesort_flag.variant_1
dEQP-GLES3.functional.shaders.metamorphic.bubblesort_flag.variant_2
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
Some of the aspects of tex prefetch are in common with normal tex
instructions, such as having a wrmask to control which components
are written. Add a helper for this.
This should result in actually using the prefetch wrmask to avoid
fetching unneeded components.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
ra_block_compute_live_ranges() treats zero as "not yet defined", so
probably best to not let this be a valid instruction #
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
After RA, we can schedule to increase parallelism (reduce nop's) without
worrying about increasing register pressure. This pass lets us cut down
the instruction count ~10%, and prioritize bary.f, kill, etc, which
would tend to increase register pressure if we tried to do that before
RA.
It should be more useful if RA round-robin'd register choices.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
kill (and other cat0/flow instructions) do not have a dst register.
Which was mostly harmless before, other than RA thinking it would need
a free register to write. (But nothing consumed it, so the value would
be immediately dead.) But this would cause more problems with postsched
which would see a bogus dependency.
Also, post-RA sched *does* need to see the dependency on the predicate
register.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
Originally these were nested functions, which worked nicely, giving us
the function of a local macro that was actual 'c' syntax (ie. not token
pasted macro). But these were converted to macros because clang doesn't
let us have nice gcc extensions.
Extract these back out into functions, before adding more things and
making the macros even more cumbersome.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
A post-RA sched pass will move the extra mov's to the wrong place, so
rework the fixup so it can run after RA (and therefore after postsched)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
We want to do this only once. If we have post-RA sched pass, then we
don't want to do it pre-RA. Since legalize is where we resolve the
branch/jumps, we might as well move this into legalize.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
This way we can deal with it in one place, *after* all the blocks have
been scheduled. Which will simplify life for a post-RA sched pass.
This has the benefit of already taking into account nop's that legalize
has to insert for non-delay related reasons.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
This scenario can come up with block-sched and nop-sched moved to after
RA. So lets fix it first to keep things bisectable.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
Previously, calling vkCmdCopyQueryPoolResults with the
VK_QUERY_RESULT_WITH_AVAILABILITY_BIT flag set the query result
field in the buffer to 0 if unavailable and the query result if
available. This was a misunderstanding of the Vulkan spec, and this
commit corrects the behavior to emitting a separate available
result in addition to the query result.
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3560>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3560>
Previously, calling vkGetQueryPoolResults with the
VK_QUERY_RESULT_WITH_AVAILABILITY_BIT flag set the query result
field in *pData to 0 if unavailable and the query result if
available. This was a misunderstanding of the Vulkan spec, and this
commit corrects the behavior to eriting a separate available result
in addition to the query result.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3560>
We need the LINEAR versions for AMD_shader_explicit_vertex_parameter.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3578>
The shader object is destroyed even if its creation failed. It is also
not destroyed if its compilation or upload fails, leading to leaks.
Finally, tu_compute_pipeline_create() should set output var
pPipeline to VK_NULL_HANDLE if it fails.
Avoids crash on
dEQP-VK.api.object_management.alloc_callback_fail_multiple.compute_pipeline
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3572>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3572>
When an error condition occurs during tu_create_cmd_buffer(), the
cmd buffer has already been added to a pool, so the cleanup code should
remove it.
Fixes a crash (assert in tu_device::tu_bo_finish()) in dEQP tests:
dEQP-VK.api.object_management.max_concurrent.command_buffer_primary
dEQP-VK.api.object_management.max_concurrent.command_buffer_secondary
due to pool attempting to destroy an invalid command buffer.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3572>
A cmdstream of size zero is invalid. But this can appear in various
places where we emit a pointer to state. This doesn't show up with
newer kernels (newer than v5.0) which use "softpin", but on earlier
kernels can result in:
[drm:msm_ioctl_gem_submit [msm]] *ERROR* invalid cmdstream size: 0
Since the pointer value doesn't matter in these cases, the easy solution
is just to not emit a cmds table entry in this case.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2805>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2805>
Unlike on an immidiate-mode renderer, Turnip only renders tiles on
vkCmdEndRenderPass. As such, we need to track all queries that were
active in a given render pass and defer setting the available bit
on those queries until after all tiles have rendered.
This commit adds a draw_epilogue_cs to tu_cmd_buffer that is
executed as an IB at the end of tu_CmdEndRenderPass. We then emit
packets to this command stream that update the availability bit of a
given query in tu_CmdEndQuery.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3279>
Mostly a translation of freedreno's implementation of glEndQuery for
GL_SAMPLES_PASSED query objects with a slight modification to set the
availability bit of the query bo (slot->available) if the query was
not ended inside a render pass.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3279>
General structure is inspired by anv's implementation in genX_query.c.
We define a packed struct that tracks sample count at the beginning of
the query and at the end; the result of the occlusion query is then
slot->end - slot->begin.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3279>
This gets a lot of the hard code converted over to the new macros,
resulting in (I feel) much more readable code with
LESS_SHOUTING_ABOUT_THE_REG(). I decided to consistently put the reg on
its own line, so that all the register names line up.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3455>
This introduces some minor unpacking of the temporary fd_reg_pair structs
to code that previously was packing a whole register field.
In the pack wrapper in tu_cs.h, I added some explanatory docs, dropped the
relocs handling since we don't need it, and removed the extra regs[] in
the __ONE_REG() macro (which was causing gcc's optimizer to fall on its
face in my release build).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3455>
Sometimes you want to zero out an address by supplying a NULL BO, but
without this we would end up only emitting one dword. Increases size of
fd6_gmem.o by .8%, though it's not clear to me why (no obvious terrible
codegen happening)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3455>
We want to finish off cmd emission in the primary CS and add its entry to
the IB, but regardless of whether there had been anything in the primary
CS to emit, we still need a reserved CS entry for the loop below.
Fixes crashes in dEQP-VK.binding_model.shader_access.secondary_cmd_buf.*
and many more in dEQP-VK.renderpass*
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3524>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3524>
legalize is computing a lot of state that goes in the variant, let's just
store it directly instead of passing pointers around. This leaves
max_bary in place, which is doing some surprising work (overwriting the
original total_in in some cases).
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3494>
Pretty straightforward: Port texture descriptor code from freedreno, fill
in alignment limits from closed vk, and tu_cmd_buffer.c was already
uploading the texture descriptor.
This doesn't implement storage texel buffers (required in the compute
pipeline) yet, since those will need an IBO descriptor for the store path.
Still, making the load path be connected to the texture descriptor won't
hurt.
Part of #2237
Fixes dEQP-VK.binding_model.shader_access.primary_cmd_buf.uniform_texel_buffer.*
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3522>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3522>
turnip can have multiple inputs with the same location, and different
location_frac.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3109>
The closed GL driver doesn't use UBWC on any storage images. It does tile
mostly (skipping tiling on writeonly images, it seems), but for freedreno
we've been enabling tiling in all cases and it's fine. We do need to
disable UBWC, as tests fail otherwise and just plugging in the equivalent
UBWC regs like we were setting up a texture isn't enough.
Fixes dEQP-VK.image.atomic_operations.*
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3433>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3433>
So far this doesn't handle the texture state-based storage image access
loads, and doesn't support descriptor arrays (same as SSBOs). The texture
side is more tricky, since we have another remapping table to work around.
This is enough to get some of dEQP-VK.image.atomic_operations.* working.
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3433>
Just make it be all SSBOs then all storage images. The remapping table
was there to make it so that the big gap present from gallium's atomic
lowering would get cleaned up, but that's no longer case. The table has
made it very hard to support Vulkan storage images, so it's time for it to
go.
This does mean that an SSBO/IBO that is only loaded (or size-queried) will
now occupy a slot in the table where it wouldn't before. This seems like
a minor cost compared to being able to drop this much logic.
With the remapping table gone, SSBO array handling for turnip just falls
out.
Fixes many array cases of
dEQP-VK.binding_model.shader_access.primary_cmd_buf.storage_buffer.*
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jonathan Marek <jonathan@marek.ca> (turnip)
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3240>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3240>
When VK_DESCRIPTOR_TYPE_SAMPLER is provided, it doesn't need to be
counted as a buffer count. Otherwise it leads to mismatch of allocated
buffer size, hitting VK_ERROR_OUT_OF_POOL_MEMORY finally.
Fixes: c39afe68f0
Also fixes amber tests:
./tests/cases/address_modes_float.amber
./tests/cases/address_modes_int.amber
./tests/cases/magfilter_linear.amber
./tests/cases/magfilter_nearest.amber
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
The anv implementation still isn't quite complete, but we can at least
start using the structs from the real extension.
v2: Fix circular pNext list (Lionel)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3434>
It doesn't really support any Vulkan properly yet so why not claim 1.2?
This was an easier way of fixing the build than trying to roll it
forward to a later version of ANV's entrypoint generator scripts.
This is a more explicit name now that we don't want it to be doing any
memory barrier stuff for us.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
Right now, it's implemented as a no-op for everyone. For most drivers,
it's a switch case in the NIR -> whatever which just breaks. For ir3,
they already have code to delete tessellation barriers so we just add a
case to also delete memory_barrier_tcs_patch.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
Setting up transitive conflicts between a full register and its two
half registers (eg r0.x and hr0.x and hr0.y) will make the half
registers conflict. They don't actually conflict and this prevents us
from using both at the same time.
Add and use a new ra helper that sets up transitive conflicts between
a register and its subregisters, except it carefully avoids the
subregister conflict.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Only occurrence of implicitly converting pointer->int.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2570>
These days `ctx->inputs` is the split scalar input components and
`ir->inputs` is the full vecN. This got fixed in the load_input case,
but the load_interpolated_input case was missed.
Fixes: bdf6b7018c ("freedreno/ir3: re-work shader inputs/outputs")
Signed-off-by: Rob Clark <robdclark@chromium.org>
This makes it easier to implement secondary command buffers, since we no
longer need to know the render area to set the gmem offsets for input
attachments and CmdClearAttachments.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3075>
The first pre_assign_inputs loop doesn't pre-assign sysvals, so skip the
second part for sysvals.
The sysvals don't need to be pre-assigned since the state for those isn't
shared between binning / nonbinning shaders.
Fixes assert failures in cases where the sysvals didn't end up in the same
registers for binning / nonbinning.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3168>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3168>
Fixes artifacts in the subpasses demo, which has a shader using fragcoord
without any varyings. It looks like setting this bit when there are no
varyings can cause weirdness in some cases (without this change, if the
previous shader had <= 8 varyings it would work, but with 9 varyings it
would have artifacts).
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3143>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3143>
Fixes artifacts in the subpasses demo.
Workaround texture cache with input attachments from GMEM by adding a cache
invalidate between subpasses.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3143>
Similar to the existing usage for CP_COND_WRITE5, this makes it clear
what each of the magic parameters are for.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3116>
And add fields uncovered by looking at the firmware. I think this covers
all the memory, register, and scratch manipulation opcodes that exist on
A6xx, plus one additional nice find for Vulkan and describing a
previously unknown opcode and documenting CP_WAIT_REG_MEM.
Note that the bits for the CP_REG_TO_MEM count, as well as the formula
for computing the actual count for both CP_REG_TO_MEM and CP_MEM_TO_REG,
are changed because the A630 SQE firmware actually does something
different. I haven't investigated older microcodes to see whether this
extends back to A5xx and A4xx, but the only non-A6xx uses of this
field result in the same bit-pattern when using the A6xx bit range and
formula, so it should be safe to change the definition universally.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3116>