RA currently can't handle a live value that's part of a vector and
introduces extra copies. This was espeically a problem for bary.f, where
the bary coords were being split and repeatedly re-collected. But this
could be a problem in other situations as well.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
If an instruction's destination is unused, then we shouldn't penalize
it. For example, this helps us schedule atomic operations whose results
aren't read. This works around RA failures when CSE is enabled in some
robustness2 tests.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
In a scenario where there are a lot of texture fetches with constant
coordinates, this prevents the scheduler from scheduling all the setup
instructions after the first group of textures has been scheduled
because they are the only non-syncing thing and scheduling them didn't
decrease tex_delay. Collects with immed/const sources will turn into
moves of those sources, so we should treat them the same.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
This will be run right after nir->ir3. Even though we have SSA coming
out of NIR, we still need it for NIR registers, even though we keep the
original array around to insert false dependencies.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
The old delay calculation relied on the SSA information staying around,
and wouldn't work once we start introducing phi nodes and making
"normal" values defined in multiple blocks not array regs anymore.
What's worse is that properly inserting phi nodes when splitting live
ranges would make that code even more complicated, and this was the last
place post-RA that actually needed that information.
The new version only compares the physical registers of sources and
destinations. It works by going backwards up to a maximum number of
cycles, so it might be slightly slower when the definition is closer but
should be faster when it is farther away.
To avoid complicating the new method, the old method is kept around, but
only for pre-RA scheduling and it can therefore be drastically
simplified as the array case can be dropped.
ir3_delay_calc() is split into a few variants to avoid an explosion of
boolean arguments in users, especially now that merged_regs now has to
be passed to it.
The new method is a little more complicated when it comes to handling
(rptN), because both the assigner and consumer may be (rptN). This adds
some unit tests for those cases, in addition to dropping the to-SSA code
in the test harness since it's no longer needed.
Finally, ir3_legalize has to be switched to using physical registers for
the branch condition. This was the one place where IR3_REG_SSA remained
after RA.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
In particular, make sure they have a physreg assigned. This was the last
place after RA where SSA registers were created, which won't work with
the new post-RA delay calculation that relies on the physreg.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
There were two different approaches I saw in the post-RA code for
figuring out what regiser range a relative access touched:
1. Use reg->array.offset and reg->array.size. This is wrong in case
reg->array.offset was non-zero before RA, because array.size is
the size of the whole array and array.offset has the const offset
within the array baked in.
2. Lookup the array from the array ID and use the base + range there.
This is correct, but won't work with the new RA, where an array might
not always be assigned to the same register.
This replaces both methods with a new ir3_register::array.base field,
and switches all the users I could find to it.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
To simplify the pre-RA merge set code and express the result live-range
splitting in RA, we need to add support for parallel copy instructions,
and for the merge set code these parallel copies need to be in SSA form.
Parallel copies have multiple destinations by necessity, but there was
no way to express this in the existing IR. In particular there was no
support for marking a register as being a destination, and no support
for indicating which destination register out of several an SSA source
refers to. This replaces ir3_register::instr with ir3_register::def and
re-purposes ir3_register::instr. I haven't propagated this into common
helpers, like ssa(), because that would vastly increase the amount of
churn and the number of places that produce such instructions should be
limited -- only RA will create parallel copies and they will be
destroyed right after RA. In the future swz will have multiple
destinations too, but it will only be created after RA via parallel copy
lowering.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
If the tex/sfu ssa src is from a different block than the one currently
being scheduled, we do not have a valid sched-node. So fallback to
previous behavior rather than dereference an invalid ptr.
Fixes: 7821e5a3f8 ("ir3/sched: Don't penalize uses of already-waited tex/SFU")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10306>
Be consistent with other usages in Vulkan and SPIR-V, and the recently
added workgroup_size field.
Acked-by: Emma Anholt <emma@anholt.net>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11190>
In order to simplify main DSI host database, split away phy register
definitions used on DSI v2 hosts to the separate database file.
Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11075>
nir_assign_io_var_locations() does not use outputs_written when
assigning driver locations. Use driver_location to avoid incorrectly
guessing what locations it assigned.
Copied from lavapipe 8731a1beb7
Will fix provoking vertex tf tests when VK_EXT_provoking_vertex
would be enabled:
dEQP-VK.rasterization.provoking_vertex.transform_feedback.*
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11111>
Otherwise it will store a pointer to already unmapped memory which
could lead to a crash in tu_CmdPushDescriptorSetWithTemplateKHR since
it tries to copy data from the old memory.
Fixes a crash with Zink's new lazy descriptor manager instroduced
in bfdd1d8d
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11137>
Using stripes to deal with the different packet layout variants resulted
in redefining "register" offsets with different values, so use "prefix"
to add a suffix to disambiguate.
drivers/gpu/drm/msm/adreno/adreno_pm4.xml.h:1066: warning: "REG_A6XX_CP_DRAW_INDIRECT_MULTI_INDIRECT" redefined
1066 | #define REG_A6XX_CP_DRAW_INDIRECT_MULTI_INDIRECT 0x00000006
|
drivers/gpu/drm/msm/adreno/adreno_pm4.xml.h:1057: note: this is the location of the previous definition
1057 | #define REG_A6XX_CP_DRAW_INDIRECT_MULTI_INDIRECT 0x00000003
|
(Admittedly it isn't really a "prefix" but that was the field in the
schema available to use, and REG_INDEXED_CP_DRAW_INDIRECT_MULTI_STRIDE
sounds somewhat more funny.)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10944>
This runs through the SQE bootstrap code to extract the packet-table,
rather than relying on heuristics. As a bonus, it can detect the start
of the LPAC fw in a660+ fw so that we can properly decode the LPAC fw
and packet-table.
Note that this decodes the jmptable as normal instructions, which is a
change in behavior from the previous heuristic based jmptbl extraction.
Not sure if that is a good or bad thing.
For a5xx, for now the legacy heuristic based jmptable decoding is
preserved, at least until enough control regs are figured out.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10944>
When we start running the bootstrap code thru the emulator we will need
the packet-table loading to actually happen. So add this.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10944>
Run until the packet-table is populated, so the disassembler can use
this to know the offsets of various pm4 packet handlers without having
to rely on heuristics.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10944>
Some of the a6xx gens will require some control reg initialization, and
go into an infinite loop if they don't see the values they expect, so
we'll need to extract the compute gpu-id.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10944>
This is an (at least somewhat complete) logical emulator of the a6xx SQE
that lets us step through firmware execution (bootstrap, cmdstream pkt
handling, etc). It lets us poke at various fw visible state and run
through pm4 packet(s) to better understand what the fw is doing when it
handles various packets.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10944>
Allow for different mnemonics depending on whether they are used as
source or destination register, to better reflect what they do.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10944>
With disasm emulator mode, we'll start wanting some things that are
duplicationg what the assembler does, so just split out all the rnndb
bits into shared utils.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10944>
Going beyond 0x100000 results in hangs, however I found that the
last 0x100000 packet just doesn't get executed. Thus the real limit is
0x0FFFFF. At least this is true for a6xx.
This could be tested by appending nops to the cmdstream and placing
e.g. CP_INTERRUPT at the end, at any position other than being
0x100000 packet it results in a hang.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10786>
There is a limit on IB size, which on freedreno is set to 0x100000.
Going beyond it results in hangs, however I found that the last
0x100000 packet just doesn't get executed. Thus the real limit is
0x0FFFFF.
This could be tested by appending nops to the cmdstream and placing
e.g. CP_INTERRUPT at the end, at any position other than being
0x100000 packet it results in a hang.
Fixes:
dEQP-VK.api.command_buffers.record_many_draws_secondary_2
dEQP-VK.api.command_buffers.record_many_draws_primary_2
However these tests could trigger hangcheck timeouts.
Also this fixes hangs when opening captures of games in RenderDoc.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10786>
The CTS expects that some paths transfer SNORM data exactly, but the HW
will clamp 0x80 values to 0x81 in the process. We can treat snorm as
unorm, though, and get working compression without the clamping happening.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10735>
Now that we need FALLTHROUGH macros we weren't saving much by falling
through, and things were weirdly ordered anyway with depth intermixed with
color formats and the default case tucked in the middle of the switch
statement. Replace with pretty obvious ordering of normal color, planar
color, depth, then default.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10735>
Sometimes the reticle is missing. Skip it for now to keep the number of
pipeline failures due to flakes low.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11033>
This reverts commit 8e470457de.
Drop that jobs, as it's sometimes causing a gitlab-ci.yml parse error
that isn't readily reproducible:
Found errors in your .gitlab-ci.yml:
'a630-profile-traces' job needs 'arm_test' job but it was not added to the pipeline
'a630-profile-traces' job needs 'meson-arm64' job but it was not added to the pipeline
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11030>
The quick_gl set is too unstable -- even when I switched to a consistent
set of tests, and added lots of flakes, I keep getting new ones as some
test (unclear which, but it's like 7-8 minutes into the run) kills other
innocent ones. Until we get per process pagetables, disable this testing
by default.
However, now that we've freed up a board that was doing quick_gl, we have
time to do all of quick_shader so that piglit uprevs don't reshuffle the
test list and expose new failures.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11018>
Started happening after disabling cpufreq, devfreq and runtime PM.
At least one of these fail in each run, so it's blocking MRs.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7987>
Use Piglit's replay profile to measure and store the time that frames
take to render in the GPU.
This job won't run automatically in regular pipelines, but will be
triggered automatically by a script for every successful pre-merge
pipeline.
This is because we want to generate performance data for every relevant
commit merged in main, but we don't want to keep a device busy during
the pre-merge run.
Signed-off-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-By: Rohan Garg <rohan.garg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7987>
I forgot that after rebasing on large_consts support that this is now
called after the first time nir_lower_wrmask is called and can generate
partial writemasks that need to be lowered. While we're here, also call
the main optimization loop if things are lowered to scratch because it
generates address arithmetic that may need to be cleaned up.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10922>
The deqp test for it expects that the unused array elements are untouched,
so make sure they don't get replaced with random stack data.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10737>
We're supposed to map a floating-point value too large to be represented
as fp16 to infinity, however round-to-zero naturally rounds it down to
the largest representable fp16 number instead. The blob emits a bunch of
fixup code to work around this, but instead we can just do what all the
other drivers seem to do and use round-to-nearest-even instead.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10897>
The old pass had a few bugs:
- It tried to avoid folding f2f32 into f2f16, but didn't consider
conversions that were already folded in.
- It didn't prevent folding an f2f16 or f2f32 into a non-floating-point
op.
In addition it wasn't written in a manner which made handling integer
conversions practical. This rewrites the pass to instead calculate the
"type" of the conversion source and then check whether folding the
conversion is allowed. This allows us to cleanly separate the
declarative part where we describe how the HW works from the policy part
where we decide whether the transform is allowed, and makes it simple to
add support for folding integer conversions.
Closes: #3208
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10859>
Would allow earlier faults instead of having corrupted cmdstream.
This was already done to Freedreno long ago in:
04aff7e4 "freedreno: make cmdstream bo's read-only to GPU"
Since private memory should be GPU writable it is now allocated
separately, instead of suballocation from now read-only cmdstream.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10807>
GPU won't be able to write to such BOs, which would to useful for
cmdstream BOs.
Move "bool dump" to the new flags along the way.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10807>
We don't want to have to deal with vector phis in freedreno, because
vectors are always split/unsplit around vectorized instructions anyways,
and the stated reason for not scalarising them (it hurting coalescing)
won't apply to us because we won't be using nir_from_ssa. Add this
option so that we don't have to do the equivalent thing while
translating from NIR.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10809>
In-kernel DSI PHY driver is being more and more split from the main DSI
driver. Split PHY registers from main dsi.xml file to ease further split
of the drivers.
Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10817>
Not every MR, but several per day since I landed the apitrace switch which
increased trace replay resolution to 1080p. Hopefully some day we can
tune the hangcheck to be less aggressive.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10793>
Forcing round-to-nearest-even results in loss of opportunities for
conversion folding, causing a regression in gfxbench gl_alu2.
Fixes: de195671bd ("ir3: nir_op_f2f16 should round to even")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10773>
Small refactor to classify semphore types, currently only binary
syncobj is being used though.
v1. Fix a crash of dEQP-VK.api.null_handle.destroy_semaphore
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10126>
This brings in upstream mediump fixes, and should also replay faster than
.rdc files.
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Rohan Garg <rohan.garg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10295>
When copying layered images we ignored .layerCount parameter.
Fixes mis-rendering of walls in D3D11 game "Company Of Heroes 2".
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10736>
The spec requires at least 2, but says "No specific guarantees are made
about higher priority queues receiving more processing time or better
quality of service than lower priority queues." So, we can just leave the
priorities as a stub.
Fixes dEQP-VK.info.device_properties
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10470>
We can avoid re-sending the configuration cmdstream constantly if we
know the device has not suspended since the last sampling period.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9901>
Blending in SP_BLEND_CNTL is not a binary flag but the same
mask as in RB_BLEND_CNTL. It is a per-mrt enable bit for blending.
Copied form a6xx, on a5xx it should be have the same since it seems
to have the same structure layout.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10682>
Blending in SP_BLEND_CNTL is not a binary flag but the same
mask as in RB_BLEND_CNTL. It is a per-mrt enable bit for blending.
Example SP_BLEND_CNTL produced by blob on a630 and different MRT
blendings:
SP_BLEND_CNTL: { UNK8 | 0x6 }
SP_BLEND_CNTL: { ENABLED | UNK8 | 0xe }
(Decoded before this commit)
Fixes mis-rendering with D3D11 game "Spelunky 2".
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10682>
We were forgetting to increment in the loop, but also it looks from blob
dumps on Pixel 2 like all the pointers it emitted were shifted up by 3
compared to our xml, and that's the same shift that a6xx uses for its
pointers. None of the tests seem to use more than one
border-color-requiring texture, so it's hard to tell.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9904>
We don't support major 1.2 required extensions like timeline semaphores.
Fixes many complaints in the dEQP-VK.info.vulkan1p2.* group.
We were originally bumped to 1.2 in 75755e0eba ("turnip: Pretend to
support Vulkan 1.2") but hopefully that build issue has been fixed in the
entrypoint reworks since then.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10471>
No handling of Acquire/Release because at the moment scheduler
works as if any barrier is Acq+Rel.
Instead of removing scoped_barrier with scope/mode that for TCS
corresponds to a control_barrier or a memory_barrier_tcs_patch in
ir3_nir_lower_tess_ctrl - remove them in emit_intrinsic_barrier.
And do the same for memory_barrier_tcs_patch and control_barrier.
While in any case hw fence/barrier shouldn't be emitted for them,
they still affect ordering of stores, and in feature ir3 backend
may want to have that information.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9054>
nir_intrinsic_memory_barrier has the same semantic as memoryBarrier()
in GLSL, which is:
GLSL 4.60, 4.10. "Memory Qualifiers":
"The built-in function memoryBarrier() can be used if needed to
guarantee the completion and relative ordering of memory accesses
performed by a single shader invocation."
GLSL 4.60, 8.17. "Shader Memory Control Functions":
"The built-in functions memoryBarrier() and groupMemoryBarrier() wait
for the completion of accesses to all of the above variable types."
Fixes tests:
dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_local.image.guard_nonlocal.workgroup.comp
dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_nonlocal.workgroup.guard_local.image.comp
Fixes: 819a613a ("freedreno/ir3: moar better scheduler")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9054>
We were implictly including it in exposing VK 1.2, but we weren't making
use of the supplied struct. Actually enabling it gives us a chance to do
slightly better at Z/S UBWC, and means we won't lose the separate usage
test coverage when switching back to exposing VK 1.1.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10594>
Instead of using a separate outputs array, make the "end" instruction
(or chmask) take the outputs as sources. This works better for the new
RA, because it better models the fact that outputs are consumed all at
the same time. With the old model, each output collect would be assumed
dead after it was processed and subsequent collects could use it when
inserting shuffle code, which wouldn't work, and the new RA also deletes
collect instructions after lowering them to moves so the information
would be gone after RA.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10591>
We need a stable order in order to create phi instructions. In the
future we can make this more sophisticated in order to make manipulating
the CFG easier, but for now that only happens after RA, so we won't have
to worry about it.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10591>
Originally I wrote this to support multiple ir3 blocks per NIR block,
but this turned out to be more useful for creating a stable ordering to
the predecessors. We compute the predecessors ourselves, rather than
relying on NIR, so that the array of predecessors we create in the next
commit has a stable order we can rely on when creating phi nodes.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10591>
There's an optimization here to sink direct (i.e. not relative) reads of
an array past unrelated direct writes. However, since each write
actually reads, modifies, and then writes again to the array, this means
that we need to read the latest updated array. The old RA used the array
id instead of the SSA information, so it didn't care, but the new RA
uses ->instr instead and ignores the array id because arrays are now SSA
so it needs to be correct.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10591>
This wasn't using the same calculation that add_reg_dep() was using to
get the index into state->regs, so it was using the wrong register. Fix
this by folding it into add_reg_dep().
This shouldn't fix anything, because it's just used for scheduler
priorities, but it should reduce nop's and syncs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10591>
a0.x is written as a half-reg, but just interpreting it as "hr61.x" will
result in it overlapping with r30.z in merged mode, which is not what
the hardware does at all. This introduced a spurious dependency on
a write to r30.z which resulted in an assert tripping. Just pretend it's
a full reg instead.
This fixes
spec@arb_tessellation_shader@execution@variable-indexing@vs-output-array-vec3-index-wr-before-tcs
with the new RA.
Fixes: 0f78c32 ("freedreno/ir3: post-RA sched pass")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10591>
Scheduling don't like address being in the different block from
the instruction.
Fixes a crash in the trace of "War Thunder" (DX11)
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10355>
This provides the upper layer (gallium, etc) a way to ensure that
rendering involving the bo has been flushed all the way to the kernel.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10530>
To reduce flakes, separate out the dEQP-EGL tests that are intentionally
triggering GPU hangs. This avoids some kernel side issues with bad
handling of ringbuffer-full scenarios, causing innocent tests to flake.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10560>
Not all varying fetches could be pulled into the start block.
If there are fetches we couldn't pull, like load_interpolated_input
with offset which depends on a non-reorderable ssbo load or on a
phi node, this pass is skipped since it would be hard to find a place
to set (ei) flag (beside at the very end).
We also don't have to manually set (ei) in such cases since a5xx and
a6xx do automatically release varying storage at the end.
Earlier gens need further testing, however they do not support
interpolateAt* functions at moment, so unless we would like to support
sample shading on them - they are fine.
Fixes crash in GTA V.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10483>
For non 64bit devices the key stored in hash_table_u64 is wrapped in
hash_key_u64 structure, which is never free.
This commit fixes this issue by just removing the user-defined
`delete_function` parameter in hash_table_u64_{destroy,clear} (which
nobody is using) and using instead a delete function to free this
structure.
Fixes: 608257cf82 ("i965: Fix INTEL_DEBUG=bat")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10480>
The combination of removing bottlenecks in userspace (userspace fences,
etc) and slow GPU results in hitting full ringbuffer on a306. Haven't
figured out a reasonable way to work around that in userspace until a
kernel fix is in place, so disable this one for now.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>
Move the submit ioctl to it's own thread to unblock the driver thread
and let it move on to the next frame.
Note that I did experiment with doing the append_bo() parts
synchronously on the theory that we should be more likely to hit the
fast path if we did that part of submit merging before the bo was
potentially re-used in the next batch/submit. It helped some things
by a couple percent, but hurt more things.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>
A more direct solution would be for bo's to have a reference to the
device. But bo's are ref/unrefd more frequently.
This avoids async submits unrefing a bo after the device handle-
table is freed.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>
For submits flushed with (a) no required fence, and (b) no externally
visible effects (ie. imported/exported bo), we can defer flushing the
submit and merge it into a later submit.
This is a bit more work in userspace, but it cuts down the number of
submit ioctls. And a common case is that later submits overlap in the
bo's used (for example, blit upload to a buffer, which is then used in
the following draw pass), so it reduces the net amount of work needed
to be done in the kernel to handle the submit ioctl.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/19
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>
For deferred submits, we still need to do the prep steps immediately,
but the ioctl construction can be deferred.. prepare for that by
splitting the prep out.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>
Now that we have some bo state tracking for userspace fences, we can
build on this to add a way for the pipe implementation to defer a submit
flush in order to merge submits into a single ioctl.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>
Move everything into a struct assocated with the pipe_fence_handle, so
that the drm layer can fill in the seqn/fd fences directly.
This will give us a comvenient place to insert a util_queue_fence in the
next commit.
While we're at it, extract the uint32_t fence (previously called
'timestamp' in place, a kgsl legacy) into a struct that encapsulates
both the kernel fence and the userspace fence.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>
In the common case, a bo will have no more than a single fence attached.
We can inline the storage to avoid a separate allocation in this case.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>
Add a per-fd_pipe fence "timeline" so we can detect cases where we don't
need to call into the kernel to determine if a fd_bo is still busy.
This reuses table_lock, rather than introducing a per-bo lock to protect
fence state updates because (a) the common / hotpath pattern is to
update fences on a lot of objects, but checking the fence state of a
single object is less common, and (b) because we already hold the table
lock in common spots where we need to check the bo's fence state (ie.
allocations from the bo-cache).
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>
There are a couple cases where we want to use _NOSYNC, but at the same
time we want to ensure that rendering related to a bo is actually
flushed.
This doesn't do anything yet, but when we start deferring/merging
submits we'll need a way to trigger anything deferred to flush.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>
Most of them were actually unused. The memory type (KMEM vs SMI) only
applied to very old a2xx era devices that had a small/fast stacked
memory (SMI) vs normal memory (KMEM). And the cache flags are ignored
(ie. everything is writecombine), but we can add new cache flags later
when they actually do something.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>
There is no point in having a write to an attachment enabled when there
is no corresponding output in the shader. Per VK spec it is an UB,
however a few apps depend on attachment not being changed if FS doesn't
have corresponding output.
Fixes water in Genshin Impact.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10489>
Fixes a double-free in dEQP-VK.wsi.display_control.register_device_event
where the fence that we destroyed got destroyed again. Leaving it unset
in the error path leaves the test's NULL in place.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10473>
This code is taken from src/freedreno/isa/decode.c.
Since we need a similar function in panfrost, it's probably good to move
it to utils.
Signed-off-by: Italo Nicola <italonicola@collabora.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9461>
Do all the necessary lowering in one place, during finalization, and
stop uselessly calling nir_lower_indirect_derefs in turnip. Splitting
i/o to elements should no longer be necessary since we use the i/o
semantics instead of variables now.
This has the side effect that we no longer generate enormous if-ladders
for tess/GS shaders with turnip.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7274>
Don't setup inputs and outputs if we aren't using
load_input/store_output intrinsics. While it's mostly harmless, there
may be more outputs than expected which would lead to an oob write of
the outputs array when setting the register id to INVALID_REG.
Also be more paranoid with asserts to catch this.
Fixes: a6291b1 ("freedreno/ir3: rework setup_{input,output} to make struct varyings work")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7274>
On a6xx/a5xx there is such dependency between branchstack bitfield
and the amount of nested ifs, which could be seen with blob:
IFs BRANCHSTACK
0 0
1 1
2 2
3 2
4 3
5 3
6 4
...
59 30
60 31
61 31
62 32
63 32
64 32
Remove open-coded branchstack for a5xx compute along the way.
Fixes tests:
dEQP-VK.spirv_assembly.instruction.compute.float16.opvectorshuffle.344
dEQP-VK.spirv_assembly.instruction.graphics.float16.opvectorshuffle.344_vert
dEQP-VK.spirv_assembly.instruction.graphics.float16.opvectorshuffle.444_geom
dEQP-VK.spirv_assembly.instruction.graphics.float16.opvectorshuffle.244_tessc
dEQP-VK.spirv_assembly.instruction.graphics.float16.opvectorshuffle.344_frag
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9859>
We can't support more than compiler->branchstack_size diverging threads
in a wave. Thus, doubling the threadsize is only possible if we don't
exceed the branchstack size limit.
As of blob version 512.490.0 - it doesn't have this heuristics.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9859>
There are some LRZ compare op switches that are not supported by
the HW, like GREATER* <-> LESS* ones.
This patch tracks the direction of the switch and disables LRZ
if needed.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7186>
OpTerminateInvocation provides the behavior required by the GLSL
discard statement, which we already implement.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9460>
The "demote" intrinsic has the semantics of D3D discard, which means
it doesn't change the control flow, allowing derivatives to work.
On A6xx there is no known way to check whether invocation was demoted,
thus we use nir_lower_is_helper_invocation.
Add "logical" OPC_DEMOTE which is later translated to "kill".
Such separation is necessary to run "kill" specific optimizations
which are invalid for "demote".
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9460>
immediates_count and immediates_size are supposed to have the same
units, but it was only incrementing immediates_count by 1. While we're
here, also fix the case where constants are specified out-of-order.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10291>
These have flaked as Timeouts in CI in the last month. .precision.* is
generally very slow (some in the 15s-30s range), but it's unclear to me
why they sometimes spike up to 60 seconds (thermal throttling?).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10274>
We have to keep sampler uniforms around for later YUV lowering, and we
only need to remove uniforms that take up storage space. Code comes from
radeonsi.
Closes: #4644.
Fixes: de17b4aab5 ("freedreno: Remove uniform variables after finalizing NIR.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10246>
clang generates a warning if there's no explicit break or fall-through
annotation. The latter would be kind of silly in this case, and not
robust against any future changes turning the fall-through invalid.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10220>
One exception is src/amd/addrlib/, for which -Wimplicit-fallthrough is
explicitly disabled.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10220>
If we push a UBO range but then find out at draw-time that part of the
pushed range is out of range of the UBO descriptor, then we have to fill
in the rest of the range with 0's to mimic the bounds-checking that ldc
would've done.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7573>
We were never setting set->size, so we were always copying 0 bytes. But
as we only copy the contents when the layout and therefore the size is
the same, we don't have to take the old size into account anyway.
This fixes some VK_EXT_robustness2 tests that use push descriptors.
Fixes: 6d4f33e ("turnip: initial implementation of VK_KHR_push_descriptor")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7573>
This needs to be part of the compiler because it's the only piece that
we always have access to in all the places ir3_optimize_loop() is
called, and it's only enabled for the whole Vulkan device. Right now
it's just used for constraining vectorization, but the next commit adds
another use.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7573>
This fixes
dEQP-VK.robustness.robustness2.bind.notemplate.r32i.dontunroll.nonvolatile.uniform_buffer.no_fmt_qual.len_260.samples_1.1d.frag,
which accesses the shader UBO with c<a0.x + 512> due to the constant
data UBO coming before it in the const file. The len_256 variant has a
smaller constant data UBO, so it uses c<a0.x + 256> instead, and that
works, so 512 seems to be the real limit.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7573>
We forgot to remove the instruction under consideration from instr_list
before inserting it into the block's list, which caused instr_list to
become corrupted. This happened to work but caused further corruption in
some rare scenarios.
Fixes: adf1659 ("freedreno/ir3: use standard list implementation")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7573>
piglit_gl clocked in at 6:12 end-to-end runtime, and piglit_shader spent
2:53 in deqp-runner, so merging them together should be about 9 minutes.
Removing a boot should save us a minute or two of runner time per
pipeline.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10243>
When the depth or stencil state changes dynamically, that might affect
LRZ state and we need to recalculate it and emit it again.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8615>
Move them up, so they are initialized even when the dynamic state is
not used.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8615>
Consider a simple loop that does a series of texture instructions and
then reduces the results:
vec4 sum = vec4(0);
for (int i = 0; i < N; i++) {
sum += texture(...);
}
Assume that the loop is unrolled and we schedule the resulting basic
block. Right now, after we schedule the first texture instruction, the
only instructions available to schedule that don't incur a sync are the
instructions to setup the second texture instruction. So we keep picking
the texture instructions, no matter how large N is, resulting in a
pathological schedule for register pressure when N is very large:
sum1 = texture(...);
sum2 = texture(...);
sum3 = texture(...);
...
sum = sum1 + sum2 + sum3 + ...;
In particular this happens with some CTS tests for VK_EXT_robustness2,
where a loop like that with many iterations is marked as [[unroll]],
forcing NIR to unroll it.
This solution is a balance between the current approach and always
scheduling for register pressure (and ignoring sync's). We only allow a
certain number of texture fetches to be in flight before considering
textures to "sync", even though they don't really, both because they
likely *will* sync in reality (overflowing the internal queue of waiting
texture instructions) and because at some point we need the normal
algorithm to kick in and start lowering register pressure.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7571>
Once we insert a use of a given tex or SFU instruction, then we must
wait for that tex/SFU instruction (as well as all earlier ones) to
complete, so we shouldn't penalize further uses, even if a subsequent
tex/SFU instruction gets scheduled after the first use. This especially
matters after the next commit when we start forcibly breaking up long
sequences of texture instructions, since if we schedule a group of 8
texture instructions then we want to schedule the uses of those
instructions in parallel with the next 8 texture instructions to reduce
register pressure.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7571>
Among other things, this gets us GCC 10 (was 6).
Requires some changes to third party components we use:
* Install apitrace (& waffle) from Debian; was hitting issues with the
local build, and it's the same version 9.0 anyway.
* Update Fossilize to a newer commit which builds with GCC 10.
* apt.llvm.org repositories are no longer needed.
* Use an SPIRV-LLVM-Translator commit which builds with LLVM 11.0.1.
* Install XCB packages from Debian, 1.13 fails to build with Python 3.9.
* Install wayland-protocols from Debian, 1.12 is too old for
libgtk-3-dev in bullseye.
LLVM 7/8 packages are no longer available.
Also adapt expected test results to Xvfb now exposing multi-samle
GLXFBConfigs.
v2:
* Install clang instead of clang-11.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3124
Reviewed-by: Eric Anholt <eric@anholt.net> # v1
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9833>
VK_KHR_spirv_1_4 is trivial because vtn already supports all the added
SPIR-V features that aren't gated behind Vulkan extensions. I've
observed some robustness2 CTS tests requiring this. However there are
a few tests currently failing due to lacking spilling.
VK_EXT_scalar_block_layout should also be trivial, since support for
"straddling" UBO loads was added recently for other reasons. This is
used by every robustness2 CTS test.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8695>
v2:
- Bump up MESA_ROOTFS_TAG instead of arm_build (Michel)
Acked-by: Michel Dänzer <mdaenzer@redhat.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10136>
Added:
* a pass that renumbers bases of IO intrinsics
* a pass that converts mediump IO to 16 bits, optionally using the new
packed varying slots
* a pass that sets (forces) mediump in IO intrinsics (for testing)
* a pass that remaps VARYING_SLOT_VAR[0..15]_16BIT to VARYING_SLOT_VAR[0..31]
(if some shader stages don't want packed varyings)
* a pass that folds type conversions around texture opcodes into those
opcodes (e.g. tex(f2f32(coord), ..) is changed into tex accepting f16)
* a pass that changes (legalizes) sampler src and dst types based on specified
hw constraints (e.g. derivatives must be the same type as coordinates)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9050>
Before, we would prefer to schedule inputs before kills, which works
assuming that the live range of the bary_ij system value don't get
split and therefore all bary.f are ready at the start of the block.
However live range splitting can mess up that assumption and cause a
kill to get scheduled before a move that leads to a bary.f.
This fixes even e.g. dEQP-GLES2.functional.shaders.discard.basic_always
on a3xx before introducing CSE of collect instructions, but even after
that it could be a problem theoretically as the register allocator
doesn't guarantee that any live ranges aren't split.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10143>
Prior to this commit
(nop3) mad.f32 r0.y, c0.x, r1.w, c0.y
was counted as 4 cat3 instructions (and still 3 cat0/nops) in shader-db
results. With this change, it is counted as only 1 cat3 instruction.
Probably never going to have better shader-db results than this in my
career:
total cat2 in shared programs: 1214667 -> 732058 (-39.73%)
cat2 in affected programs: 1194729 -> 712120 (-40.39%)
helped: 8551
HURT: 0
total cat3 in shared programs: 376448 -> 274745 (-27.02%)
cat3 in affected programs: 344918 -> 243215 (-29.49%)
helped: 7222
HURT: 0
Reviewed-by: Rob Clark <robdclark@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10116>
The same info is in shader_info. Dedupe.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10094>
The meson.build was unaware of transitive dependencies introduced by
Python imports.
Android still needs fixing. But I did not update the Android files lest
I break the build.
Ideally, we would fix this by using a Python runner that generates
a depfile, similar to how meson creates depfiles for C files by passing
flags -MD -MQ -MF to gcc. But this patch gets the job done, without
stalling on the ideal general solution, by manually tracking the Python
imports in new 'foo_depend_files' variables.
CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1466>
We don't check whether there's an intervening write in this pass, which
makes it incorrect. ir3_cp_postsched does check correctly, but we were
accidentally doing it here anyway for some sources.
While we're here, delete some code that was only used in the array case.
Fixes: f370e954 ("freedreno/ir3: handle const/immed/abs/neg in cp")
Reviewed-by: Rob Clark <robdclark@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10076>
Disallow immediates for the source. This was hidden by the fact that we
didn't copy-propagate trivial collect instructions.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10076>
If there was a mix of ldlv and bary.f and we inserted an (ss) *after*
the last input which was a bary.f, then last_input_needs_ss would get
unset, even though it shouldn't. For figuring out whether we need the
(ss), we need to know whether there are any pending ldlv's when
last_input gets executed, not at the end of the block, which means that
the existing code's strategy of inserting it after the whole block has
been processed won't work. Rework it to do the last_input processing in
the main loop instead.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10076>
If subpass doesn't have depth/color attachments - samples count is
devised from VkPipelineMultisampleStateCreateInfo::rasterizationSamples.
Without variableMultisampleRate enabled all pipelines in such subpass
should have the same samples count; variableMultisampleRate allows
to have pipelines with different number of samples in one subpass,
given that it doesn't have depth/color attachments.
Blob doesn't have it enabled but there is no known reason for this.
Passes:
dEQP-VK.pipeline.multisample.variable_rate.*
Fixes test:
dEQP-VK.pipeline.framebuffer_attachment.no_attachments_ms
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9556>
As that handles better, and more clear, the case of bindingCount being
zero. For the case of Anvil and Turnip, this avoids allocating a
non-needed binding when bindingCount is zero.
Inspired on radv, that was what it was doing so far.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4526
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9905>
Across every driver...
v2: Add casts to appease -fpermissive used on CI.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9477>
Compressed formats may have compatible formats, however they could
only be sampled, so we should not call tu6_format_color with them.
tu6_format_texture should have the same behaviour for checking swap
so use it for all cases.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10009>
Looks like I put it in the wrong file back when I first caught it. It's a
one-or-twice-a-week back flake that seems to happen. The upcoming
deqp-runner uprev would have caught this mistake.
Fixes: 957132294f ("ci/a5xx: Increase the gles3/31 coverage.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9806>
Apparently I inverted the sense of this flag back when we didn't have
piglit testing. Fixes terrible rendering in minetest, HL2, CS:Source, and
CS.
Fixes: 0369dd9077 ("freedreno/a6xx: Add ARB_depth_clamp and separate clamp support.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9957>
A650 can use the same SSBO descriptor for both 32-bit and 16-bit access,
which makes it easy to enable this extension.
Passes tests that run under:
dEQP-VK.spirv_assembly.instruction.*.16bit_storage.*
Rebased and modified commit from Jonathan Marek.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9840>
When float16 is enabled this will allow to pass a number of
float16 tests.
When A6XX_SP_FLOAT_CNTL_F16_NO_INF is set - all operations which
generate +-infinity generate +-MAX_HALF_FLOAT.
Fixes some tests from:
dEQP-VK.spirv_assembly.instruction.*.float16.*
dEQP-VK.spirv_assembly.instruction.*.float_controls.fp16.*
E.g.:
dEQP-VK.spirv_assembly.instruction.graphics.float16.arithmetic_1.sinh_vert
dEQP-VK.spirv_assembly.instruction.compute.float16.arithmetic_4.length
dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp16.input_args.log_denorm_flush_to_zero_nostorage
dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp16.input_args.log2_denorm_flush_to_zero_nostorage
dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp16.input_args.inv_sqrt_denorm_flush_to_zero_nostorage
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9840>
NIR has shifts defined as:
opcode("*shr", 0, tuint, [0, 0], [tuint, tuint32], False, ...
However, in ir3 we have to ensure that both operators of shift
instruction have the same bitness.
Let's hope that in future the additional COV for constants would
be optimized away.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9840>
This matches the blob and doesn't require actually implementing controls
since the supported modes are just what the HW does.
Passes tests under:
dEQP-VK.spirv_assembly.*.float_controls.*
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9840>
cat1 instructions round to zero by default.
When fp16 is enabled this will fix:
dEQP-VK.spirv_assembly.instruction.graphics.float_controls.fp16.input_args.rounding_rte_conv_from_fp32_nostorage_frag
dEQP-VK.spirv_assembly.instruction.graphics.float_controls.fp16.input_args.rounding_rte_conv_from_fp32_nostorage_vert
dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp16.input_args.rounding_rte_conv_from_fp32_nostorage
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9840>
The merged image contains kernels & rootfs for both arm64 & armhf
baremetal test jobs, and is smaller than either arm{64,hf}_test image
before.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9955>
Doing so in an x86 container via qemu was slow, and started failing
recently after updating to a newer qemu version.
This also results in smaller arm*_test* docker images, since we need to
install fewer Debian packages in them.
As a bonus, this turns some piglit tests from fail to pass (Or maybe
they'll turn out to be flakes? They've passed at least 3 times in a
row).
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9955>
It's flaky in producing Missing results. I've got an uprev that should
avoid the issue (and possibly a followon actual fix), but it's blocked on
being able to rebuild the arm containers.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9932>
MAX2(count * struct size, 1) results in 1 for count=0, not the size of a struct.
Since this MAX only seems to exist so we can keep using NULL for error reporting,
just refactor to return a VkResult.
Fixes: ad241b15a9 ("vk: consolidate dynamic descriptor binding sorting")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4522
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9880>
Otherwise pass->tile_align_w will be 0, leading to a divide by zero and
undefined behavior. In practice, I saw this lead to an infinite loop in
tests like
dEQP-VK.draw.instanced.draw_indexed_indirect_vk_primitive_topology_line_list_attrib_divisor_0_multiview
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9606>
A little low-stakes RE effort as I unwind from fighting CI all day. Comes
from diffing dEQP-VK.clipping.user_defined.clip_distance.vert.* on the
blob and comparing to a6xx behavior. (My blob doesn't do tess, so if
there are equivalent tess fields for some of these, I didn't find them)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9870>