I realized that shared registers were never actually getting folded,
even after adding them to valid_flags, because the move wasn't even
being considered.
I looked at the other uses of is_same_type_mov(), and they should be ok
with this.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>
This fixes a pre-existing bug in ir3, but it showed up even more due to
other changes in this series and it interacts with the logical/physical
CFG split. When both sides of an if end with a jump, a block may become
unreachable via the logical CFG, which can cause problems because it has
no predecessors to figure out the location of live-in non-shared
values. In this case we assume that nir_opt_if has removed any code in
these blocks and just skip processing live-ins for these blocks,
pretending that they aren't live.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>
The way that the blob obtains the subgroup id on compute shaders is by
just and'ing gl_LocalInvocationIndex with 63, since it advertises a
subgroupSize of 64. In order to support VK_EXT_subgroup_size_control and
expose a subgroupSize of 128, we'll have to do something a little more
flexible. Sometimes we have to fall back to a subgroup size of 64 due to
various constraints, and in that case we have to fake a subgroup size of
128 while actually using 64 under the hood, by just pretending that the
upper 64 invocations are all disabled. However when computing the
subgroup id we need to use the "real" subgroup size. For this purpose we
plumb through a driver param which exposes the real subgroup size. If
the user forces a particular subgroup size then we lower
load_subgroup_size in nir_lower_subgroups, otherwise we let it through,
and we assume when translating to ir3 that load_subgroup_size means
"give me the *actual* subgroup size that you decided in RA" and give you
the driver param.
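
As a rough illustration of the arithmetic above, here is a minimal sketch
that splits a local invocation index using the real subgroup size. The
helper names are hypothetical, real_size stands in for the driver param,
and the quotient helper is an assumption about how a per-subgroup index
would be derived. This is not the actual ir3 lowering.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not ir3 code. real_size stands in for the
 * driver param and is assumed to be a power of two (64 or 128). */
static inline uint32_t
invocation_in_subgroup(uint32_t local_invocation_index, uint32_t real_size)
{
   assert(real_size != 0 && (real_size & (real_size - 1)) == 0);
   /* With real_size == 64 this is the "and with 63" described above. */
   return local_invocation_index & (real_size - 1);
}

/* Assumption: the index of the subgroup itself is the quotient. */
static inline uint32_t
subgroup_index(uint32_t local_invocation_index, uint32_t real_size)
{
   assert(real_size != 0 && (real_size & (real_size - 1)) == 0);
   return local_invocation_index / real_size;
}
```

Using the real size matters here: index 70 masks to 6 with a real size of
64, but stays 70 with a real size of 128.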
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>
Some setups use the kgsl backend on Linux. Although this is still not the
usual configuration, we need to account for it too.
Regarding the timeline semaphore feature, we could implement it for
the kgsl backend in the future, and it should probably use the
existing code in tu_drm.
See #4738, #4907
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11488>
Assert in dest_regs() that dst_count == 1, since most users of it will
blow up if they encounter multiple destinations, and split out the core
of writes_gpr() so that we can easily make code using it multi-dst
aware.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11565>
These were a holdover from before the src/dst split and are no longer
necessary. Just don't create any dest registers for instructions that
never have a destination.
This has the side-effect that it becomes easier to replace uses of
dest_regs() with a per-register thing, once we start adding support for
multiple destinations.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11565>
The compiler *can* eventually chew through all the copy prop, constant
folding, and dead_cf necessary to use just our constant index, but we can
save a whole lot of hassle by chasing the MOVs up front and finding the
constant.
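
A minimal sketch of the MOV-chasing idea, using a hypothetical toy
definition type rather than the real IR structures: follow the chain of
moves and report the constant if the chain ends in one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy IR, only for illustration. */
enum toy_kind { TOY_CONST, TOY_MOV, TOY_OTHER };

struct toy_def {
   enum toy_kind kind;
   uint32_t value;            /* valid when kind == TOY_CONST */
   const struct toy_def *src; /* valid when kind == TOY_MOV */
};

/* Walks up through MOVs; returns true and stores the constant in *out
 * if the chain ends in a constant, false otherwise. */
static bool
toy_chase_const(const struct toy_def *def, uint32_t *out)
{
   assert(out != NULL);
   while (def && def->kind == TOY_MOV)
      def = def->src;
   if (def && def->kind == TOY_CONST) {
      *out = def->value;
      return true;
   }
   return false;
}
```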
dEQP-VK.ubo.3_level_array.scalar.row_major_mat4.both goes from 2.0s to
1.6s on a release build (3.1s to 2.1s for a debug build like we use in CI).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11613>
If we've identified another use that isn't scheduled yet, we can break
right away rather than iterating through all the other uses. While this
could be optimized further, this simple change makes
dEQP-VK.subgroups.ballot_broadcast.compute.subgroupbroadcast_ivec4 go
from 40 seconds to 1.9 seconds on a release build according to my
unscientific testing.
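
The shape of the early exit can be sketched as follows. The toy_use type
and the visited counter are hypothetical and exist only to make the
effect observable; the real scheduler data structures differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a use of a value. */
struct toy_use {
   bool scheduled;
};

/* Returns true if any use other than `self` is still unscheduled.
 * The loop stops at the first such use instead of visiting the rest.
 * If visited is non-NULL it receives the number of uses examined. */
static bool
has_other_unscheduled_use(const struct toy_use *uses, size_t count,
                          size_t self, size_t *visited)
{
   bool found = false;
   size_t n = 0;

   assert(uses != NULL || count == 0);
   for (size_t i = 0; i < count; i++) {
      n++;
      if (i == self)
         continue;
      if (!uses[i].scheduled) {
         found = true;
         break; /* one unscheduled use is enough to decide */
      }
   }
   if (visited)
      *visited = n;
   return found;
}
```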
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11613>
Move and rename warn_non_conformant_implementation() to the common location
src/vulkan/util/vk_util.c as vk_warn_non_conformant_implementation().
In freedreno/ci, move MESA_VK_IGNORE_CONFORMANCE_WARNING to the common
location .baremetal-deqp-test-freedreno-vk.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11563>
The full form for ldg.a/stg.a offset is:
g[reg_address + reg_offset << (imm_shift + 2) + imm_offset << 2]
where imm_shift is in [0, 3] and imm_offset is in [0, 3]
The a6xx blob was found to produce slightly simpler offset calculations
for TES/TCS shaders in GTA V:
[c002000a_03c14215] ldg.a.f32 r2.z, g[r1.y+((r2.z+1)<<2)], 3;
[c0020004_01c14609] ldg.a.f32 r1.x, g[r1.y+((r1.x+3)<<2)], 1;
Our new syntax:
stg.a.u32 g[r2.x+(r1.x+1)<<2], r5.x, 1
stg.a.u32 g[r2.x+r1.x<<4+3<<2], r5.x, 1
ldg.a.f32 r1.w, g[r1.y+(r1.w+1)<<2], 3
ldg.a.f32 r1.w, g[r1.y+r1.w<<5+2<<2], 3
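
For reference, a small sketch of how the full-form offset evaluates,
assuming each shift binds tighter than the additions, as the new syntax
suggests (r1.x<<4 corresponds to imm_shift = 2, and 3<<2 to
imm_offset = 3). The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the full-form offset, assuming this grouping:
 *   reg_address + (reg_offset << (imm_shift + 2)) + (imm_offset << 2)
 * imm_shift and imm_offset are both limited to [0, 3]. */
static inline uint64_t
ldg_a_full_offset(uint64_t reg_address, uint32_t reg_offset,
                  unsigned imm_shift, unsigned imm_offset)
{
   assert(imm_shift <= 3);
   assert(imm_offset <= 3);
   return reg_address + ((uint64_t)reg_offset << (imm_shift + 2)) +
          ((uint64_t)imm_offset << 2);
}
```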
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11431>
Improves drawoverhead perf through Zink up to 260%
Before:
1, DrawElements ( 1 VBO| 0 UBO| 0 ) w/ no state change, 1518
After:
1, DrawElements ( 1 VBO| 0 UBO| 0 ) w/ no state change, 3981
This brings it close to Freedreno, which has around 4300.
In vkQuake vs params re-emission now occurs in 0.23% of draw calls.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11556>
tu_kgsl.c: tu_enumerate_devices closed an fd that had already been closed
by the tu_physical_device_init function.
Move the fd closing out of tu_physical_device_init because it does not
belong there.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11561>
Fixes:
dEQP-VK.api.info.format_properties.g8b8g8r8_422_unorm
dEQP-VK.api.info.format_properties.b8g8r8g8_422_unorm
and part of:
dEQP-VK.api.info.format_properties.g8_b8_r8_3plane_420_unorm
dEQP-VK.api.info.format_properties.g8_b8r8_2plane_420_unorm
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11562>
RA was manually fiddling with regs to copy over the parallel copy code,
which has to be done in a different way, but if we switch this all over
at once it shouldn't be a problem.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11469>
Also change the indexing in ir3_delayslots, so it's finally sane! To do
this we also have to change foreach_ssa_src_n to index srcs instead of
regs, so that the indexing stays in sync.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11469>
Instructions that operate on an array read the previous state of the
array, modify it, and write a new array, at least conceptually before
RA. Previously the same register specified the previous state and acted
as the new state, but this meant that it was both a source and
destination which meant that it was getting in the way of splitting up
sources and destinations. Break out the source into a separate register,
and use the new tied-src infrastructure to share code with a6xx atomics.
With this, there are basically no more special cases for arrays in RA.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11469>
Previously this was hard-coded for a6xx atomic instructions. However
we'll need a way for array destinations to point to the source with the
previous value of the array when we split them up. This is conceptually
the same as tied source/destinations for a6xx atomics, except that array
writes sometimes won't have a previous value to point to. So move this
into the IR so that it can be more dynamic. As a bonus we can move the
knowledge of a6xx atomics out of RA, where it's out-of-place, and into
the a6xx-specific code that creates them.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11469>
The previous implementation had several issues:
- It wasn't checking all the conditions necessary for "this blit updates
the whole surface", like PIPE_MASK_Z but not S on a depth/stencil
buffer.
- It would reset the previous batchbuffer, even if that batch had side
effects on other buffers.
- The layering was painful to follow and made any recursion extra
dangerous.
Now, we use a more conservative test (enough for the resource shadowing
case) and just invalidate the buffer up front, which should have the right
logic for discarding drawing to that resource.
I found I had to add fd_bc_flush_writer() to the end of fd_blitter_blit()
-- a flush was happening at fb state restore time when the discard flag
was set, and losing that flush breaks
dEQP-GLES31.functional.stencil_texturing.format.stencil_index8_cube.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11455>
The idea is that the tests will spend *some* time stalling waiting to
read back results from the GPU. So use a # of jobs that is slightly
more than the # of CPUs to keep the CPUs more busy.
Locally this is dropping a bit more than a minute off a parallel
deqp-gles31 run, so turn it on across the board for a6xx.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11477>
I didn't feel like rewriting ir3_shader_disasm() off of FILE *s, so use
the same trick as the disasm_info path above to write to memory and then
hand the multi-line blob off to mesa_log.
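
A minimal sketch of the write-to-memory trick, using POSIX
open_memstream(). dump_to_stream() and log_line_stderr() are hypothetical
stand-ins for the FILE *-based disassembler and the logging call, and the
per-line splitting is done here only for illustration; the change itself
hands the multi-line blob to mesa_log.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a FILE *-based dumper such as a disassembler. */
static void
dump_to_stream(FILE *out)
{
   fprintf(out, "block0 {\n");
   fprintf(out, "  mov.u32u32 r0.x, c0.x\n");
   fprintf(out, "}\n");
}

/* Hypothetical logger that prefixes each line. */
static void
log_line_stderr(const char *line)
{
   fprintf(stderr, "MESA: info: %s\n", line);
}

/* Captures the dump in memory and passes it to log_fn line by line.
 * Returns the number of lines logged, or -1 on failure. */
static int
dump_via_memstream(void (*log_fn)(const char *line))
{
   char *buf = NULL;
   size_t size = 0;
   FILE *stream = open_memstream(&buf, &size);

   assert(log_fn != NULL);
   if (!stream)
      return -1;

   dump_to_stream(stream);
   fclose(stream); /* flushes and finalizes buf and size */

   int lines = 0;
   char *save = NULL;
   for (char *line = strtok_r(buf, "\n", &save); line;
        line = strtok_r(NULL, "\n", &save)) {
      log_fn(line);
      lines++;
   }
   free(buf);
   return lines;
}
```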
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9262>
This means you can get dumps on android, and output on Linux goes to
stderr. However, this does mean that on Linux the output goes from
looking like:
AFTER: ir3_legalize:
block3276208368 {
0000:0001:002: cov.u32s16 hr2.x, c2.x
0000:0002:002: mov.u32u32 r0.x, c0.x
[...]
to:
MESA: info: AFTER: ir3_legalize:
MESA: info: block3405271904 {
MESA: info: 0000:0001:002: cov.u32s16 hr2.x, c2.x
MESA: info: 0000:0002:002: mov.u32u32 r0.x, c0.x
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9262>
With array registers, there are two num's we care about:
1. The base num that the whole array starts at (->array.base)
2. The num that the instruction uses, plus possibly an indirect offset
(->num or ->array.offset)
For parallel copies we always copy the whole array, so (2) is irrelevant
here. For phis and parallel copies inserted for phis, we used
assign_reg() which assigned ->array.base, but we forgot about this when
constructing our own parallel copies for live range splitting, just
setting ->num instead. The parallel copy lowering was also inconsistent
here, using ra_reg_get_num() (which looks at ->array.base for arrays)
for sources but looking at ->num directly for destinations. This makes
everything use ->array.base consistently.
While we're here, make sure to remove IR3_REG_SSA from liveout copies to
make sure printing works correctly.
Fixes: 0ffcb19 ("ir3: Rewrite register allocation")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11422>
Normally something with IR3_REG_ARRAY doesn't have a register assigned,
but we keep IR3_REG_ARRAY for parallel copies after RA because we need
to know the appropriate size. We want to see the register assigned for
these when printing the RA result before parallel copies are lowered.
The register is in ->array.base in this case, so initialize it to
INVALID_REG and print ->array.base if it's been assigned to something,
similar to ->num in the normal case.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11422>
this better enables object-specific (e.g., context) queues where the owner
of the queue will always be needed and various pointers will be passed in
for tasks
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11312>
Should also probably never have been different.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Acked-by: Martin Peres <martin.peres@mupuf.org>
Acked-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11337>
Why were they ever different ... ?
Signed-off-by: Daniel Stone <daniels@collabora.com>
Acked-by: Martin Peres <martin.peres@mupuf.org>
Acked-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11337>
To make sure the index of global bo table in drm_msm_gem_submit_cmd is
valid at actual submit time.
v1. Move the entry_count calculation into the submit request creation
function.
Fixes: #4877
Fixes: 3f229e34 ("turnip: Implement VK_KHR_timeline_semaphore.")
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11260>
The flakiness of this test is due to CI running deqp in parallel, rather
than exposing any underlying driver issue. Just skip it in CI until we
come up with a reasonable way to handle tests to be run in isolation
during a deqp-runner run (likely as part of
https://gitlab.freedesktop.org/anholt/deqp-runner/-/issues/7).
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11333>
They're too slow to run in CI even on non-tiled renderers, they don't
block conformance (unless you crash), and provide unreliable warning
results unless you isolate them from other activity on the system.
This means that the following jobs now skip these tests:
- deqp-iris-*
- deqp-llvmpipe (you know, the one mentioned in the comment!)
- deqp-virgl-gl
- deqp-zink-lvp
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11333>
The mustpass doesn't have any tests matching these, so no need to
skip. These tests only show up if you run without using a mustpass list.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11333>
Trying to run turnip under drm-shim reveals that pretended device
offsets are not sufficiently aligned, failing this assert in tu_pipeline.c:
/* emit program binary & private memory layout
* binary_iova should be aligned to 1 instrlen unit (128 bytes)
*/
assert((binary_iova & 0x7f) == 0);
Round up BO size to 4096 in msm_ioctl_gem_new to avoid this (the kernel
aligns to page size).
Signed-off-by: Alexander Monakov <amonakov@ispras.ru>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11331>
turnip's DRM device interface requires version 1.6 (for SYNCOBJ).
To unblock use of turnip over drm-shim, raise shim's version to 1.6.
This allows viewing shader disassembly, while submission fails with
DRM_SHIM: unhandled core DRM ioctl 0xC4 (0xc01064c4)
TU: error: DRM_IOCTL_SYNCOBJ_RESET failure: Invalid argument
Signed-off-by: Alexander Monakov <amonakov@ispras.ru>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11331>
The .NORM bit doesn't seem to do what we think or want. tu also doesn't
set it, and things seem to work out better when we don't.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11343>
We were inserting them in what was NIR's end block with the "end"
instruction, which meant that the moves they generated couldn't be
scheduled with the rest of the last block as part of post-RA scheduling.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
RA currently can't handle a live value that's part of a vector and
introduces extra copies. This was especially a problem for bary.f, where
the bary coords were being split and repeatedly re-collected. But this
could be a problem in other situations as well.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
If an instruction's destination is unused, then we shouldn't penalize
it. For example, this helps us schedule atomic operations whose results
aren't read. This works around RA failures when CSE is enabled in some
robustness2 tests.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
In a scenario where there are a lot of texture fetches with constant
coordinates, this prevents the scheduler from scheduling all the setup
instructions after the first group of textures has been scheduled
because they are the only non-syncing thing and scheduling them didn't
decrease tex_delay. Collects with immed/const sources will turn into
moves of those sources, so we should treat them the same.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
This will be run right after nir->ir3. Even though we have SSA coming
out of NIR, we still need it for NIR registers, even though we keep the
original array around to insert false dependencies.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
The old delay calculation relied on the SSA information staying around,
and wouldn't work once we start introducing phi nodes and making
"normal" values defined in multiple blocks not array regs anymore.
What's worse is that properly inserting phi nodes when splitting live
ranges would make that code even more complicated, and this was the last
place post-RA that actually needed that information.
The new version only compares the physical registers of sources and
destinations. It works by going backwards up to a maximum number of
cycles, so it might be slightly slower when the definition is closer but
should be faster when it is farther away.
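
A heavily simplified sketch of the backwards walk, under stated
assumptions: a single block, one source per instruction, one cycle per
instruction, and a hypothetical per-instruction latency field. The real
calculation handles much more than this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy instruction using physical register numbers. */
struct toy_instr {
   int dst;          /* physical register written, or -1 for none */
   int src;          /* physical register read, or -1 for none */
   unsigned latency; /* cycles before dst may be read */
};

/* Looks backwards from instrs[consumer] for the most recent writer of
 * its source register, giving up after max_cycles instructions, and
 * returns how many cycles of delay are still required. */
static unsigned
toy_delay(const struct toy_instr *instrs, size_t consumer,
          unsigned max_cycles)
{
   assert(instrs != NULL);
   int src = instrs[consumer].src;
   if (src < 0)
      return 0;

   for (unsigned distance = 0;
        distance < max_cycles && distance < consumer; distance++) {
      const struct toy_instr *assigner = &instrs[consumer - 1 - distance];
      if (assigner->dst == src)
         return assigner->latency > distance ?
                assigner->latency - distance : 0;
   }
   return 0; /* no writer found within the window */
}
```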
To avoid complicating the new method, the old method is kept around, but
only for pre-RA scheduling and it can therefore be drastically
simplified as the array case can be dropped.
ir3_delay_calc() is split into a few variants to avoid an explosion of
boolean arguments in users, especially since merged_regs now has to
be passed to it.
The new method is a little more complicated when it comes to handling
(rptN), because both the assigner and consumer may be (rptN). This adds
some unit tests for those cases, in addition to dropping the to-SSA code
in the test harness since it's no longer needed.
Finally, ir3_legalize has to be switched to using physical registers for
the branch condition. This was the one place where IR3_REG_SSA remained
after RA.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
In particular, make sure they have a physreg assigned. This was the last
place after RA where SSA registers were created, which won't work with
the new post-RA delay calculation that relies on the physreg.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
There were two different approaches I saw in the post-RA code for
figuring out what register range a relative access touched:
1. Use reg->array.offset and reg->array.size. This is wrong in case
reg->array.offset was non-zero before RA, because array.size is
the size of the whole array and array.offset has the const offset
within the array baked in.
2. Lookup the array from the array ID and use the base + range there.
This is correct, but won't work with the new RA, where an array might
not always be assigned to the same register.
This replaces both methods with a new ir3_register::array.base field,
and switches all the users I could find to it.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
To simplify the pre-RA merge set code and express the result of live-range
splitting in RA, we need to add support for parallel copy instructions,
and for the merge set code these parallel copies need to be in SSA form.
Parallel copies have multiple destinations by necessity, but there was
no way to express this in the existing IR. In particular there was no
support for marking a register as being a destination, and no support
for indicating which destination register out of several an SSA source
refers to. This replaces ir3_register::instr with ir3_register::def and
re-purposes ir3_register::instr. I haven't propagated this into common
helpers, like ssa(), because that would vastly increase the amount of
churn and the number of places that produce such instructions should be
limited -- only RA will create parallel copies and they will be
destroyed right after RA. In the future swz will have multiple
destinations too, but it will only be created after RA via parallel copy
lowering.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
If the tex/sfu ssa src is from a different block than the one currently
being scheduled, we do not have a valid sched-node. So fall back to
previous behavior rather than dereference an invalid ptr.
Fixes: 7821e5a3f8 ("ir3/sched: Don't penalize uses of already-waited tex/SFU")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10306>
Be consistent with other usages in Vulkan and SPIR-V, and the recently
added workgroup_size field.
Acked-by: Emma Anholt <emma@anholt.net>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11190>
In order to simplify the main DSI host database, split the phy register
definitions used on DSI v2 hosts out into a separate database file.
Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11075>
nir_assign_io_var_locations() does not use outputs_written when
assigning driver locations. Use driver_location to avoid incorrectly
guessing what locations it assigned.
Copied from lavapipe 8731a1beb7
This will fix the provoking vertex tf tests once VK_EXT_provoking_vertex
is enabled:
dEQP-VK.rasterization.provoking_vertex.transform_feedback.*
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11111>
Otherwise it will store a pointer to already unmapped memory which
could lead to a crash in tu_CmdPushDescriptorSetWithTemplateKHR since
it tries to copy data from the old memory.
Fixes a crash with Zink's new lazy descriptor manager introduced
in bfdd1d8d
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11137>
Using stripes to deal with the different packet layout variants resulted
in redefining "register" offsets with different values, so use "prefix"
to add a suffix to disambiguate.
drivers/gpu/drm/msm/adreno/adreno_pm4.xml.h:1066: warning: "REG_A6XX_CP_DRAW_INDIRECT_MULTI_INDIRECT" redefined
1066 | #define REG_A6XX_CP_DRAW_INDIRECT_MULTI_INDIRECT 0x00000006
|
drivers/gpu/drm/msm/adreno/adreno_pm4.xml.h:1057: note: this is the location of the previous definition
1057 | #define REG_A6XX_CP_DRAW_INDIRECT_MULTI_INDIRECT 0x00000003
|
(Admittedly it isn't really a "prefix" but that was the field in the
schema available to use, and REG_INDEXED_CP_DRAW_INDIRECT_MULTI_STRIDE
sounds somewhat more funny.)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10944>
This runs through the SQE bootstrap code to extract the packet-table,
rather than relying on heuristics. As a bonus, it can detect the start
of the LPAC fw in a660+ fw so that we can properly decode the LPAC fw
and packet-table.
Note that this decodes the jmptable as normal instructions, which is a
change in behavior from the previous heuristic based jmptbl extraction.
Not sure if that is a good or bad thing.
For a5xx, for now the legacy heuristic based jmptable decoding is
preserved, at least until enough control regs are figured out.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10944>
When we start running the bootstrap code thru the emulator we will need
the packet-table loading to actually happen. So add this.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10944>
Run until the packet-table is populated, so the disassembler can use
this to know the offsets of various pm4 packet handlers without having
to rely on heuristics.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10944>
Some of the a6xx gens will require some control reg initialization, and
go into an infinite loop if they don't see the values they expect, so
we'll need to extract the compute gpu-id.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10944>
This is an (at least somewhat complete) logical emulator of the a6xx SQE
that lets us step through firmware execution (bootstrap, cmdstream pkt
handling, etc). It lets us poke at various fw visible state and run
through pm4 packet(s) to better understand what the fw is doing when it
handles various packets.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10944>