Separating atomic opcodes makes possible to express a6xx global
atomics which take iova in SRC1. They would be needed by
VK_KHR_buffer_device_address.
The change also makes easier to distiguish atomics in conditions.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8717>
It's not actually used for 2d blits, it's supposed to mirror
RB_MRT_BUF_INFO[0].COLOR_FORMAT and seems to be used only when rendering
to the special planar formats, although the blob name seems to suggest
it's connected to LRZ.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6792>
This bit seems like the control for line mode of rastrization.
That can be simply figured out by comparing
dEQP-VK.rasterization.primitives.no_stipple.bresenham_lines,
dEQP-VK.rasterization.primitives.no_stipple.rectangular_lines and
dEQP-VK.rasterization.primitives.no_stipple.lines.
For opengl, the value of bresenham lines mode, which is 0, is set
by default and the value of rectangular mode, which is 0x1, is set
when multi-sampled.
For vulkan, the bresenham lines are enabled when lineRasterizationMode is
VK_LINE_RASTERIZATION_MODE_BRESENHAM_EXT, which sets the bit to 0, while
the value is 1 when it's VK_LINE_RASTERIZATION_MODE_RECTANGULAR_EXT,
that seems to be default.
If both multi-sampled and bresenham-lines are used when primitive type is
line, the bit is to be set as 0 and makes msaa disabled.
Note that this is only tested on a6xx, but I guess it's likely the same
for a5xx.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6020>
We were treating them the same as regular cat2/cat3/cat4 immediates, but
that's not right because cat6 sources are only 8 bits.
Our bindless code was handling this before for bindless resources, and
it was disabled for most other things, so this was mostly harmless, but
fixing it will be necessary for handling ldc offsets.
In addition enable tests for this that were just commented out, and add
a custom test making sure that the immediate source is treated as
unsigned.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13142>
Now they run automatically in parallel with other unit testing, rather
than needing a separate script and environment to run them.
Instead of doing shell script filtering afterwards, I just added a little
flag to suppress printing the path name. Also dropped the "Parsing
<file>" in addition to "Reading <file>" in the tested script, since it's
redundant and baked the path name into the reference.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6360>
Now they run automatically in parallel with other unit testing, rather
than needing a separate script and environment to run them.
Instead of doing shell script filtering afterwards, I just added a little
flag to suppress printing the path name.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6360>
Changes the output format slightly but its needed when we
want to switch to more generic version of isaspec.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11321>
Add a value discovered when investigating how the blob implements
GL_KHR_blend_equation_advanced.
Note that everything added here is a bit speculative, because it's
assuming the blob's implementation of GL_KHR_blend_equation_advanced is
sane. In particular a value of 0x3 seems to solve the UBWC problem as
well, so I'm not sure whether my description of the difference between
0x1 and 0x3 is correct. I'm also surprised that it uses the same value
for the coherent and non-coherent cases when forcing sysmem.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12360>
This is supposed to be set when that stage needs the PrimID sysval
preloaded, except for the VS which doesn't have this bit and instead
infers it from the HS or GS bit (depending on whether tess/GS is
enabled). Therefore for HS, GS, and DS we should set it whenever the
corresponding sysval is there. This includes adding a missing
PC_HS_OUT_CNTL, which I confirmed is set when the HS reads PrimID from
the VS. Note that the DS sysval is currently always enabled whenever
there's a GS, if we were to fix that then we should also change the
logic here.
This doesn't fix anything that I know of, but aligns us more with what
the blob does.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12166>
DSPATCHID and HSPATCHID, which we mapped gl_PrimitiveID to, are actually
relative to the current subdraw. Subdraws aren't supported yet by turnip
but they are by freedreno for indirect draws.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12166>
Replace it with a calculation which works for all current GPUs.
Duplicated the calculation in both drivers because freedreno_dev_info isn't
meant for derived parameters (and drivers might want to just calculate on
the fly instead).
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11790>
Example of blob's output:
(nop3) shlg.b16 hr8.x, (r)8, (r)hr8.x, 12
It does: (src2 << src1) | src2
src1 and src2 could be GPRs, relative GPRs, relative consts,
or immidiates. However, they could not be plain const registers.
Blob does use it in conjuncture with "samgq" instruction.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11760>
This runs through the SQE bootstrap code to extract the packet-table,
rather than relying on heuristics. As a bonus, it can detect the start
of the LPAC fw in a660+ fw so that we can properly decode the LPAC fw
and packet-table.
Note that this decodes the jmptable as normal instructions, which is a
change in behavior from the previous heuristic based jmptbl extraction.
Not sure if that is a good or bad thing.
For a5xx, for now the legacy heuristic based jmptable decoding is
preserved, at least until enough control regs are figured out.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10944>
When we start running the bootstrap code thru the emulator we will need
the packet-table loading to actually happen. So add this.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10944>
Some of the a6xx gens will require some control reg initialization, and
go into an infinite loop if they don't see the values they expect, so
we'll need to extract the compute gpu-id.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10944>
Allow for different mnemonics depending on whether they are used as
source or destination register, to better reflect what they do.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10944>
Blending in SP_BLEND_CNTL is not a binary flag but the same
mask as in RB_BLEND_CNTL. It is a per-mrt enable bit for blending.
Copied form a6xx, on a5xx it should be have the same since it seems
to have the same structure layout.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10682>
When float16 is enabled this will allow to pass a number of
float16 tests.
When A6XX_SP_FLOAT_CNTL_F16_NO_INF is set - all operations which
generate +-infinity generate +-MAX_HALF_FLOAT.
Fixes some tests from:
dEQP-VK.spirv_assembly.instruction.*.float16.*
dEQP-VK.spirv_assembly.instruction.*.float_controls.fp16.*
E.g.:
dEQP-VK.spirv_assembly.instruction.graphics.float16.arithmetic_1.sinh_vert
dEQP-VK.spirv_assembly.instruction.compute.float16.arithmetic_4.length
dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp16.input_args.log_denorm_flush_to_zero_nostorage
dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp16.input_args.log2_denorm_flush_to_zero_nostorage
dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp16.input_args.inv_sqrt_denorm_flush_to_zero_nostorage
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9840>
The registers were actually different per-stage even though we used the
same type, which resulted in a bunch of incorrectly programmed fields
and confusion. Move the stage-specific values to the registers
themselves, which makes things much less confusing and makes it possible
to set "mergedregs" correctly.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9493>
Before, we only used 2k of shared memory.
It was found that 5 lower bits of SP_CS_UNKNOWN_A9B1 do control
the available size of shared memory for compute shaders, with
AVAILABLE_SIZE = (SP_CS_UNKNOWN_A9B1_SHARED_SIZE + 1) * 1k
up to 32k. And SP_CS_UNKNOWN_A9B1_SHARED_SIZE being zero enables
all 32k of shared memory.
Fixes tests:
dEQP-VK.rasterization.line_continuity.line-strip
dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.workgroup.payload_local.buffer.guard_nonlocal.workgroup.comp
dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.workgroup.payload_nonlocal.workgroup.guard_local.buffer.comp
dEQP-VK.memory_model.write_after_read.core11.u32.coherent.fence_fence.atomicwrite.workgroup.payload_local.image.guard_nonlocal.workgroup.comp
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9157>
Reduce noise in a6xx.xml by removing LO/HI versions of address registers.
Also fix type="address" registers in register packing (use bit size instead
of checking for "waddress" to use qword)
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8423>
They seem to be broadly similar to the a3xx ones, albeit with some
things shuffled around and with different units, and the extra layout
mode bits.
We also document the FIRST_EXEC_OFFSET registers, so that we can start
properly setting them all to 0 in freedreno and turnip in later commits.
I discovered the compute one when playing with function support in the
blob CL driver, and added the other registers via analogy (the blob
Vulkan driver sets FIRST_EXEC_OFFSET and the shader VA together in one
packet for all stages, so it seems to really be in the same place for
all stages).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7386>
The *2 here would bump into the *2 in regset, causing assertion failures
dumping CS programs. Just set the mergedregs flag on a6xx, and don't
duplicate the mergedregs logic. If you're dealing with new HW where we
don't know if mergedregs is set, you may need to tweak the flag during
disasm setup for the stats to make sense.
Fixes: f7bd3456d7 ("freedreno: deduplicate a3xx+ disasm")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6323>