
=====================
Adreno Five Microcode
=====================

.. contents::

.. _afuc-introduction:

Introduction
============

Adreno GPUs prior to 6xx use two microcontrollers to parse the command stream,
set up the hardware for draws (or compute jobs), and do various GPU
housekeeping.  They are relatively simple (essentially glorified
register writers) and nearly all their state is held in a collection
of registers.  I.e. there is no stack, and no memory assigned to
them; any global state, such as which bank of context registers is to
be used in the next draw, is stored in a register.

The setup is similar to radeon; in fact Adreno 2xx through 4xx used
basically the same instruction set as r600.  There is a "PFP"
(Prefetch Parser) and "ME" (Micro Engine, also confusingly referred
to as "PM4").  These make up the "CP" ("Command Parser").  The
PFP runs ahead of the ME, with some PM4 packets handled entirely
in the PFP.  Between the PFP and ME is a FIFO ("MEQ").  In the
generations prior to Adreno 5xx, the PFP and ME had different
instruction sets.

Starting with Adreno 5xx, a new microcontroller with a unified
instruction set was introduced, although the overall architecture
and purpose of the two microcontrollers remains the same.

For lack of a better name, this new instruction set is called
"Adreno Five MicroCode" or "afuc".  (No idea what Qualcomm calls
it internally.)

With Adreno 6xx, the separate PFP and ME are replaced with a single
SQE microcontroller using the same instruction set as 5xx.

.. _afuc-overview:

Instruction Set Overview
========================

A 32-bit instruction set with basic arithmetic ops that can take
either two source registers or one source register and a 16b immediate.

32 registers, although some are special purpose:

- ``$00`` - always reads zero, otherwise seems to be the PC
- ``$01`` - current PM4 packet header
- ``$1c`` - alias ``$rem``, remaining data in packet
- ``$1d`` - alias ``$addr``
- ``$1f`` - alias ``$data``

Branch instructions have a delay slot, so the following instruction
is always executed regardless of whether the branch is taken.

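The delay-slot behaviour can be sketched with a toy interpreter.  This is
only an illustration in Python, not the afuc emulator: the tuple-based
instruction format and the ``run`` helper are made up for the example.

.. code-block:: python

  def run(program):
      """Execute a toy program with one branch delay slot.

      Each instruction is a tuple: ("mark", name) records that it ran,
      ("jump", target) moves control to index `target`, but only after
      the following instruction (the delay slot) has executed.
      """
      trace = []
      pc = 0
      pending = None  # branch target waiting for the delay slot to retire
      while pc < len(program):
          op, arg = program[pc]
          next_pc = pc + 1
          if pending is not None:
              # This instruction sits in the delay slot: it still runs,
              # and only then does control move to the branch target.
              next_pc = pending
              pending = None
          if op == "mark":
              trace.append(arg)
          elif op == "jump":
              pending = arg
          pc = next_pc
      return trace


  program = [
      ("mark", "before"),
      ("jump", 4),
      ("mark", "delay slot"),   # executed even though the jump is taken
      ("mark", "skipped"),
      ("mark", "target"),
  ]
  print(run(program))   # ['before', 'delay slot', 'target']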

.. _afuc-alu:

ALU Instructions
================

The following instructions are available:

- ``add``   - add
- ``addhi`` - add + carry (for upper 32b of 64b value)
- ``sub``   - subtract
- ``subhi`` - subtract + carry (for upper 32b of 64b value)
- ``and``   - bitwise AND
- ``or``    - bitwise OR
- ``xor``   - bitwise XOR
- ``not``   - bitwise NOT (no src1)
- ``shl``   - shift-left
- ``ushr``  - unsigned shift-right
- ``ishr``  - signed shift-right
- ``rot``   - rotate-left (like shift-left with wrap-around)
- ``mul8``  - multiply low 8b of two src
- ``min``   - minimum
- ``max``   - maximum
- ``cmp``   - compare two values

The ALU instructions can take either two src registers, or a src
plus 16b immediate as 2nd src, ex::

  add $dst, $src, 0x1234   ; src2 is immed
  add $dst, $src1, $src2   ; src2 is reg

The ``not`` instruction only takes a single source::

  not $dst, $src
  not $dst, 0x1234

.. _afuc-alu-cmp:

The ``cmp`` instruction returns:

- ``0x00`` if src1 > src2
- ``0x2b`` if src1 == src2
- ``0x1e`` if src1 < src2

See explanation in :ref:`afuc-branch`

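As a rough illustration of how ``add``/``addhi`` chain together for 64b
values, here is a small Python model.  How the hardware actually tracks the
carry between the two instructions is not documented here; the explicit
``carry`` attribute and the ``Alu``/``add64`` names are assumptions made for
the sketch.

.. code-block:: python

  MASK32 = 0xFFFFFFFF


  class Alu:
      """Toy 32-bit ALU; the explicit carry flag is an assumption."""

      def __init__(self):
          self.carry = 0

      def add(self, a, b):
          total = (a & MASK32) + (b & MASK32)
          self.carry = total >> 32          # 1 if the low add overflowed
          return total & MASK32

      def addhi(self, a, b):
          # Add the upper words plus the carry left by the previous add.
          total = (a & MASK32) + (b & MASK32) + self.carry
          self.carry = total >> 32
          return total & MASK32


  def add64(alu, x, y):
      """Add two 64-bit values held as (lo, hi) pairs of 32-bit words."""
      lo = alu.add(x[0], y[0])
      hi = alu.addhi(x[1], y[1])
      return lo, hi


  alu = Alu()
  lo, hi = add64(alu, (0xFFFFFFFF, 0x00000001), (0x00000001, 0x00000000))
  print(hex(hi), hex(lo))   # 0x2 0x0

The same pattern would apply to ``sub``/``subhi`` with a borrow instead of a
carry.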

.. _afuc-branch:

Branch Instructions
===================

The following branch/jump instructions are available:

- ``brne`` - branch if not equal (or bit not set)
- ``breq`` - branch if equal (or bit set)
- ``jump`` - unconditional jump

Both ``brne`` and ``breq`` have two forms, comparing the src register
against either a small immediate (up to 5 bits) or a specific bit::

  breq $src, b3, #somelabel  ; branch if src & (1 << 3)
  breq $src, 0x3, #somelabel ; branch if src == 3

The branch instructions are encoded with a 16b relative offset.
Since ``$00`` always reads back zero, it can be used to construct
an unconditional relative jump.

The :ref:`cmp <afuc-alu-cmp>` instruction can be paired with the
bit-test variants of ``brne``/``breq`` to implement gt/ge/lt/le,
due to the bit pattern it returns, for example::

  cmp $04, $02, $03
  breq $04, b1, #somelabel

will branch if ``$02`` is less than or equal to ``$03``.

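The bit patterns can be checked with a few lines of Python.  The ``cmp``
function below only reproduces the three result values listed above; whether
the hardware compare is signed or unsigned is not covered, and the ``bit``
helper is just for illustration.

.. code-block:: python

  def cmp(src1, src2):
      """Result values documented for the afuc ``cmp`` instruction."""
      if src1 > src2:
          return 0x00
      if src1 == src2:
          return 0x2B
      return 0x1E


  def bit(value, n):
      """Return bit n of value, as tested by the bN branch forms."""
      return (value >> n) & 1


  # For each relation, list which of the low six result bits are set.
  for name, result in (("gt", cmp(2, 1)), ("eq", cmp(1, 1)), ("lt", cmp(1, 2))):
      print(name, [n for n in range(6) if bit(result, n)])
  # gt []
  # eq [0, 1, 3, 5]
  # lt [1, 2, 3, 4]

Going by these values, ``breq`` on ``b1`` gives "less than or equal" and
``brne`` on ``b1`` gives "greater than"; ``b2`` is set only for "less than",
so ``breq``/``brne`` on ``b2`` give lt/ge, and ``b0`` is set only for equality.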

.. _afuc-call:

Call/Return
===========

Simple subroutines can be implemented with ``call``/``ret``.  The
jump instruction encodes a fixed offset.

  TODO not sure how many levels deep function calls can be nested.
  There isn't really a stack.  Definitely seems to be multiple
  levels of fxn call, see in PFP: CP_CONTEXT_SWITCH_YIELD -> f13 ->
  f22.


.. _afuc-control:

Config Instructions
===================

These seem to read/write config state in other parts of CP.  In at
least some cases I expect these map to CP registers (but possibly
not directly??)

- ``cread $dst, [$off + addr], flags``
- ``cwrite $src, [$off + addr], flags``

In cases where no offset is needed, ``$00`` is frequently used as
the offset.

For example, the following sequence sets up an indirect buffer (IB) from the
``CP_INDIRECT_BUFFER`` packet parameters::

  ; load CP_INDIRECT_BUFFER parameters from cmdstream:
  mov $02, $data   ; low 32b of IB target address
  mov $03, $data   ; high 32b of IB target
  mov $04, $data   ; IB size in dwords

  ; sanity check # of dwords:
  breq $04, 0x0, #l23 (#69, 04a2)

  ; this seems to have something to do with figuring out whether
  ; we are going from RB->IB1 or IB1->IB2 (ie. so the
  ; below cwrite instructions update either
  ; CP_IB1_BASE_LO/HI/BUFSIZE or CP_IB2_BASE_LO/HI/BUFSIZE)
  and $05, $18, 0x0003
  shl $05, $05, 0x0002

  ; update CP_IBn_BASE_LO/HI/BUFSIZE:
  cwrite $02, [$05 + 0x0b0], 0x8
  cwrite $03, [$05 + 0x0b1], 0x8
  cwrite $04, [$05 + 0x0b2], 0x8

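The offset arithmetic in that listing can be spelled out in Python.  This
only mirrors the ``and``/``shl`` pair and the ``cwrite`` addresses shown
above; what the low two bits of ``$18`` actually encode is a guess in the
comments, and ``ib_control_addrs`` is a made-up helper name.

.. code-block:: python

  IB_BASE_LO = 0x0B0   # control addresses taken from the cwrite operands
  IB_BASE_HI = 0x0B1
  IB_BUFSIZE = 0x0B2


  def ib_control_addrs(reg18):
      """Mirror `and $05, $18, 0x3` then `shl $05, $05, 0x2`."""
      off = (reg18 & 0x3) << 2
      return [off + IB_BASE_LO, off + IB_BASE_HI, off + IB_BUFSIZE]


  for value in range(3):
      print(value, [hex(a) for a in ib_control_addrs(value)])
  # 0 ['0xb0', '0xb1', '0xb2']
  # 1 ['0xb4', '0xb5', '0xb6']
  # 2 ['0xb8', '0xb9', '0xba']

So each step in the low bits of ``$18`` selects a group of control addresses
four entries further along.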


.. _afuc-reg-access:

Register Access
===============

The special registers ``$addr`` and ``$data`` can be used to write GPU
registers.  For example, to write a run of consecutive registers::

  mov $addr, CP_SCRATCH_REG[0x2] ; set register to write
  mov $data, $03                 ; CP_SCRATCH_REG[0x2]
  mov $data, $04                 ; CP_SCRATCH_REG[0x3]
  ...

Subsequent writes to ``$data`` increment the address of the register
to write, so a sequence of consecutive registers can be written.

To read::

  mov $addr, CP_SCRATCH_REG[0x2]
  mov $03, $addr
  mov $04, $addr

Many registers that are updated frequently have two banks, so they can be
updated without stalling for the previous draw to finish.  These banks are
arranged so bit 11 is zero for bank 0 and 1 for bank 1.  The ME fw (at
least the version I'm looking at) stores this in ``$17``, so to update
these registers from ME::

  or $addr, $17, VFD_INDEX_OFFSET
  mov $data, $03
  ...

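A toy Python model of the banked write may make the address arithmetic
clearer.  The ``RegPort`` class and the ``EXAMPLE_REG`` offset are
hypothetical; only the bit-11 bank select and the auto-increment on each
``$data`` write come from the description above.

.. code-block:: python

  BANK_BIT = 1 << 11   # per the text: bit 11 is 0 for bank 0, 1 for bank 1


  class RegPort:
      """Toy model of the $addr/$data pair with auto-increment on write."""

      def __init__(self):
          self.addr = 0
          self.regs = {}

      def set_addr(self, addr):
          self.addr = addr

      def write_data(self, value):
          self.regs[self.addr] = value
          self.addr += 1   # the next write lands on the following register


  EXAMPLE_REG = 0x123    # hypothetical register offset, not a real one
  bank = BANK_BIT        # what the ME would keep in $17 for bank 1

  port = RegPort()
  port.set_addr(bank | EXAMPLE_REG)   # or $addr, $17, <reg>
  port.write_data(0xAAAA)             # mov $data, $03
  port.write_data(0xBBBB)             # mov $data, $04
  print({hex(k): hex(v) for k, v in port.regs.items()})
  # {'0x923': '0xaaaa', '0x924': '0xbbbb'}
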
Note that PFP doesn't seem to use this approach, instead it does something
like::

  mov $0c, CP_SCRATCH_REG[0x7]
  mov $02, 0x789a   ; value
  cwrite $0c, [$00 + 0x010], 0x8
  cwrite $02, [$00 + 0x011], 0x8

Like with the ``$addr``/``$data`` approach, the destination register address
increments on each write.

.. _afuc-mem:

Memory Access
=============

There are no load/store instructions, as such.  The microcontrollers
have only indirect memory access via GPU registers.  There are two
possible mechanisms.

Read/Write via CP_NRT Registers
-------------------------------

This seems to be only used by ME.  If PFP were also using it, they would
race with each other.  It seems to be primarily used for small reads.

- ``CP_ME_NRT_ADDR_LO``/``_HI`` - write to set the address to read or write
- ``CP_ME_NRT_DATA`` - write to trigger write to address in ``CP_ME_NRT_ADDR``

The address register increments with successive reads or writes.

Memory Write example::

  ; store 64b value in $04+$05 to 64b address in $02+$03
  mov $addr, CP_ME_NRT_ADDR_LO
  mov $data, $02
  mov $data, $03
  mov $addr, CP_ME_NRT_DATA
  mov $data, $04
  mov $data, $05

Memory Read example::

  ; load 64b value from address in $02+$03 into $04+$05
  mov $addr, CP_ME_NRT_ADDR_LO
  mov $data, $02
  mov $data, $03
  mov $04, $addr
  mov $05, $addr

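The NRT access pattern can be modelled in Python as an address register that
advances on every data access.  This is a sketch under assumptions: the
``NrtPort`` class is invented for illustration, the 4-byte step per access is
a guess (the text only says the address increments), and how a read is
triggered in hardware is not modelled.

.. code-block:: python

  class NrtPort:
      """Toy model of CP_ME_NRT_ADDR_LO/HI and CP_ME_NRT_DATA."""

      def __init__(self, memory):
          self.memory = memory      # dict: byte address -> 32-bit word
          self.addr = 0

      def set_addr(self, lo, hi):
          self.addr = (hi << 32) | lo

      def write_data(self, value):
          self.memory[self.addr] = value & 0xFFFFFFFF
          self.addr += 4            # assumed: advance by one dword

      def read_data(self):
          value = self.memory.get(self.addr, 0)
          self.addr += 4            # assumed: advance by one dword
          return value


  mem = {}
  nrt = NrtPort(mem)
  nrt.set_addr(0x1000, 0x1)         # 64b address 0x1_0000_1000
  nrt.write_data(0x11111111)        # low dword of the 64b value
  nrt.write_data(0x22222222)        # high dword of the 64b value
  nrt.set_addr(0x1000, 0x1)
  print(hex(nrt.read_data()), hex(nrt.read_data()))
  # 0x11111111 0x22222222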

Read via Control Instructions
-----------------------------

This is used by PFP whenever it needs to read memory.  Also seems to be
used by ME for streaming reads (larger amounts of data).  The DMA access
seems to be done by ROQ.

  TODO might also be possible for write access

  TODO some of the control commands might be synchronizing access
  between PFP and ME??

An example from ``CP_DRAW_INDIRECT`` packet handler::

  mov $07, 0x0004  ; # of dwords to read from draw-indirect buffer
  ; load address of indirect buffer from cmdstream:
  cwrite $data, [$00 + 0x0b8], 0x8
  cwrite $data, [$00 + 0x0b9], 0x8
  ; set # of dwords to read:
  cwrite $07, [$00 + 0x0ba], 0x8
  ...
  ; read parameters from draw-indirect buffer:
  mov $09, $addr
  mov $07, $addr
  cread $12, [$00 + 0x040], 0x8
  ; the start parameter gets written into MEQ, which ME writes
  ; to VFD_INDEX_OFFSET register:
  mov $data, $addr


A6XX NOTES
==========

The ``$14`` register holds global flags set by::

  CP_SKIP_IB2_ENABLE_LOCAL - b8
  CP_SKIP_IB2_ENABLE_GLOBAL - b9
  CP_SET_MARKER
    MODE=GMEM - sets b15
    MODE=BLIT2D - clears b15, b12, b7
  CP_SET_MODE - b29+b30
  CP_SET_VISIBILITY_OVERRIDE - b11, b21, b30?
  CP_SET_DRAW_STATE - checks b29+b30

  CP_COND_REG_EXEC - checks b10, which should be predicate flag?