Lifted from ir3. Algorithm is the same; the data structures and interface are
lightly modified to decouple from ir3's IR.
Sequentializing parallel copies after RA is tricky. ir3's implementation works
well enough, so I use that one.
Original implementation by Connor Abbott.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16268>