This is just a temporary change until we code-generate the tile read/write
functions at runtime. The new code avoids an extra memcpy present in the
u_tile.c functions, on which lp_tile_soa.c was originally based.
This yields up to a 5% performance improvement, particularly in frames
with little geometry overlap.