Use a single swizzled tile per colorbuf (and per thread) to avoid
accumulating large amounts of cached swizzled data.
Now that the SSE3 code has been merged to master, the performance delta
of this change is minimal, the main benefit is reduced memory usage
due to no longer keeping swizzled copies of render targets.
It's clear from the performance of the in-place version of this code
that there is still quite a bit of time being spent swizzling &
unswizzling, but it's not clear exactly how to reduce that.
The lp_rast_shader_inputs' alignment is irrelevant now that it contains
pointers instead of actual data.
Likewise, lp_rast_triangle's size alignment is meaningless.
Just put a pointer to the state in the tri->inputs struct. Remove
some complex logic for eliminating unused statechanges in bins at the
expense of a slightly larger triangle struct.
Move this code back out to C for now, will generate separately.
Shader now takes a mask parameter instead of C0/C1/C2/etc.
Shader does not currently use that parameter and rasterizes whole
pixel stamps always.
Rather than inserting an lp_rast_fence command at the end of each
bin, have each rasterizer thread call this function directly once
it has run out of work to do on a particular scene.
This results in fewer calls to the mutex & related functions, but more
importantly makes it easier to recognize empty bins.
it was wrong to put this in the fs paths, but it was easier to just
stuff it along the fragment texture sampling paths. the patch
disconnects vertex texture sampling and just maps the textures
before the draw itself and unmaps them after.
This undoes part of commit 8be645d53a
and fixes fd.o bug 28822 as well as other regressions.
The 'draw' module may issue additional state-change commands while
we're inside the draw_arrays/elements() call so it's important to
check for updated state at this point.
We were starting a scene whenever lp_setup_get_vertex_info() was called by
the draw module. So when when all primitives were culled/clipped, not only
did we create a new scene for nothing, but we end up using the old scene
with the old framebuffer state instead of a new one.
Fix consists in:
- don't call lp_setup_update_state() in lp_setup_get_vertex_info() -- no
longer necessary
- always setting the scene state before binning a command -- query
commands were bypassing it
- assert no old scene is reused in lp_setup_bind_framebuffer()