Build packet header once and allow to add fake register support so
we can handle things like indexed set of register (evergreen sampler
border registers for instance.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
When occlusion query are running we want to have accurate
fragment count thus disable any early culling optimization
GPU has.
Based on work from Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Fixes this GCC warning.
radeon_state.c: In function 'radeon_state_fini':
radeon_state.c:140: warning: 'return' with a value, in function returning void
Up to 2010-09-19:
r600g: fix tiling support for ddx supplied buffers
9b146eae25
user buffer seems to be broken... new to fix that.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Shader rebuild should be more clever, we should store along each
shader all the value that change shader program rather than using
flags in context (ie change sequence like : change vs buffer, draw,
change vs buffer, switch shader will trigger useless shader rebuild).
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
This allow to share code path btw old & new, also
remove check on reference this might make things
a little slower but new design doesn't use reference
stuff.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Fixes this GCC warning.
r600_state2.c: In function 'r600_context_flush':
r600_state2.c:946: error: implicit declaration of function 'drmCommandWriteRead'
Winsys context build a list of register block a register block is
a set of consecutive register that will be emited together in the
same pm4 packet (the various r600_block* are there to provide basic
grouping that try to take advantage of states that are linked together)
Some consecutive register are emited each in a different block,
for instance the various cb[0-7]_base. At winsys context creation,
the list of block is created & an index into the list of block. So
to find into which block a register is in you simply use the register
offset and lookup the block index. Block are grouped together into
group which are the various pkt3 group of config, context, resource,
Pipe state build a list of register each state want to modify,
beside register value it also give a register mask so only subpart
of a register can be updated by a given pipe state (the oring is
in the winsys) There is no prebuild register list or define for
each pipe state. Once pipe state are built they are bound to
the winsys context.
Each of this functions will go through the list of register and
will find into which block each reg falls and will update the
value of the block with proper masking (vs/ps resource/constant
are specialized variant with somewhat limited capabilities).
Each block modified by r600_context_pipe_state_set* is marked as
dirty and we update a count of dwords needed to emit all dirty
state so far.
r600_context_pipe_state_set* should be call only when pipe context
change some of the state (thus when pipe bind state or set state)
Then to draw primitive you make a call to r600_context_draw
void r600_context_draw(struct r600_context *ctx, struct r600_draw *draw)
It will check if there is enough dwords in current cs buffer and
if not will flush. Once there is enough room it will copy packet
from dirty block and then add the draw packet3 to initiate the draw.
The flush will send the current cs, reset the count of dwords to
0 and remark all states that are enabled as dirty and recompute
the number of dwords needed to send the current context.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
this adds the bo caching layer and uses it for vertex/index/constant bos.
ctx needs to take references on hw bos so the flushing works okay, also
needs to flush the maps.
This fixes a DRM deadlock in the cubestorm xscreensaver, because somehow
there must not be 2 different BOs relocated in one CS if both BOs back
the same handle. I was told it is impossible to happen, but apparently
it is not, or there is something else wrong.
If the buffer we are attempting to map is referenced by the unsubmitted
command stream for this context, we need to flush the command stream,
however to do that we need to be able to access the context at the lowest
level map function, currently we set the buffer in the toplevel map, but this
racy between context. (we probably have a lot more issues than that.)
I'll look into a proper solution as suggested by jrfonseca when I get some time.
adds shader opcodes + assembler support (except ARL)
uses constant buffers
add interp instructions in fragment shader
adds all evergreen hw states
adds evergreen pm4 support.
this runs gears for me on my evergreen
the DDX and r600c both flush cb/db after the draw is emitted,
as long as they do that, r600g can't be different, as it races.
We end up with r600g flush, set CB, DDX set CB, flush. This
was causing misrendering on my evergreen, where sometimes the drawing
would go to an old CB.
DX9 constants were in the constant file, and evergreen no longer support
cfile. r600/700 can also use constants in memory buffers, so add the code
(disabled for now) to enable that as precursor for evergreen.
This was inherently fragile as any changes to r600_states.h would also
need manual updating of all of the bits in radeon.h. Just add a simple
python script to do the conversion, its not hooked up to make at all.
This also will make adding evergreen a bit easier.
Previously bind sampler/sampler_view can be converted and endup
overwritting the current state we want to schedule. Example :
bind texA texB to sampler_view[0] & sampler_view[1], render,
bind texB to sampler_view[0] render. Now state associated to
texB are set to configure sampler_view slot 0, but as we don't
unbind sampler_view[1] still point to texB state so we end up
with sampler_view[1] overwritting sampler_view[0], which gives
wrong rendering if next rendering bind texA to sampler_view[0],
it will endup as texB is bound to sampler_view[0]. If you are
not confuse at that point give me a call i will be buying you
beer.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Make state statically allocated, this kills a bunch of code
and avoid intensive use of malloc/free. There is still a lot
of useless duplicate function wrapping that can be kill. This
doesn't improve yet performance, needs to avoid memcpy states
in radeon_ctx_set_draw and to avoid rebuilding vs_resources,
dsa, scissor, cb_cntl, ... states at each draw command.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
This reverts commit de0b76cab2, its pre-computes the texture state wrong,
you can't just use an array of levels, since you can have FBOs to depth texture slices inside a level as well
it would get really messy quickly. Probably need to split commits like this up into pieces for each piece
of state, so we can revert bits easier in case of regressions.
This also break 5 piglit tests, and valgrind starts to warn about invalid read/writes after this.
Idea is to build hw state at pipe state creation and
reuse them while keeping a non PM4 packet interface
btw winsys & pipe driver. This commit also force rebuild
of pm4 packet on each call to radeon_state_pm4 which
in turn slow down everythings, this will be addressed.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
The current states code had an unhealthy relationship between
that had to somehow magically align themselves, editing either
place meant renumbering all states after the one you were on,
and it was pretty unapproachable code.
This replaces the huge types structures with a simple type + sub
type struct, which is keyed on an stype enum in radeon.h. Each
stype can have a per-shader type subclassing (4 types supported,
PS/VS/GS/FS), and also has a number of states per-subtype. So you
have 256 constants per 4 shaders per one CONSTANT stype.
The interface from the driver is changed to pass in the tuple,
(stype, id, shader_type), and we look for this. If
radeon_state_shader ever shows up on profile, it could use a
hashtable based on stype/shader_type to speed things up.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This reverts commit bd25e23bf3.
Apart from introducing a lot of hex magic numbers and being highly impenetable code,
it causes lots of lockups on an average piglit run that always runs without lockups.
Always run piglit before/after doing big things like this.
handle very early errors in pipe_screen creation (failure of
nouveau_screen_init in nv50_screen_create)
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Directly build PM4 packet, avoid using malloc (no states are
bigger than 128 dwords), remove unecessary informations,
remove pm4 building in favor of prebuild pm4 packet.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Apart from the fact that the radeon.h/r600_states.h editing is a nightmare, this
wasn't so bad.
passes piglit user-clip test now also trivial tests.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This makes it compatible with the modified DRM interface in drm-radeon-testing.
Also, now you need to set RADEON_HYPERZ=1 to be able to use hyperz.
It's not bug-free yet.
Simplify state handly by avoiding state allocation.
Next step is to allocate once for all context packet
buffer and then avoid rebuilding pm4 packet each time
(through use of combined crc) this would also avoid
number of memcpy.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
This implements fast Z clear, Z compression, and HiZ support for r300->r500
GPUs.
It also allows cbzb clears when fast Z clears are being used for the ZB.
It requires a kernel with hyper-z support.
Thanks to Marek Olšák <maraeo@gmail.com>, who started this off, and Alex Deucher at AMD for providing lots of hints.
v2:
squashed zmask ram size fix]
squashed r300g/blitter: fix Z readback when compressed]
v3:
rebase around texture changes in master - .1 fix more bits
v4:
migrated to using u_mm in r300_texture to manage hiz/zmask rams consistently
disabled HiZ when using OQ
flush z-cache before turning hyper-z off
update hyper-z state on dsa state change
store depthclearvalue across cbzb clears and replace it afterwards.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Split hw vs pipe states creation handling as hw states group doesn't
match pipe state group exactly. Right now be dumb about that and
rebuild all hw states on each draw call. More optimization on that
side coming.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
The driver gets a buffer and its size in resource_from_handle.
It computes the required minimum buffer size from given texture
properties, and compares the two sizes.
This is to early detect DDX bugs.
Also move some initialization from screen init to pre-init, now
that it is possible.
Also import a new vmwgfx drm (1.3) header.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
I have had a look at the libdrm sources and they just contain more or less
the same checking we do in macros, and begin_cs may realloc the CS buffer
if we overflow it, which never happens with r300g. So these are pretty
much useless.
There is a small but measurable performance increase by dropping the two
functions.
The xorg state tracker gets two new options to let the user choose
whether to enable / disable dirty throttling and swapbuffer throttling.
The default value of these options are enabled, unless the winsys
supplies a customizer with other values. The customizer record has been
extended to allow this, and also to set winsys-based throttling on a per-
context basis.
The vmware part of this patch disables the dirty throttling if the kernel
supports command submission throttling, and also in that case sets kernel
based throttling for everything but swapbuffers. The vmware winsys does not
set throttling per context, even if it theoretically could, but instead
sets throttling per screen. This should perhaps be changed, should the
xorg state tracker start to use multiple rendering contexts. Kernel throttling
is off by default for all new screens/contexts, so the dri state tracker
is not affected.
This significantly improves interactivity of the vmware xorg driver.
Cherry-picked from commit a8f3b3f88acc1f0193fa740e76e9d815f07f32ab
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Jakob Bornecrantz <jakob@vmware.com>
- Wrapped the buffer and texture create/destroy/transfer/... functions
using u_resource, which is then used to implement the resource functions.
- Implemented texture transfers.
I left the buffer and texture transfers separate because one day we'll
need a special codepath for textures.
- Added index_bias to the draw_*elements functions.
- Removed nonexistent *REP and *FOR instructions.
- Some pipe formats have changed channel ordering, so I've removed/fixed
nonexistent ones.
- Added stubs for create/set/destroy sampler views.
- Added a naive implementation of vertex elements state (new CSO).
- Reworked {texture,buffer}_{from,to}_handle.
- Reorganized winsys files, removed dri,egl,python directories.
- Added a new build target dri-r600.
This fixes a double-free() error when not using a shared memory XImage.
The XDestroyImage() function frees the ximage->data buffer if non-NULL.
If we free it ourselves, we also need to NULL-out the pointer.