Winsys context build a list of register block a register block is
a set of consecutive register that will be emited together in the
same pm4 packet (the various r600_block* are there to provide basic
grouping that try to take advantage of states that are linked together)
Some consecutive register are emited each in a different block,
for instance the various cb[0-7]_base. At winsys context creation,
the list of block is created & an index into the list of block. So
to find into which block a register is in you simply use the register
offset and lookup the block index. Block are grouped together into
group which are the various pkt3 group of config, context, resource,
Pipe state build a list of register each state want to modify,
beside register value it also give a register mask so only subpart
of a register can be updated by a given pipe state (the oring is
in the winsys) There is no prebuild register list or define for
each pipe state. Once pipe state are built they are bound to
the winsys context.
Each of this functions will go through the list of register and
will find into which block each reg falls and will update the
value of the block with proper masking (vs/ps resource/constant
are specialized variant with somewhat limited capabilities).
Each block modified by r600_context_pipe_state_set* is marked as
dirty and we update a count of dwords needed to emit all dirty
state so far.
r600_context_pipe_state_set* should be call only when pipe context
change some of the state (thus when pipe bind state or set state)
Then to draw primitive you make a call to r600_context_draw
void r600_context_draw(struct r600_context *ctx, struct r600_draw *draw)
It will check if there is enough dwords in current cs buffer and
if not will flush. Once there is enough room it will copy packet
from dirty block and then add the draw packet3 to initiate the draw.
The flush will send the current cs, reset the count of dwords to
0 and remark all states that are enabled as dirty and recompute
the number of dwords needed to send the current context.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Changes in v3:
- Also change trace, which I forgot about
Changes in v2:
- No longer adds tessellation shaders
Currently each shader cap has FS and VS versions.
However, we want a version of them for geometry, tessellation control,
and tessellation evaluation shaders, and want to be able to easily
query a given cap type for a given shader stage.
Since having 5 duplicates of each shader cap is unmanageable, add
a new get_shader_param function that takes both a shader cap from a
new enum and a shader stage.
Drivers with non-unified shaders will first switch on the shader
and, within each case, switch on the cap.
Drivers with unified shaders instead first check whether the shader
is supported, and then switch on the cap.
MAX_CONST_BUFFERS is now per-stage.
The geometry shader cap is removed in favor of checking whether the
limit of geometry shader instructions is greater than 0, which is also
used for tessellation shaders.
WARNING: all drivers changed and compiled but only nvfx tested
If the buffer we are attempting to map is referenced by the unsubmitted
command stream for this context, we need to flush the command stream,
however to do that we need to be able to access the context at the lowest
level map function, currently we set the buffer in the toplevel map, but this
racy between context. (we probably have a lot more issues than that.)
I'll look into a proper solution as suggested by jrfonseca when I get some time.
adds shader opcodes + assembler support (except ARL)
uses constant buffers
add interp instructions in fragment shader
adds all evergreen hw states
adds evergreen pm4 support.
this runs gears for me on my evergreen
the DDX and r600c both flush cb/db after the draw is emitted,
as long as they do that, r600g can't be different, as it races.
We end up with r600g flush, set CB, DDX set CB, flush. This
was causing misrendering on my evergreen, where sometimes the drawing
would go to an old CB.
This splits the r600 opcodes out of the sq file and adds a wrapper
so we can convert to evergreen opcodes later without touching these functions
too much.
DX9 constants were in the constant file, and evergreen no longer support
cfile. r600/700 can also use constants in memory buffers, so add the code
(disabled for now) to enable that as precursor for evergreen.
This was inherently fragile as any changes to r600_states.h would also
need manual updating of all of the bits in radeon.h. Just add a simple
python script to do the conversion, its not hooked up to make at all.
This also will make adding evergreen a bit easier.
Okay I finally wrapped my head around what r600_context_state is meant to be,
maybe I should just rename all the structs so that have distinct names.
I've no idea however why 16 is a good magic number for R600_MAX_RSTATE.
This was another ugly function that really wasn't needed.
The 3 calls to it from the gallium api were shorter than it,
and all the calls from the set_ functions were pointless.
having some sort of locality of code really matters, just create
and setup state at time. Not sure if this is just further polishing of a bad thing,
but at least it makes it more readable.
Previously bind sampler/sampler_view can be converted and endup
overwritting the current state we want to schedule. Example :
bind texA texB to sampler_view[0] & sampler_view[1], render,
bind texB to sampler_view[0] render. Now state associated to
texB are set to configure sampler_view slot 0, but as we don't
unbind sampler_view[1] still point to texB state so we end up
with sampler_view[1] overwritting sampler_view[0], which gives
wrong rendering if next rendering bind texA to sampler_view[0],
it will endup as texB is bound to sampler_view[0]. If you are
not confuse at that point give me a call i will be buying you
beer.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
One can bind same texture or sampler to different slot,
each slot needs it own state. The solution implemented
here is not exactly beautifull or optimal need to think
to somethings better.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Make state statically allocated, this kills a bunch of code
and avoid intensive use of malloc/free. There is still a lot
of useless duplicate function wrapping that can be kill. This
doesn't improve yet performance, needs to avoid memcpy states
in radeon_ctx_set_draw and to avoid rebuilding vs_resources,
dsa, scissor, cb_cntl, ... states at each draw command.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
This reverts commit de0b76cab2, its pre-computes the texture state wrong,
you can't just use an array of levels, since you can have FBOs to depth texture slices inside a level as well
it would get really messy quickly. Probably need to split commits like this up into pieces for each piece
of state, so we can revert bits easier in case of regressions.
This also break 5 piglit tests, and valgrind starts to warn about invalid read/writes after this.
fixes warning that
r600_blit.c: In function ‘r600_resource_copy_region’:
r600_blit.c:136: warning: passing argument 1 of ‘util_resource_copy_region’ from incompatible pointer type
and also 7 more piglit tests.
For some reason r600c, emits extra instructions in the FP to do the depth write output swizzle,
I'm not sure this is required, so here I'm doing it in the exports.
this fixes the mesa trivial demos tri-depthwrite and tri-depthwrite2, it doesn't fix
the glsl1 gl_FragDepth writing test however.
this is a bit of a workaround, something is wrong with the literal emits here
so we just use the trig copy function to copy the immd to a temp at start of op.
fix VP/FP LIT tests
So as the trig functions used up the literal spots for the PI work, if the arg0 was an immediate
we'd hit failure, so copy the literal before starting.
add some tracking of max temp used to avoid trashing temp regs.
5 more piglits, fp1 COS,SCS,SIN tests
Also add an error if we hit this problem again, we need to do this better
possibly tying the literal addition to the last flag.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Idea is to build hw state at pipe state creation and
reuse them while keeping a non PM4 packet interface
btw winsys & pipe driver. This commit also force rebuild
of pm4 packet on each call to radeon_state_pm4 which
in turn slow down everythings, this will be addressed.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
The current states code had an unhealthy relationship between
that had to somehow magically align themselves, editing either
place meant renumbering all states after the one you were on,
and it was pretty unapproachable code.
This replaces the huge types structures with a simple type + sub
type struct, which is keyed on an stype enum in radeon.h. Each
stype can have a per-shader type subclassing (4 types supported,
PS/VS/GS/FS), and also has a number of states per-subtype. So you
have 256 constants per 4 shaders per one CONSTANT stype.
The interface from the driver is changed to pass in the tuple,
(stype, id, shader_type), and we look for this. If
radeon_state_shader ever shows up on profile, it could use a
hashtable based on stype/shader_type to speed things up.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This reverts commit bd25e23bf3.
Apart from introducing a lot of hex magic numbers and being highly impenetable code,
it causes lots of lockups on an average piglit run that always runs without lockups.
Always run piglit before/after doing big things like this.
this adds handling for some more CF instructions and conditions
also adds parameter for stack size emission
These seem to pass on VS with the stack size hack but not on FS,
TODO: fix FS + stack size calcs
this makes op2 emission smaller, since it skips instructions
that don't write to the dst. not sure if this could have unwanted
side effects but try it and see.
Directly build PM4 packet, avoid using malloc (no states are
bigger than 128 dwords), remove unecessary informations,
remove pm4 building in favor of prebuild pm4 packet.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
We need to always at least export one component (wether it's depth
or color. Add valid r7xx shader program for depth decompression.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Before using depth buffer as texture, it needs to be decompressed
(tile pattern of db are different from one used for colorbuffer
like texture)
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Partialy fix texturing from depth buffer, depth buffer is tiled
following different tile organisation that color buffer. This
properly set the tile type & array mode field of texture sampler
when sampling from db resource.
Add initial support to untiling buffer when transfering them,
it's kind of broken by corruption the vertex buffer of previous
draw.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
This pretty much ports the code from r600c, however it doesn't
always seem to work quite perfectly, but I can't find anything in this
code that is wrong. I'm guessing either literal input or constants
aren't working always.
Apart from the fact that the radeon.h/r600_states.h editing is a nightmare, this
wasn't so bad.
passes piglit user-clip test now also trivial tests.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This takes the r300g texture format checker and fixes it up for r600g,
it passes glean texSwizzle, pixelformats, and texture_srgb tests,
however I think it L8S8_SRGB is broken as is L8_SRGB, need to investigate.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes fbo-readpixels piglit test, and adds support for swapping
the formats. Not all formats are correct yet I don't think.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Simplify state handly by avoiding state allocation.
Next step is to allocate once for all context packet
buffer and then avoid rebuilding pm4 packet each time
(through use of combined crc) this would also avoid
number of memcpy.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Make sure LIT fills all slot for instruction (can't do W instruction
without having the Z slot filled with at least a NOP).
ALU instruction can't access more than 4 constant, move constant to
temporary reg if we reach the limit.
Fix ALU block splitting, only split ALU after ALU with last instruction
bit sets.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
That is, remove pipe_context::draw_arrays, pipe_context::draw_elements,
pipe_context::draw_arrays_instanced,
pipe_context::draw_elements_instanced,
pipe_context::draw_range_elements.
Some drivers define a generic function that is called by all drawing
functions. To implement draw_vbo for such drivers, either draw_vbo
calls the generic function or the prototype of the generic function is
changed to match draw_vbo.
Other drivers have no such generic function. draw_vbo is implemented by
calling either draw_arrays and draw_elements.
For most drivers, set_index_buffer does not mark the state dirty for
tracking. Instead, the index buffer state is emitted whenever draw_vbo
is called, just like the case with draw_elements. It surely can be
improved.
I am not sure how to properly handle flat shading regarding
non color parameter to fragment shader. It seems we should
still interpolate non color using linear interpolation and
flat shade only apply to color.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Split hw vs pipe states creation handling as hw states group doesn't
match pipe state group exactly. Right now be dumb about that and
rebuild all hw states on each draw call. More optimization on that
side coming.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Previous patch added literal emission to wrong place, we
want to emit literal before emitting a new alu group.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Add texture mapping support, redbook/texbind works if
you comment out glClear and second checkboard. Need to
fix :
- texture overwritting
- lod & mip/map handling
- unormalized coordinate handling
- texture view with first leve > 0
- and many other things
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Writing a compiler is time consuming and error prone in
order to allow r600g to further progress in the meantime
i wrote a simple tgsi assembler, it does stupid thing but
i would rather keep the code simple than having people
trying to optimize code it does.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
some of the ALU instructions are different on r6xx vs r7xx,
separate the alu translation to separate files, and use family
to pick which compile stage to use.
before this change, r600 glxinfo segfaulted in the list code, and I wasn't
debugging another linked list implementation, its 2010 after all.
So add the two missing list macros to the gallium header from X.org list header file (after fixing them), then port all r600 lists to the new header.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Split instruction into scalar in core compiler this simplify
the way we translate the instruction in latter stage.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Disable rendering to avoid GPU lockup.
Use radeondb to debug shader compiler :
radeondb -c gallium.bof
radeondb -s gallium.json
Will print shader generated, best is to use fp demos to test
the compiler.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
- enabled flushing a buffer more than once
- enabled the blitter for r600_clear
- added some more colors to r600_is_format_supported (copied from r600_conv_pipe_format)
- r600_set_framebuffer_state now sets rctx->fb_state
- more states are saved before a blit (had to add some accounting for the viewport and the vertex elements state)
- fixed a few errors with reference counting
Change the way we translate from c_compiler to the
asic specific representation. Should make things
simpler.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
- Wrapped the buffer and texture create/destroy/transfer/... functions
using u_resource, which is then used to implement the resource functions.
- Implemented texture transfers.
I left the buffer and texture transfers separate because one day we'll
need a special codepath for textures.
- Added index_bias to the draw_*elements functions.
- Removed nonexistent *REP and *FOR instructions.
- Some pipe formats have changed channel ordering, so I've removed/fixed
nonexistent ones.
- Added stubs for create/set/destroy sampler views.
- Added a naive implementation of vertex elements state (new CSO).
- Reworked {texture,buffer}_{from,to}_handle.
- Reorganized winsys files, removed dri,egl,python directories.
- Added a new build target dri-r600.