This was originally only used by the vertex shader, but it's now used by
the geometry shader as well, and will also eventually be used for
tessellation control and evaluation shaders.
I suspect it will be easier to find in a file named after the concept.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
When we're done emitting the code for a loop, we need to visit the new
break block, which is the merge block of the current loop, rather than
the old merge block, which is the merge block of the loop containing the
one we just emitted code for.
This implements a workaround (exact excerpt as a comment in the code). The docs
specify [clearly, after you struggle for a while] that the offset isn't relative
to state base. This actually makes sense. This fixes hangs on SKL.
Buffer #0 is meant to be used for normal uniforms.
Buffer #1 is typically used for gather constants when using RS.
Buffer #1-#3 could be used to push a bunch of UBO data which would just be
somewhere in memory, and not relative to the dynamic state.
NOTE: I've moved away from the ternary operator for the new gen9 conditions.
Admittedly it's probably not great to do this, but I really want to fix this all
up in the subsequent patch and doing it here makes that diff a lot nicer. I want
to split out the gen8/9 code to make the function a bit more readable, but to
keep this easily cherry-pickable I am doing this fix first. If we decide not to
merge the cleanup patch then I can revisit this.
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Valtteri Rantala <Valtteri.rantala@intel.com>
It allows us to remove ilo_ib_state::draw_start_offset and
ILO_PRIM_RECTANGLES. gen6_3d_translate_pipe_prim() is also replaced by
ilo_translate_draw_mode().
Use the newly-introduced NV_VRAM_DOMAIN() macro to support alternative
VRAM domains for chips that do not have dedicated video memory.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Martin Peres <martin.peres@free.fr>
Some GPUs (e.g. GK20A, GM20B) do not embed VRAM of their own and use
the system memory as a backend instead. For such systems, allocating
objects in VRAM results in errors since the kernel will not allow
VRAM objects allocations.
This patch adds a vram_domain member to struct nouveau_screen that can
optionally be initialized to an alternative domain to use for VRAM
allocations. If left untouched, NOUVEAU_BO_VRAM will be used for
systems that embed VRAM, and NOUVEAU_BO_GART will be used for VRAM-less
systems.
Code that uses GPU objects is then expected to use the NV_VRAM_DOMAIN()
macro in place of NOUVEAU_BO_VRAM to ensure correct behavior on
VRAM-less chips.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Martin Peres <martin.peres@free.fr>
For query_levels, we generate a getinfo with writemask of (z), which RA
will consider as size==3. But we were still generating four fanouts.
Which meant that RA would see it as two different register classes,
depending on the path to definer. Ie. on the getinfo instruction itself
it would see size==3, but when chasing back through the fanouts it would
see size==4.
Easiest way to solve that is to just generate the chain of neighboring
fanouts to have the correct size in the first place.
Note: we may eventually want split_dest() to take start/end or wrmask
instead, since really we only need size==1. But RA is not clever enough
for that, query_levels is not that common, and the other two registers
that get allocated are never used so those register slots can be
immediately re-used. So bunch of work for probably no real gain.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We get this information from NIR (which gets it from sview decl in tgsi
when translating from tgsi), so no need to maintain shader variants for
this.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This shuffles things around to allow the shader to have multiple basic
blocks. We drop the entire CFG structure from nir and just preserve the
blocks. At scheduling we know whether to schedule conditional branches
or unconditional jumps at the end of the block based on the # of block
successors. (Dropping jumps to the following instruction, etc.)
One slight complication is that variables (load_var/store_var, ie.
arrays) are not in SSA form, so we have to figure out where to put the
phi's ourself. For this, we use the predecessor set information from
nir_block. (We could perhaps use NIR's dominance frontier information
to help with this?)
Signed-off-by: Rob Clark <robclark@freedesktop.org>
These belong in the shader, rather than the block. Mostly a lot of
churn and nothing too interesting. But splitting this out from the
rest of ir3_block reshuffling to cut down the noise in the later
patch.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Right now, just provides a cleaner way to get at the gpu-id, given the
separation between compiler and context. But we will need this also to
hold the reg-set for new register allocation.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Use a more standard priority-queue based scheduling algo. It is simpler
and will make things easier once we have multiple basic blocks and flow
control.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Use standard list_head double-linked list and related iterators,
helpers, etc, rather than weird combo of instruction array and next
pointers depending on stage. Now block has an instrs_list. In
certain stages where we want to remove and re-add to the blocks list
we just use list_replace() to copy the list to a new list_head.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
At least for now.. right now the instruction and instruction list
printing should suffice, and the re-working of ir3_block would require
a lot of changes in that code.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Use ir3_MOV() builder in a couple of spots, rather than open-coding the
instruction construction. Also add ir3_NOP() builder and use that
instead of open coding.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
v2: rebased on using SVIEW to hold type information
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Freedreno needs sampler type information to deal with int/uint textures.
To accomplish this, start creating sampler-view declarations, as
suggested here:
http://lists.freedesktop.org/archives/mesa-dev/2014-November/071583.html
create a sampler-view with index matching the sampler, to encode the
texture type (ie. SINT/UINT/FLOAT). Ie:
DCL SVIEW[n], 2D, UINT
DCL SAMP[n]
TEX OUT[1], IN[1], SAMP[n]
For tgsi texture instructions which do not take an explicit SVIEW
argument, the SVIEW index is implied by the SAMP index.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Some hardware needs to know the sampler type. Update the blit related
shaders to include SVIEW decl.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
To allow for shaders which use SVIEW decls for TEX* instructions, we
need to preserve the constraint that the shader either has no SVIEW's or
it has one matching SVIEW for each SAMP.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
To allow for shaders which use SVIEW decls for TEX* instructions, we
need to preserve the constraint that the shader either has no SVIEW's or
it has one matching SVIEW for each SAMP.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
TODO single return_type (use enum)
v2: single return_type arg, and use enum
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This was a hack as part of debugging some glamor-on-GLES2 behavior that
ended up being an xserver bug. I suspect we can just flip this extension
on for GLES2, but the spec says it requires 3.1.
We need to make sure that when we store the aligned box, we've got
initialized contents in the border. We could potentially just load the
border area, but for now let's get text rendering working in X (and fix
the GL_TEXTURE_2D errors in piglit's texsubimage test and
gl-2.1-pbo/test_tex_image)