70873 Commits

Author SHA1 Message Date
Jose Fonseca 634cfb9a45 glsl: Specify the shader stage in linker errors due to too many in/outputs.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-06-23 12:06:39 +01:00
Dave Airlie 4731be701f docs: update GL3 with softpipe/llvmpipe gpu_shader5 pieces.
This just updates the bits I've added in the previous few patches.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-06-23 15:55:30 +10:00
Dave Airlie 1a71fbe28c draw/gallivm: add invocation ID support for llvmpipe.
This extends the draw code to add support for invocations.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-06-23 15:54:07 +10:00
Dave Airlie 40d225803e draw/tgsi: implement geom shader invocation support.
This is just for softpipe; llvmpipe won't work without
some changes.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-06-23 15:53:49 +10:00
Dave Airlie 24e77cb09f tgsi: handle indirect sampler arrays. (v2)
This is required for ARB_gpu_shader5 support in softpipe.

v2: add support to txd/txf/txq paths.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-06-23 15:52:48 +10:00
Kenneth Graunke 1762568fd3 nir: Allow vec2/vec3/vec4 instructions in the select peephole pass.
These are basically just moves, so they should be safe as well.

When disabling i965's GLSL IR level scalarizer (channel expressions)
pass, I started seeing NIR code like this:

        if ssa_21 {
                block block_1:
                /* preds: block_0 */
                vec4 ssa_120 = vec4 ssa_82, ssa_83, ssa_84, ssa_30
                /* succs: block_3 */
        } else {
                block block_2:
                /* preds: block_0 */
                /* succs: block_3 */
        }
        block block_3:
        /* preds: block_1 block_2 */
        vec4 ssa_33 = phi block_1: ssa_120, block_2: ssa_2

Previously, the GLSL IR scalarizer pass would break the vec4 into a
series of fmovs, which were allowed by the peephole pass.  But with
the vec4 operation, they were not.  We want to keep getting selects.

Normal i965 on Broadwell:
instructions in affected programs:     200 -> 176 (-12.00%)
helped:                                4

With brw_fs_channel_expressions() disabled:
instructions in affected programs:     1832 -> 1646 (-10.15%)
helped:                                30

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-06-22 14:08:36 -07:00
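
To illustrate the select peephole change in 1762568fd3, here is a minimal Python sketch of the idea: an if/else whose branches contain only move-like instructions is flattened, and the phi becomes a select. The tuple-based instruction model and the function names are assumptions made for this sketch; only the op names (fmov, vec2/vec3/vec4, and NIR's bcsel select opcode) come from NIR itself, and this is not the actual pass.

```python
# Ops this sketch treats as "basically just moves". The vecN entries
# mirror what the commit adds; the data model itself is hypothetical.
MOVE_LIKE_OPS = {"fmov", "imov", "vec2", "vec3", "vec4"}


def block_is_flattenable(block):
    """A block qualifies if every instruction in it is a move-like op."""
    return all(op in MOVE_LIKE_OPS for (_dest, op, _srcs) in block)


def peephole_select(cond, then_block, else_block, phi):
    """Try to turn if/else + phi into straight-line code ending in a select.

    Each instruction is a (dest, op, srcs) tuple; phi is
    (dest, then_value, else_value). Returns the flattened instruction
    list, or None if either branch holds something other than moves.
    """
    if not (block_is_flattenable(then_block) and block_is_flattenable(else_block)):
        return None
    dest, then_val, else_val = phi
    # Hoist both branches (moves have no side effects), then select.
    return then_block + else_block + [(dest, "bcsel", [cond, then_val, else_val])]
```

Applied to the dump in the commit message, the vec4 in block_1 is hoisted and ssa_33 is computed by a select on ssa_21 instead of a phi.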
Kenneth Graunke 94e3864707 i965: Add and fix comments in brw_vue_map.c.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-06-22 14:05:44 -07:00
Kenneth Graunke 38eb9015e3 i965: Split VUE map handling out of brw_vs.c into brw_vue_map.c.
This was originally only used by the vertex shader, but it's now used by
the geometry shader as well, and will also eventually be used for
tessellation control and evaluation shaders.

I suspect it will be easier to find in a file named after the concept.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-06-22 14:05:44 -07:00
Ben Widawsky 90754d2df0 i965/gen9: Implement Push Constant Buffer workaround
This implements a workaround (exact excerpt as a comment in the code). The docs
specify [clearly, after you struggle for a while] that the offset isn't relative
to state base. This actually makes sense. This fixes hangs on SKL.

Buffer #0 is meant to be used for normal uniforms.
Buffer #1 is typically used for gather constants when using RS.
Buffer #1-#3 could be used to push a bunch of UBO data which would just be
  somewhere in memory, and not relative to the dynamic state.

NOTE: I've moved away from the ternary operator for the new gen9 conditions.
Admittedly it's probably not great to do this, but I really want to fix this all
up in the subsequent patch and doing it here makes that diff a lot nicer. I want
to split out the gen8/9 code to make the function a bit more readable, but to
keep this easily cherry-pickable I am doing this fix first. If we decide not to
merge the cleanup patch then I can revisit this.

Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Valtteri Rantala <Valtteri.rantala@intel.com>
2015-06-22 12:11:41 -07:00
Brian Paul 2b07b8d104 mesa: use _mesa_lookup_enum_by_nr() in print_array()
Print GL_FLOAT, etc. instead of hex value.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-06-22 08:46:56 -06:00
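
As a rough illustration of what 2b07b8d104 changes in the debug output, the sketch below maps a GL type enum to its name and falls back to hex for unknown values. The table and the enum_name() helper are hypothetical stand-ins for this example (the real lookup is _mesa_lookup_enum_by_nr(), whose fallback format may differ); the numeric values are the standard GL type enums.

```python
# Standard OpenGL data-type enum values.
GL_TYPE_NAMES = {
    0x1400: "GL_BYTE",
    0x1401: "GL_UNSIGNED_BYTE",
    0x1402: "GL_SHORT",
    0x1403: "GL_UNSIGNED_SHORT",
    0x1404: "GL_INT",
    0x1405: "GL_UNSIGNED_INT",
    0x1406: "GL_FLOAT",
    0x140A: "GL_DOUBLE",
}


def enum_name(value):
    """Return the symbolic name if known, otherwise the value in hex."""
    return GL_TYPE_NAMES.get(value, hex(value))
```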
Chia-I Wu 8787141429 ilo: emit 3DPRIMITIVE from gen6_3dprimitive_info
It allows us to remove ilo_ib_state::draw_start_offset and
ILO_PRIM_RECTANGLES.  gen6_3d_translate_pipe_prim() is also replaced by
ilo_translate_draw_mode().
2015-06-22 15:18:57 +08:00
Chia-I Wu 58f95b332d ilo: align vertex buffer size in buf_create()
With ilo_format.[ch] moved out of core, the aligning of vertex buffers does
not belong to core anymore.
2015-06-22 15:18:57 +08:00
Chia-I Wu 513bc5d90b ilo: move ilo_format.[ch] out of core
They provide PIPE_FORMAT_x to GEN6_FORMAT_x translation as well as some
convenient helpers.  Move them out of core.
2015-06-22 15:18:56 +08:00
Chia-I Wu 3547bb0783 ilo: add ilo_state_surface_valid_format()
Check if a surface format can be used for the specified access type.
2015-06-22 15:18:56 +08:00
Chia-I Wu aa3e5e0dde ilo: add ilo_state_vf_valid_element_format()
Check if a surface format can be used as a VE format.
2015-06-22 15:18:56 +08:00
Alexandre Courbot da8300cb03 nvc0: use NV_VRAM_DOMAIN() macro
Use the newly-introduced NV_VRAM_DOMAIN() macro to support alternative
VRAM domains for chips that do not have dedicated video memory.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Martin Peres <martin.peres@free.fr>
2015-06-22 01:00:02 -04:00
Alexandre Courbot f22406837f nouveau: support for custom VRAM domains
Some GPUs (e.g. GK20A, GM20B) do not embed VRAM of their own and use
the system memory as a backend instead. For such systems, allocating
objects in VRAM results in errors since the kernel will not allow
allocations of VRAM objects.

This patch adds a vram_domain member to struct nouveau_screen that can
optionally be initialized to an alternative domain to use for VRAM
allocations. If left untouched, NOUVEAU_BO_VRAM will be used for
systems that embed VRAM, and NOUVEAU_BO_GART will be used for VRAM-less
systems.

Code that uses GPU objects is then expected to use the NV_VRAM_DOMAIN()
macro in place of NOUVEAU_BO_VRAM to ensure correct behavior on
VRAM-less chips.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Martin Peres <martin.peres@free.fr>
2015-06-22 01:00:02 -04:00
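
The vram_domain mechanism described in f22406837f can be sketched as follows. This is a Python model, not the driver code: the Screen class, the nv_vram_domain() helper, and the flag values are placeholders chosen for the example, standing in for struct nouveau_screen, the NV_VRAM_DOMAIN() macro, and the libdrm constants.

```python
from dataclasses import dataclass

# Placeholder flag values for this sketch only.
NOUVEAU_BO_VRAM = 0x1
NOUVEAU_BO_GART = 0x2


@dataclass
class Screen:
    # Defaults to real VRAM; a VRAM-less chip would override this at init.
    vram_domain: int = NOUVEAU_BO_VRAM


def nv_vram_domain(screen):
    """Domain to use wherever code would otherwise hard-code VRAM."""
    return screen.vram_domain


def buffer_flags(screen, extra_flags=0):
    """Combine the screen's VRAM domain with any other allocation flags."""
    return nv_vram_domain(screen) | extra_flags
```

Callers never name NOUVEAU_BO_VRAM directly, so switching a whole screen to GART is a single assignment at initialization time.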
Chia-I Wu 57bdcae9e0 ilo: add ilo_state_compute
Replace gen6_idrt_data with ilo_state_compute, which has a bunch of
validations and is now preferred.
2015-06-22 12:56:55 +08:00
Dave Airlie 2bf5a4211e r600g: ignore sampler views for now.
This fixes a regression in that r600 stopped working when
sampler views were pushed.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-06-22 14:02:49 +10:00
Rob Clark 66a93a0ff9 freedreno/ir3: pass sz to split_dest()
For query_levels, we generate a getinfo with writemask of (z), which RA
will consider as size==3.  But we were still generating four fanouts,
which meant that RA would see it as two different register classes,
depending on the path to definer.  Ie. on the getinfo instruction itself
it would see size==3, but when chasing back through the fanouts it would
see size==4.

Easiest way to solve that is to just generate the chain of neighboring
fanouts to have the correct size in the first place.

Note: we may eventually want split_dest() to take start/end or wrmask
instead, since really we only need size==1.  But RA is not clever enough
for that, query_levels is not that common, and the other two registers
that get allocated are never used so those register slots can be
immediately re-used.  So a bunch of work for probably no real gain.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-06-21 08:01:12 -04:00
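
The size mismatch fixed in 66a93a0ff9 can be shown with a small sketch. It assumes, consistent with the commit's statement that a writemask of (z) counts as size==3, that the size is the position of the highest written component plus one. The function names and the tuple representation of a fanout are hypothetical.

```python
def size_from_wrmask(wrmask):
    """Size as RA would see it: highest written component + 1.

    Bit 0 is x, bit 1 is y, bit 2 is z, bit 3 is w.
    """
    return wrmask.bit_length()


def split_dest(sz):
    """Generate one fanout per component, matching the given size."""
    return [("fanout", offset) for offset in range(sz)]
```

Generating split_dest(size_from_wrmask(0b0100)) yields three fanouts, so both paths to the definer agree on the register class.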
Rob Clark 1ee4d51e7a freedreno/ir3/nir: add more opcodes
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-06-21 08:01:06 -04:00
Rob Clark 43048c7093 freedreno/ir3: only unminify txf coords on a3xx
Seems like a4xx gets this right.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-06-21 08:01:05 -04:00
Rob Clark 0f008082b1 freedreno: remove int sampler shader variants
We get this information from NIR (which gets it from sview decl in tgsi
when translating from tgsi), so no need to maintain shader variants for
this.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-06-21 08:00:58 -04:00
Rob Clark 457f7c2a2a freedreno/ir3: block reshuffling and loops!
This shuffles things around to allow the shader to have multiple basic
blocks.  We drop the entire CFG structure from nir and just preserve the
blocks.  At scheduling we know whether to schedule conditional branches
or unconditional jumps at the end of the block based on the # of block
successors.  (Dropping jumps to the following instruction, etc.)

One slight complication is that variables (load_var/store_var, ie.
arrays) are not in SSA form, so we have to figure out where to put the
phis ourselves.  For this, we use the predecessor set information from
nir_block.  (We could perhaps use NIR's dominance frontier information
to help with this?)

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-06-21 07:54:38 -04:00
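
One way to read the successor-count rule from 457f7c2a2a is sketched below. The function name, the ("br"/"jump", target) tuples, and the convention that the first of two successors is the taken branch are assumptions for this example rather than the ir3 implementation.

```python
def block_terminator(block_index, successors):
    """Pick the control-flow instructions to end a block with.

    Blocks are assumed to be laid out in index order, so a jump to
    block_index + 1 is redundant and gets dropped.
    """
    next_block = block_index + 1
    instrs = []
    if len(successors) == 2:
        taken, fallthrough = successors
        instrs.append(("br", taken))  # conditional branch
        if fallthrough != next_block:
            instrs.append(("jump", fallthrough))
    elif len(successors) == 1:
        if successors[0] != next_block:
            instrs.append(("jump", successors[0]))  # unconditional jump
    # No successors: end of shader, nothing to emit.
    return instrs
```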
Rob Clark 660d5c1646 freedreno/ir3: a4xx encodes larger immed offset
Without this, negative branch/jump offsets look like very large positive
offsets.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-06-21 07:54:31 -04:00
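
The symptom described in 660d5c1646 is the usual sign-extension problem: a negative offset stored in an N-bit field reads as a large positive number unless the field width is known. A minimal sketch follows; the 20-bit width is an arbitrary choice for the example, since the commit message does not state the actual field sizes.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


# A branch offset of -3 stored in a hypothetical 20-bit immediate field.
raw = -3 & 0xFFFFF
```

Read without sign extension, raw is 1048573; sign_extend(raw, 20) recovers -3.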
Rob Clark d646d3ae9d freedreno/ir3: simplify find_neighbors stop condition
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-06-21 07:54:16 -04:00
Rob Clark c8fb5f8a01 freedreno/ir3: move inputs/outputs to shader
These belong in the shader, rather than the block.  Mostly a lot of
churn and nothing too interesting.  But splitting this out from the
rest of ir3_block reshuffling to cut down the noise in the later
patch.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-06-21 07:54:04 -04:00
Rob Clark d52fb2f5ad freedreno/ir3/ra: use register_allocate
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-06-21 07:53:58 -04:00
Rob Clark 694beb8b83 freedreno/ir3: introduce ir3_compiler object
Right now, just provides a cleaner way to get at the gpu-id, given the
separation between compiler and context.  But we will need this also to
hold the reg-set for new register allocation.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-06-21 07:53:50 -04:00
Rob Clark 5c1e153467 freedreno/ir3: dump nocp option
No longer used, or even possible, with NIR frontend.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-06-21 07:53:43 -04:00
Rob Clark 7674ab12e8 freedreno/ir3: silence warnings
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-06-21 07:53:35 -04:00
Rob Clark 0f6faa8ff3 freedreno/ir3: remove tgsi f/e
Also remove ir3_flatten which was only used by tgsi f/e.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-06-21 07:53:25 -04:00
Rob Clark 7273cb4e93 freedreno/ir3/sched: convert to priority queue
Use a more standard priority-queue based scheduling algo.  It is simpler
and will make things easier once we have multiple basic blocks and flow
control.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-06-21 07:53:17 -04:00
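
For readers unfamiliar with the approach named in 7273cb4e93, a generic priority-queue list scheduler looks roughly like this. It is a textbook sketch in Python using heapq, not the ir3 scheduler: the dependency and priority inputs are hypothetical, and the commit does not say how ir3 computes priorities.

```python
import heapq


def schedule(instrs, deps, priority):
    """Order instructions so dependencies come first, best priority first.

    instrs:   list of instruction names in original order
    deps:     name -> set of names that must be scheduled earlier
    priority: name -> number, higher is scheduled sooner when ready
    """
    remaining = {n: len(deps.get(n, ())) for n in instrs}
    users = {n: [] for n in instrs}
    for n in instrs:
        for d in deps.get(n, ()):
            users[d].append(n)
    original = {n: i for i, n in enumerate(instrs)}

    # heapq is a min-heap, so negate priority; original order breaks ties.
    ready = [(-priority[n], original[n], n) for n in instrs if remaining[n] == 0]
    heapq.heapify(ready)

    out = []
    while ready:
        _, _, n = heapq.heappop(ready)
        out.append(n)
        for u in users[n]:
            remaining[u] -= 1
            if remaining[u] == 0:
                heapq.heappush(ready, (-priority[u], original[u], u))
    return out
```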
Rob Clark adf1659ff5 freedreno/ir3: use standard list implementation
Use the standard list_head doubly-linked list and related iterators,
helpers, etc., rather than a weird combo of instruction array and next
pointers depending on stage.  Now block has an instrs_list.  In
certain stages where we want to remove and re-add to the blocks list
we just use list_replace() to copy the list to a new list_head.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-06-21 07:53:09 -04:00
Rob Clark 67d994c676 freedreno/ir3: drop dot graph dumping
At least for now.  Right now the instruction and instruction list
printing should suffice, and the re-working of ir3_block would require
a lot of changes in that code.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-06-21 07:52:58 -04:00
Rob Clark 5c8c2e2f97 freedreno/ir3: more builder helpers
Use ir3_MOV() builder in a couple of spots, rather than open-coding the
instruction construction.  Also add ir3_NOP() builder and use that
instead of open coding.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-06-21 07:52:41 -04:00
Rob Clark b33015f889 gallium/ttn: add missing SNE
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-06-21 07:52:36 -04:00
Rob Clark c79b2e626c util/list: add list_first/last_entry
I need an easier way to get at head/tail in ir3.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-06-21 07:52:36 -04:00
Rob Clark b3d2e36716 gallium/ttn: add texture-type support
v2: rebased on using SVIEW to hold type information

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-06-21 07:52:29 -04:00
Rob Clark cb258c1dec glsl_to_tgsi: add SVIEW decl support
Freedreno needs sampler type information to deal with int/uint textures.
To accomplish this, start creating sampler-view declarations, as
suggested here:

 http://lists.freedesktop.org/archives/mesa-dev/2014-November/071583.html

Create a sampler-view with index matching the sampler, to encode the
texture type (i.e. SINT/UINT/FLOAT).  For example:

   DCL SVIEW[n], 2D, UINT
   DCL SAMP[n]
   TEX OUT[1], IN[1], SAMP[n]

For tgsi texture instructions which do not take an explicit SVIEW
argument, the SVIEW index is implied by the SAMP index.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-06-21 07:52:22 -04:00
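
The declaration pattern from cb258c1dec (one SVIEW per SAMP, same index) can be sketched as a small text generator. The emit_sampler_decls() function and its input tuples are hypothetical; the output lines follow the TGSI text shown in the commit message and nothing beyond it.

```python
def emit_sampler_decls(samplers):
    """Emit a matching SVIEW declaration for every SAMP declaration.

    samplers: list of (index, target, return_type), e.g. (0, "2D", "UINT").
    """
    lines = []
    for index, target, return_type in samplers:
        lines.append(f"DCL SVIEW[{index}], {target}, {return_type}")
        lines.append(f"DCL SAMP[{index}]")
    return lines
```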
Rob Clark 93379748f7 util/blitter (and friends): generate appropriate SVIEW decls
Some hardware needs to know the sampler type.  Update the blit related
shaders to include SVIEW decl.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-06-21 07:52:16 -04:00
Rob Clark e536992986 util/pstipple: updates for SVIEW decls
To allow for shaders which use SVIEW decls for TEX* instructions, we
need to preserve the constraint that the shader either has no SVIEW's or
it has one matching SVIEW for each SAMP.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-06-21 07:52:12 -04:00
Rob Clark b516e68afb draw: updates to support SVIEW decls
To allow for shaders which use SVIEW decls for TEX* instructions, we
need to preserve the constraint that the shader either has no SVIEW's or
it has one matching SVIEW for each SAMP.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-06-21 07:52:07 -04:00
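
The constraint stated in e536992986 and b516e68afb (no SVIEWs at all, or one matching SVIEW per SAMP) can be expressed as a check plus a helper that preserves it when a pass adds its own sampler. Both functions and their data model are assumptions for illustration; in particular, this sketch tolerates extra SVIEWs with no SAMP, which the commits do not address.

```python
def sview_decls_consistent(samps, sviews):
    """True if the shader has no SVIEWs, or an SVIEW for every SAMP index.

    samps:  set of sampler indices
    sviews: dict of sampler-view index -> return type
    """
    return not sviews or samps <= set(sviews)


def add_sampler(samps, sviews, new_index, return_type="FLOAT"):
    """Add a sampler, adding a matching SVIEW only if the shader uses them."""
    new_samps = samps | {new_index}
    new_sviews = dict(sviews)
    if sviews:
        new_sviews[new_index] = return_type
    return new_samps, new_sviews
```

A pass that inserts a texture lookup into an existing shader would call something like add_sampler() so that the result still passes sview_decls_consistent().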
Rob Clark f481af110e tgsi/transform: add support for SVIEW decls
TODO single return_type (use enum)

v2: single return_type arg, and use enum

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-06-21 07:52:02 -04:00
Rob Clark b13135e066 tgsi: update docs for SVIEW usage with TEX* instructions
Based on mailing list discussion here:

http://lists.freedesktop.org/archives/mesa-dev/2014-November/071583.html

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-06-21 07:51:53 -04:00
Eric Anholt 717376155d mesa: Back out an accidental change I had in a VC4 commit.
This was a hack as part of debugging some glamor-on-GLES2 behavior that
ended up being an xserver bug.  I suspect we can just flip this extension
on for GLES2, but the spec says it requires 3.1.
2015-06-20 15:04:17 -07:00
Emil Velikov 104bff0376 docs: add news item and link release notes for mesa 10.5.8
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-06-20 16:42:21 +01:00
Emil Velikov aa28423bcc docs: Add sha256sums for the 10.5.8 release
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit a81b1d5512f64ffca1c13a5937e7eb0de24713ae)
2015-06-20 16:42:21 +01:00
Emil Velikov 97caf2054f Add release notes for the 10.5.8 release
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 24b043aab73ce066ded6e4bc93f589008dfc8484)
2015-06-20 16:42:21 +01:00
Eric Anholt c009038674 vc4: Use a defined t value for 1D textures.
This doesn't fix the broken 1D cases of texsubimage, but it does prevent
segfaulting when dumping the QIR code generated in fbo-1d.
2015-06-20 00:16:32 -07:00