Commit Graph

62995 Commits

Author SHA1 Message Date
Kenneth Graunke cca6dc9f0f i965/fs: Rip struct brw_wm_compile out of the visitors and generators.
Instead, just pass the key and prog_data as separate parameters.

This moves it up a level - one step further toward getting rid of it.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2014-05-18 23:35:20 -07:00
Kenneth Graunke 2d4ac9b5b8 i965/fs: Plumb a mem_ctx all the way through the FS compile.
'c' is going away, but we still need a memory context that lives
for the duration of the compile.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2014-05-18 23:35:20 -07:00
Kenneth Graunke 25f8fbbf2f i965/fs: Use 'c' as the mem_ctx in fs_visitor.
Previously, the memory context situation was a bit of a mess:

fs_visitor allocated its own memory context, and freed it in the
destructor.  However, some data produced by fs_visitor (such as the list
of instructions) needs to live beyond when fs_visitor is "done", so the
caller can pass it to fs_generator.

Everything worked out because brw_wm_fs_emit's fs_visitor variables
happen to not go out of scope until the end of the function.  But that
meant that moving the declaration of, say, the SIMD16 fs_visitor
instance, could cause everything to explode.

Using a memory context that exists for the duration of the compile is
clearer, and should be equivalent.

Ultimately, we don't want to use 'c', but this matches the behavior of
fs_generator and gen8_fs_generator, so it'll be simple to change later.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2014-05-18 23:35:20 -07:00
Kenneth Graunke 81b11bf093 i965/fs: Actually free program data on the error path.
We throw away the data generated during compilation on the success path,
so we really ought to on the failure path as well.  The caller has no
access to it anyway, so it's purely leaked.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2014-05-18 23:35:20 -07:00
Kenneth Graunke c96fdeb723 i965/fs: Replace c->key with a direct reference in the generators.
'c' is going away.  This is also a bit shorter.

Marking the key pointer as const will also deter people from changing
it in these classes, as that's absolutely not OK.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2014-05-18 23:35:19 -07:00
Kenneth Graunke 65b2df3ec8 i965/fs: Replace c->key with a direct reference in fs_visitor.
'c' is going away.  This is also shorter.

Marking the key pointer as const will also deter people from changing
it in fs_visitor, as it's absolutely not OK to modify it there.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2014-05-18 23:35:19 -07:00
Kenneth Graunke b61d055d66 i965/fs: Replace c->prog_data with a direct reference in the generators.
'c' is going away.  This is also a bit shorter.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2014-05-18 23:35:19 -07:00
Kenneth Graunke 8a04e0de8b i965/fs: Replace c->prog_data with a direct reference in fs_visitor.
'c' is going away.  This is also a bit shorter.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2014-05-18 23:35:19 -07:00
Kenneth Graunke 55f4e3a06b i965/fs: Move some flags that affect code generation to fs_visitor.
runtime_check_aads_emit isn't actually used currently, but I believe
we should be using it on Gen4-5, so I haven't eliminated it.
See https://bugs.freedesktop.org/show_bug.cgi?id=78679 for details.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2014-05-18 23:35:19 -07:00
Kenneth Graunke 8ef78828fa i965/fs: Move payload register info from brw_wm_compile to fs_visitor.
This data is created by fs_visitor and only used when emitting code,
so keeping it in fs_visitor makes sense.  I decided it would be
reasonable to group these all together in a struct, since they're
highly related.

v2: s/nr_payload_regs/payload.num_regs/ in some comments (chrisf).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2014-05-18 23:35:19 -07:00
Kenneth Graunke c76e6db05f i965/fs: Simplify gl_SampleMaskIn handling.
As far as I can tell, there's no point in allocating an extra register
and generating a MOV---we can just use the copy provided as part of our
thread payload directly.  It's already in the right format.

Of course, there are zero Piglit tests for this.  We don't actually ship
the extension (GL_ARB_gpu_shader5) that exposes this functionality
either.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2014-05-18 23:35:18 -07:00
Kenneth Graunke 5cd7cf58e6 i965/fs: Rename c->sample_mask_reg to sample_mask_in_reg.
This is actually for gl_SampleMaskIn, which is quite different than
gl_SampleMask.  Renaming should help avoid confusion.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2014-05-18 23:35:18 -07:00
Kenneth Graunke db9c915abc i965/fs: Move c->last_scratch into fs_visitor.
Nothing outside of fs_visitor uses it, so we may as well keep it
internal.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2014-05-18 23:35:18 -07:00
Kenneth Graunke 7e28bd797d i965/fs: Move total_scratch calculation into fs_visitor::run().
With this one use gone, c->last_scratch is now only used inside
fs_visitor.  The rest of the driver uses prog_data->total_scratch.

We already compute similar prog_data fields in fs_visitor, so this
seems reasonable.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2014-05-18 23:35:18 -07:00
Kenneth Graunke c51163b0cf i965/fs: Move perf_debug about register spilling to a more obvious spot.
The if (!allocated_without_spills) block is an obvious spot for this
performance warning message.

In the Vec4 backend, scratch is also used for indirect access of
temporary arrays.  The FS backend doesn't implement that yet, but
if it did, this message would be inaccurate, since scratch access
wouldn't necessarily mean spilling.  Moving it preemptively fixes that.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2014-05-18 23:35:18 -07:00
Kenneth Graunke db1449b700 i965: Rename brw/gen8_dump_compile to brw/gen8_disassemble.
"Disassemble" is an accurate description of what this function does.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-05-18 23:35:18 -07:00
Kenneth Graunke 4b04152db0 i965: Rename brw_disasm/gen8_disassemble to brw/gen8_disassemble_inst.
We're going to use "disassemble" for the function that disassembles
the whole program.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-05-18 23:35:17 -07:00
Kenneth Graunke 4a2f0e305c i965: Fix dump_prog_cache to handle compacted instructions.
dump_prog_cache has interpreted compacted instructions as full size
instructions, decoding garbage and complaining about invalid values.

We can just use brw_dump_compile to handle this correctly in less code.
The output format changes slightly, but it's still perfectly acceptable.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-05-18 23:35:17 -07:00
Kenneth Graunke 3285bc97ef i965: Use brw_dump_compile for clip, SF, and old GS programs.
Looping over the instructions and calling brw_disasm doesn't handle
compacted instructions.  In most cases, this hasn't been a problem since
we don't compact prior to Sandybridge.

However, Sandybridge's transform feedback GS program should already be
compacted, and so this ought to fix decoding of that.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-05-18 23:35:17 -07:00
Ilia Mirkin 5b8f1a0f7c nv50/ir: fix integer mul lowering for u32 x u32 -> high u32
UNION appears to expect that all of its sources are conditionally
defined. Otherwise it inserts an unpredicated mov instruction which
overwrites the desired result. This fixes tests that use UMUL_HI, and
much less directly, unsigned integer division by a constant, which uses
this functionality in a peephole pass.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
2014-05-18 17:59:16 -04:00
Ilia Mirkin 4ebaabcccb nv50/ir: make sure that texprep/texquerylod's args get coalesced
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
2014-05-18 17:59:16 -04:00
Rob Clark acc1651711 freedreno/a3xx: use util_format_compose_swizzles()
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-05-18 16:05:06 -04:00
Rob Clark 88ba9de917 freedreno/a3xx/compiler: 1D textures
Gallium already gives us height==1 for these, so the texture state is
already setup correctly to emulate 1D textures as a Nx1 2D texture.  We
just need to supply the .y coord.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-05-18 15:23:53 -04:00
Rob Clark 6f84f64643 freedreno: fix caps
In particular, we want mesa to emulate primitive restart for us.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-05-18 15:22:55 -04:00
Rob Clark f7debd4a3e freedreno: fix index buffer offset
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-05-18 15:22:25 -04:00
Rob Clark 5646319f25 freedreno/a3xx: add sRBG texture support
That was easy.  Turns out it is just a matter of setting one bit.
Enable sampling from sRGB texture, and therefore enable GL 2.1 :-)

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-05-16 20:48:40 -04:00
Rob Clark 9227e6c98c freedreno: update generated headers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-05-16 20:08:09 -04:00
Roland Scheidegger 3bf2d86c09 gallivm: (trivial) fix compilation with llvm 3.1, 3.2
I actually checked the getModuleIdentifier() function exists with 3.1 but
missed that the file moved...
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=78803
2014-05-17 02:03:35 +02:00
Roland Scheidegger 3a1da0abee gallivm: print out how long it takes to optimize shader IR.
Enabled with GALLIVM_DEBUG=perf (which up to now was only used to print
warnings for unoptimized code).

While some unexpectedly long shader compile times for some shaders were fixed
with 8a9f5ecdb1 this should help recognize such
problems in the future. For now though only available in debug builds (which
are not always suitable for such analysis). And since this uses system time,
it might not be all that accurate (even llvmpipe's own rasterization threads
might be running at the same time, or just other tasks).
(llvmpipe also has LP_DEBUG=counters but this only gives an average per shader
and the the total time for all shaders.)
This prints information like this:
optimizing module fs17_variant0 took 1 msec
optimizing module setup_variant_0 took 0 msec
optimizing module draw_llvm_vs_variant0 took 9 msec
optimizing module draw_llvm_vs_variant0 took 12 msec
optimizing module fs17_variant1 took 2 msec

v2: rebase for recent gallivm compilation changes, and print time for whole
modules instead of functions (otherwise it would be very spammy since it would
include all trivial inline sse2 functions), using the shiny new module names,
prying them off LLVM using new helper (not available through C bindings).
Per function timings, while possibly giving more information (if there'd be
a problem only in for instance the partial not the whole function), don't seem
all that useful for now.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-05-16 22:50:14 +02:00
Roland Scheidegger 26cac02c51 gallivm: give more verbose names to modules
When we had just one module "gallivm" was an appropriate name. But now we have
modules containing all functions for a particular variant, so give it a
corresponding name (this is really just for helping debugging).

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-05-16 22:50:14 +02:00
Brian Paul ef6b6658f9 mesa: fix double-freeing of dispatch tables inside glBegin/End.
We allocate dispatch tables for BeginEnd and OutsideBeginEnd.  But
when we destroy the context we were freeing the BeginEnd and Exec
tables.  If Exec==BeginEnd we did a double-free.  This would happen
if the context was destroyed while inside a glBegin/End pair.  Now
free the BeginEnd and OutsideBeginEnd pointers.

Cc: "10.1", "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-05-16 07:14:57 -06:00
Matt Turner 730bc124c3 i965: Use binary literals counter select.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-05-15 23:31:27 -07:00
Michel Dänzer 2bab95973d glsl_to_tgsi: Make sure the 'shader' member is always initialized
Fixes the valgrind report below and random crashes with piglit on radeonsi.

==30005== Conditional jump or move depends on uninitialised value(s)
==30005==    at 0xB13584E: st_translate_program (st_glsl_to_tgsi.cpp:5100)
==30005==    by 0xB14698B: st_translate_fragment_program (st_program.c:747)
==30005==    by 0xB14777D: st_get_fp_variant (st_program.c:824)
==30005==    by 0xB11219C: get_color_fp_variant (st_cb_drawpixels.c:1042)
==30005==    by 0xB1131AE: st_DrawPixels (st_cb_drawpixels.c:1154)
==30005==    by 0xAFF8806: _mesa_DrawPixels (drawpix.c:162)
==30005==    by 0x4EB86DB: stub_glDrawPixels (generated_dispatch.c:6640)
==30005==    by 0x4F1DF08: piglit_visualize_image (piglit-util-gl.c:1574)
==30005==    by 0x40691D: draw_image_to_window_system_fb(int, bool) (draw-buffers-common.cpp:733)
==30005==    by 0x406C8B: draw_reference_image(bool, bool) (draw-buffers-common.cpp:854)
==30005==    by 0x40722A: piglit_display (alpha-to-coverage-dual-src-blend.cpp:117)
==30005==    by 0x4EA7168: run_test (piglit_fbo_framework.c:52)

Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2014-05-16 11:12:45 +09:00
Roland Scheidegger b416645387 gallivm: remove optimization workaround when not having sse 4.1
This workaround doesn't list any llvm version, but it was introduced
2010-06-10 (e277d5c1f6). It is unlikely
this bug is still present in llvm versions we support (3.1+).
There's no specific test listed, but I ran lp_test_arit (which uses
the mentioned functions) on llvm 3.1 and 3.3 with sse41 disabled and
this pass enabled without issues.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-05-16 01:09:34 +02:00
Roland Scheidegger 93731fbeec gallivm: remove workaround for reversing optimization pass order.
32bit code generation and llvm >= 2.7 used a different optimization pass
order - this code was initially introduced (2010-07-23) by
815e79e72c, apparently due to buggy code being
generated with then brand new llvm versions (which was llvm 2.7 plus pre 2.8
devel).
It seems very highly likely that whatever this bug was it has been fixed in
newer llvm versions, though there's no easy way to test this - the mentioned
piglit test has been removed years ago, and even if you'd build it I'm
sceptical the glsl compiler would still produce the required code to trigger
it.
I have no idea what a good order of passes is, but just remove the workaround
and use the same order everywhere.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-05-16 01:09:34 +02:00
Matt Turner 8a6f7dfc19 i965/gen8: Make disassembly function match brw's signature.
gen8_dump_compile will be called indirectly by code common used by
generations before and after the gen8 instruction format change.

Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-05-15 15:45:40 -07:00
Matt Turner 1ef52d6ab3 i965: Pass brw_context and assembly separately to brw_dump_compile.
brw_dump_compile will be called indirectly by code common used by
generations before and after the gen8 instruction format change.

Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-05-15 15:45:40 -07:00
Matt Turner 74b252d270 i965: Pull brw_compact_instructions() out of brw_get_program().
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-05-15 15:45:40 -07:00
Matt Turner cce3bea2a7 i965/disasm: Align send instruction meta-information with dst.
Has been misaligned since we added instruction offset prefixes.

Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-05-15 15:45:40 -07:00
Matt Turner e00fe451b8 i965/disasm: Disassemble the compaction control bit.
brw_disasm doesn't disassemble compacted instructions, so we uncompact
before disassembling them which would unset the compaction control bit.
Instead pass it as a separate argument.

Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-05-15 15:45:40 -07:00
Matt Turner 58bcf5996d i965/cfg: Embed exec_node in bblock_link.
In order to remove bblock_link's inheritance of exec_node. Also makes
linked list walk code much nicer.

Acked-by: Eric Anholt <eric@anholt.net>
2014-05-15 15:45:40 -07:00
Matt Turner a77023c992 i965/cfg: Make brw_cfg.h closer to C-includable.
Only bblock_link's inheritance left.

Acked-by: Eric Anholt <eric@anholt.net>
2014-05-15 15:45:40 -07:00
Matt Turner d4d843e02f i965/cfg: Protect brw_cfg.h from multiple inclusion.
Acked-by: Eric Anholt <eric@anholt.net>
2014-05-15 15:45:39 -07:00
Matt Turner 9b0108ddc1 glsl: Add C-callable fprint_ir function.
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-05-15 15:45:39 -07:00
Topi Pohjolainen d45fadf11a i965/fb: Use meta path for stencil up/downsampling
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2014-05-15 21:39:33 +03:00
Topi Pohjolainen 475216a4f0 i965/meta: Stencil blit for miptree updownsampling
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-05-15 21:39:33 +03:00
Topi Pohjolainen b18f6b9b86 i965/fb: Use meta path for stencil blits
This is effective only on gen8 for now as previous generations still
go through blorp.

Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-05-15 21:39:33 +03:00
Topi Pohjolainen d1829badf5 i965/meta: Stencil blits
v2: Create the intel renderbuffer with level hardcoded to zero instead
    of overriding it in the surface state configuration. Also moved the
    dimension adjustments for tiling, mip level, msaa into the render
    buffer creation. Finally prepares for another blit path needed for
    miptree updownsampling.
v3 (Ken): Dropped unnecessary memory context for "ralloc_asprintf()"

Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2014-05-15 21:39:33 +03:00
Topi Pohjolainen 9d752c098c i965: Extend brw_get_rb_for_first_slice() for specified level/layer
v2: Configure stencil directly for final dimensions instead of
    adjusting bit by bit for tiling, mip level and msaa.
v3 (Ken): Used non-static constant for horizontal alignment

Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-05-15 21:39:33 +03:00
Topi Pohjolainen 36caae48b2 i965/gen8: Surface state overriding for stencil
v2: Allow hardware to offset accesses to individual layers. Also leave
    the mip-level overriding for the creator of the intel renderbuffer
    to handle. Merged with "i965/gen8: Allow stencil buffers to be
    configured as single sampled"

Ken: I left the "_mesa_problem()" still in place. I think it is clearer
     to remove it in a separate patch.

Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-05-15 21:39:32 +03:00