Commit Graph

67411 Commits

Author SHA1 Message Date
Connor Abbott b559ee709b nir: calculate dominance information 2015-01-15 07:18:58 -08:00
Connor Abbott cff1deff72 nir: add an optimization to turn global registers into local registers
After linking and inlining, this allows us to convert these registers
into SSA values and optimise more code.
2015-01-15 07:18:58 -08:00
Connor Abbott 613bf6818a nir: add a pass to lower atomics
v2: Jason Ekstrand <jason.ekstrand@intel.com>
   whitespace fixes
2015-01-15 07:18:58 -08:00
Connor Abbott 8692c6a023 nir: add a pass to lower system value reads
v2: Jason Ekstrand <jason.ekstrand@intel.com>:
   whitespace fixes
2015-01-15 07:18:58 -08:00
Connor Abbott 8cdcfce5ce nir: add a pass to lower sampler instructions 2015-01-15 07:18:58 -08:00
Connor Abbott 370e875b32 nir: add a pass to remove unused variables
After we lower variables, we want to delete them in order to free up
some memory.

v2: Jason Ekstrand <jason.ekstrand@intel.com>:
    whitespace fixes
2015-01-15 07:18:58 -08:00
Connor Abbott 494790b2a9 nir: keep track of the number of input, output, and uniform slots 2015-01-15 07:18:58 -08:00
Connor Abbott c2f36cf125 nir: add a pass to lower variables for scalar backends 2015-01-15 07:18:58 -08:00
Connor Abbott 7f0daaa5e7 nir: add a glsl-to-nir pass
v2: Jason Ekstrand <jason.ekstrand@intel.com>:
   Make glsl_to_nir build again
   fix whitespace
2015-01-15 07:18:58 -08:00
Connor Abbott dbb76421da nir: add a validation pass
This is similar to ir_validate.cpp.

v2: Jason Ekstrand <jason.ekstrand@intel.com>:
   whitespace fixes
2015-01-15 07:18:58 -08:00
Connor Abbott 98fa28bff7 nir: add a printer
This is similar to ir_print_visitor.cpp.

v2: Jason Ekstrand <jason.ekstrand@intel.com>:
   whitespace fixes
2015-01-15 07:18:58 -08:00
Jason Ekstrand 9b1139649d SQUASH: Fix comments from eric
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-01-15 07:18:58 -08:00
Jason Ekstrand 8b4c860580 SQUASH: Add an assert 2015-01-15 07:18:58 -08:00
Connor Abbott 2812e5de93 nir: add core helper functions
These include functions for adding and removing various bits of IR and
helpers for iterating over all the sources and destinations of an
instruction. This is similar to ir.cpp.

v2: Jason Ekstrand <jason.ekstrand@intel.com>:
   whitespace and automake fixes
2015-01-15 07:18:58 -08:00
Jason Ekstrand f521a3c543 SQUASH: Use the enum for the variable mode 2015-01-15 07:18:57 -08:00
Connor Abbott 30c4678f64 nir: add the core datastructures
This includes all the instructions, ifs, loops, functions, etc. This is
similar to the information in ir.h.

v2: Jason Ekstrand <jason.ekstrand@intel.com>:
   Include ralloc and hash_table from the util directory
   whitespace fixes

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-By glenn.kennard <glenn.kennard@gmail.com>
2015-01-15 07:18:57 -08:00
Connor Abbott b5ca34a211 nir: add a simple C wrapper around glsl_types.h
v2: Jason Ekstrand <jason.ekstrand@intel.com>:
    whitespace and automake fixes

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-01-15 07:18:57 -08:00
Connor Abbott 77e7a00267 nir: add initial README
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-01-15 07:18:57 -08:00
Connor Abbott ab2ae63854 exec_list: add a list_foreach_typed_reverse() macro
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-01-15 07:18:57 -08:00
Eric Anholt 84ef2d4156 vc4: Add some dumping for STORE_TILE_BUFFER_GENERAL. 2015-01-15 22:21:29 +13:00
Eric Anholt 1b241c59e8 vc4: Add dumping for the TILE_RENDERING_MODE_CONFIG packet.
I wanted to read it, so I wrote parsing.
2015-01-15 22:19:25 +13:00
Eric Anholt d0d6d24723 vc4: Fix CL dumping trying to dump too far.
Execution will end at the cl->next, because that's what ct0ea/ct1ea get
programmed to.
2015-01-15 22:19:25 +13:00
Eric Anholt 0471f72755 vc4: Fix texture type masking.
Everything from ETC1 to RGBA64 was getting its top bit dropped, but we
didn't use any of those formats.
2015-01-15 22:19:25 +13:00
Eric Anholt 6313a2c8f0 vc4: Colormask should apply after all other fragment ops (like logic op).
Theoretically it should apply after dithering as well, but dithering for
565 happens in fixed function in the TLB store.
2015-01-15 22:19:25 +13:00
Eric Anholt 0289a26201 vc4: No turning unpack arguments into small immediates.
Since unpack only happens on things read from the A register file, we have
to leave them as something that can be allocated to A (temp or uniform).
2015-01-15 22:19:25 +13:00
Eric Anholt 772c47aefe vc4: Move the tests for src needing to be an A register to vc4_qir.c.
I want it from another location.
2015-01-15 22:19:25 +13:00
Eric Anholt 8f2fb68026 vc4: Don't swap the raddr on instructions doing unpacks.
It would mean different unpacking behavior, since only the A file does
unpack (with PM==0).
2015-01-15 22:19:25 +13:00
Eric Anholt 5d5707707f vc4: Don't let pairing happen with badly mismatched unpack flags.
No difference on shader-db, but prevents definite regressions in the
blending changes.
2015-01-15 22:19:25 +13:00
Eric Anholt 3820866e40 vc4: Don't let pairing happen with badly mismatched pack flags.
No difference on shader-db, but will become more important as I introduce
more use of pack flags with the blending changes.
2015-01-15 22:19:25 +13:00
Eric Anholt d1f2fc834d vc4: Fix early Z behavior on hardware.
It turns out the simulator was not treating this bit the same as the RPi,
and I'd forgotten to remove it when turning on early Z.  The result was
that you'd get big chunks of your rendering missing.
2015-01-15 22:19:25 +13:00
Michel Dänzer 82b7ee62fc Revert "radeonsi: only set BC_OPTIMIZE_DISABLE when necessary"
This reverts commit 0543630d0b.

It caused flickering artifacts in Steam games such as Team Fortress 2 or
Left 4 Dead 2.

We could probably only enable this optimization by also making sure the
shader code only uses either SI_PARAM_LINEAR_CENTROID or
SI_PARAM_LINEAR_CENTER, not both. This would probably require a shader
variant.

Sorry I didn't remember this when reviewing the reverted change.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-01-15 15:09:48 +09:00
Michel Dänzer a6a75f1286 st/clover: Adapt to TargetLibraryInfo.h move in LLVM SVN r226078
Trivial.
2015-01-15 12:57:05 +09:00
Ian Romanick 0a0d2c9443 mesa: Micro-optimize _mesa_is_valid_prim_mode
You would not believe the mess GCC 4.8.3 generated for the old
switch-statement.

On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
Gl32Batch7:

32-bit: Difference at 95.0% confidence -0.37374% +/- 0.184057% (n=40)
64-bit: Difference at 95.0% confidence 0.966722% +/- 0.338442% (n=40)

The regression on 32-bit is odd.  Callgrind says the caller,
_mesa_is_valid_prim_mode is faster.  Before it says 2,293,760
cycles, and after it says 917,504.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-14 17:09:50 -08:00
Ian Romanick ead200d156 mesa: Check for vertex program the same way in desktop GL and ES
On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
Gl32Multithread:

32-bit: Difference at 95.0% confidence 0.416027% +/- 0.163529% (n=40)
64-bit: Difference at 95.0% confidence 0.494771% +/- 0.259985% (n=40)

Gl32Batch7 had no difference proven at 95.0% confidence (n=120) on
32-bit or 64-bit.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-14 17:09:50 -08:00
Ian Romanick d5f936367f mesa: Drop index buffer bounds check
The previous check was insufficient (as it did not take 'indices' into
consideration), and DX10 hardware does not need this check anyway.

Since index_bytes is no longer used, remove it.

On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
Gl32Batch7:

32-bit: Difference at 95.0% confidence 1.66929% +/- 0.230107% (n=40)
64-bit: Difference at 95.0% confidence -1.40848% +/- 0.288038% (n=40)

The regression on 64-bit is odd.  Callgrind says the caller,
validate_DrawElements_common is faster.  Before it says 10,321,920
cycles, and after it says 8,945,664.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-14 17:09:50 -08:00
Ian Romanick a4aeb534ea mesa: Only check for a current vertex shader in core profile
This doesn't affect performance, but it feels more correct.

On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
Gl32Batch7:

32-bit: No difference proven at 95.0% confidence (n=120)
64-bit: No difference proven at 95.0% confidence (n=120)

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-14 17:09:50 -08:00
Ian Romanick d6c6b186cf mesa: Only validate shaders that can exist in the context
On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
Gl32Batch7:

32-bit: Difference at 95.0% confidence 0.495267% +/- 0.202063% (n=40)
64-bit: Difference at 95.0% confidence 3.57576% +/- 0.288175% (n=40)

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-14 17:09:50 -08:00
Ian Romanick 14aadbe827 i965: Store the atoms directly in the context
Instead of having an extra pointer indirection in one of the hottest
loops in the driver.

On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
Gl32Batch7:

32-bit: Difference at 95.0% confidence 1.98515% +/- 0.20814% (n=40)
64-bit: Difference at 95.0% confidence 1.5163% +/- 0.811016% (n=60)

v2 (Ken): Cut size of array from 64 to 57 to save memory.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-14 17:01:27 -08:00
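The pointer indirection removed by the commit above can be pictured with a small sketch. This is not the i965 code: `toy_atom`, `toy_ctx_indirect`, `toy_ctx_direct`, and the two counting functions are hypothetical names made up for illustration, and only the array size of 57 comes from the commit message.

```c
#define TOY_NUM_ATOMS 57   /* array size mentioned in the v2 note */

/* Hypothetical stand-in for a tracked-state atom. */
struct toy_atom {
   unsigned dirty_mask;
};

/* Before: the context holds a pointer to a list of atom pointers. */
struct toy_ctx_indirect {
   const struct toy_atom **atoms;
   unsigned num_atoms;
};

/* After: the atoms live inside the context itself. */
struct toy_ctx_direct {
   struct toy_atom atoms[TOY_NUM_ATOMS];
   unsigned num_atoms;
};

unsigned count_dirty_indirect(const struct toy_ctx_indirect *ctx, unsigned dirty)
{
   unsigned n = 0;
   for (unsigned i = 0; i < ctx->num_atoms; i++) {
      /* Each iteration loads a pointer from the list, then dereferences it. */
      const struct toy_atom *atom = ctx->atoms[i];
      if (atom->dirty_mask & dirty)
         n++;
   }
   return n;
}

unsigned count_dirty_direct(const struct toy_ctx_direct *ctx, unsigned dirty)
{
   unsigned n = 0;
   for (unsigned i = 0; i < ctx->num_atoms; i++) {
      /* The atom's address is computed from ctx; no pointer load is needed. */
      const struct toy_atom *atom = &ctx->atoms[i];
      if (atom->dirty_mask & dirty)
         n++;
   }
   return n;
}
```

Both functions return the same count for the same atoms; the direct layout simply trades a larger context structure for one fewer memory load per iteration.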
Ian Romanick 6ed53c27ef i965: Micro-optimize brw_get_index_type
With the switch-statement, GCC 4.8.3 produces a small pile of code with
a branch.

00000000 <brw_get_index_type>:
  000000:       8b 54 24 04             mov    0x4(%esp),%edx
  000004:       b8 01 00 00 00          mov    $0x1,%eax
  000009:       81 fa 03 14 00 00       cmp    $0x1403,%edx
  00000f:       74 0d                   je     00001e <brw_get_index_type+0x1e>
  000011:       31 c0                   xor    %eax,%eax
  000013:       81 fa 05 14 00 00       cmp    $0x1405,%edx
  000019:       0f 94 c0                sete   %al
  00001c:       01 c0                   add    %eax,%eax
  00001e:       c3                      ret

However, this could be two instructions.

00000000 <brw_get_index_type>:
  000000:       2d 01 14 00 00          sub    $0x1401,%eax
  000005:       d1 e8                   shr    %eax
  000007:       90                      nop
  000008:       90                      nop
  000009:       90                      nop
  00000a:       90                      nop
  00000b:       c3                      ret

The function was also moved to the header so that it could be inlined at
the two call sites.  Without this, 32-bit also needs to pull the
parameter from the stack.  This means there is a push, a call, a move,
and a ret added to a two instruction function.  The above code shows the
function with __attribute__((regparm(1))), but even this adds several
extra instructions.  There is also an extra instruction on 64-bit to
move the parameter to %eax for the subtract.

On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
Gl32Batch7:

32-bit: Difference at 95.0% confidence 0.818589% +/- 0.234661% (n=40)
64-bit: Difference at 95.0% confidence 0.54554% +/- 0.354092% (n=40)

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-01-14 16:56:47 -08:00
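The two disassemblies in the commit above can be read back into C roughly as follows. This is a sketch, not the driver source: the function names are hypothetical, and the return values 0, 1, and 2 are inferred from the first disassembly rather than taken from the i965 headers. The GL enum values are the standard ones (`GL_UNSIGNED_BYTE` is 0x1401, `GL_UNSIGNED_SHORT` is 0x1403, `GL_UNSIGNED_INT` is 0x1405).

```c
#include <assert.h>

/* Standard OpenGL enum values for the three legal index types. */
#define GL_UNSIGNED_BYTE  0x1401
#define GL_UNSIGNED_SHORT 0x1403
#define GL_UNSIGNED_INT   0x1405

/* Switch-based form, as read from the first disassembly:
 * 0x1403 -> 1, 0x1405 -> 2, anything else -> 0. */
unsigned index_type_switch(unsigned type)
{
   switch (type) {
   case GL_UNSIGNED_SHORT:
      return 1;
   case GL_UNSIGNED_INT:
      return 2;
   default:
      return 0;
   }
}

/* Arithmetic form, as read from the second disassembly (sub, then shr).
 * The three enums are spaced two apart, so subtracting the smallest and
 * halving maps them to 0, 1, 2.  Assumes the caller has already
 * validated 'type'; other inputs give different results than the switch. */
unsigned index_type_arith(unsigned type)
{
   return (type - GL_UNSIGNED_BYTE) >> 1;
}

/* Checks that both forms agree on the three valid inputs. */
void check_index_type_equivalence(void)
{
   assert(index_type_arith(GL_UNSIGNED_BYTE) == index_type_switch(GL_UNSIGNED_BYTE));
   assert(index_type_arith(GL_UNSIGNED_SHORT) == index_type_switch(GL_UNSIGNED_SHORT));
   assert(index_type_arith(GL_UNSIGNED_INT) == index_type_switch(GL_UNSIGNED_INT));
}
```

The arithmetic form is branch-free, which is why it is small enough to be worth inlining at the call sites, as the commit message describes.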
Ian Romanick 3f1f1d0df4 meta: Put _mesa_meta_in_progress in the header file
...so that it can be inlined in the two places that call it.

On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
Gl32Batch7:

32-bit: No difference proven at 95.0% confidence (n=120)
64-bit: Difference at 95.0% confidence 1.24042% +/- 0.382277% (n=40)

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-01-14 16:55:53 -08:00
Kenneth Graunke 3167a80bb1 i965: Fix "vertex" vs. "geometry" and "VS" vs. "GS" in debug output.
We were happily printing "Native code for unnamed vertex shader" and
"VS vec4" program for geometry shaders in our INTEL_DEBUG=gs output,
as well as the KHR_debug output used by shader-db.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-01-14 16:55:43 -08:00
Kenneth Graunke 68ed14d6ad i965: Pass a shader stage abbreviation to fs_generator().
A lot of messages hardcoded the string "FS", which is confusing on
Broadwell, where we use this code for VS support as well.

shader-db particularly got confused, as it reported two "FS SIMD8"
shaders, and no vertex shaders at all.  Craziness ensued.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-01-14 16:55:38 -08:00
Samuel Iglesias Gonsalvez efef6c8280 configure: add check for GNU indent
Only GNU indent is supported when indenting autogenerated format_pack.c
and format_unpack.c files. Some non-GNU indent implementations (Mac OS X
and FreeBSD) add extra whitespace that breaks the build of those files.

Fall back to 'cat' if a non-GNU indent is found.

Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=88335

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Tested-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-01-14 12:52:22 +01:00
Samuel Iglesias Gonsalvez 6d43a4c338 configure: change required Python Mako version to 0.3.4
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-01-14 12:52:22 +01:00
Iago Toral Quiroga c6a2628950 mesa: rename RGBA8888_* format constants to something appropriate.
The 8888 suggests 8-bit components which is not correct, so
replace that with the actual size of the components in each
format.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-01-14 07:57:31 +01:00
Jason Ekstrand ae417957e0 i965/miptree_map_blit: Don't do the initial copy if INVALIDATE_RANGE is set
Before, we were always copying from the buffer being mapped into the
temporary buffer.  However, if INVALIDATE_RANGE is set, then we know that
the data is going to be junk after we unmap so there's no point in doing
the blit.  This is important because doing the blit will cause a stall 3
lines later when we map the buffer.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-13 22:06:51 -08:00
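The idea in the commit above can be sketched with a toy model. The structure `toy_map` and the function `toy_map_blit` are hypothetical and stand in for the driver's map path; `memcpy` stands in for the GPU blit. Only the flag values are real OpenGL constants.

```c
#include <stdint.h>
#include <string.h>

/* Standard OpenGL map access bits. */
#define GL_MAP_WRITE_BIT            0x0002
#define GL_MAP_INVALIDATE_RANGE_BIT 0x0004

/* Hypothetical model of a blit-based mapping: 'src' is the buffer being
 * mapped, 'tmp' is the temporary buffer handed to the application. */
struct toy_map {
   const uint8_t *src;
   uint8_t *tmp;
   size_t size;
   unsigned blits;   /* number of initial copies performed */
};

/* Sets up the temporary buffer.  If the caller passed INVALIDATE_RANGE,
 * the old contents of the range do not need to be preserved, so the
 * initial copy (and the stall it would cause) is skipped. */
void toy_map_blit(struct toy_map *map, uint32_t flags)
{
   if (!(flags & GL_MAP_INVALIDATE_RANGE_BIT)) {
      memcpy(map->tmp, map->src, map->size);
      map->blits++;
   }
}
```

Calling `toy_map_blit` with `GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_RANGE_BIT` leaves `tmp` untouched, while `GL_MAP_WRITE_BIT` alone performs the copy.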
Tapani Pälli f52fe39d31 mesa/glsl/glapi: enable GL_EXT_draw_buffers extension
Patch enables ES2 extension that utilizes existing ES3 functionality.

These changes make all the subtests run and pass in the WebGL conformance
test 'webgl-draw-buffers' when running Chrome on OpenGL ES; the
Piglit test 'draw_buffers_gles2' also passes.

v2: remove unused boolean (Ilia Mirkin)
v3: proper error checking for invalid values (Chad Versace)
v4: run error check explicitly for ES2 and ES3 (Kenneth Graunke)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-01-14 07:48:51 +02:00
Jason Ekstrand 3a5c7e47fd i965/fs: Allow constant propagation between different types
This will be needed for NIR because it is typeless and treats all constants
as uint32 values and reinterprets them when they are used later.  This
commit allows those values to be properly propagated.

Also, this helps some synmark shaders because it allows us to copy
propagate a 0x00000000UD into a 0.0F in a load_payload, which then lets us
combine 4 load_payloads.

instructions in affected programs:     2288 -> 2144 (-6.29%)

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-01-13 13:24:52 -08:00
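The reinterpretation the commit above relies on can be shown in a few lines. This is a sketch assuming 32-bit IEEE-754 floats; the helper names `bits_as_float` and `float_as_bits` are hypothetical and are not part of the i965 backend.

```c
#include <stdint.h>
#include <string.h>

/* Reinterprets a 32-bit pattern as a float without changing any bits.
 * memcpy is used because it is the well-defined way to type-pun in C. */
float bits_as_float(uint32_t bits)
{
   float f;
   memcpy(&f, &bits, sizeof f);
   return f;
}

/* The inverse: returns the raw bit pattern of a float. */
uint32_t float_as_bits(float f)
{
   uint32_t bits;
   memcpy(&bits, &f, sizeof bits);
   return bits;
}
```

Under that assumption, the unsigned constant 0x00000000 and the float constant 0.0 share a bit pattern, which is why a typeless IR can store the former and still have it propagated into an instruction that reads it as the latter.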
Chad Versace 610c7486c2 egl/wayland: Fix unused variable warnings
Remove ctx variables unused as of 70e8ccc459.
2015-01-13 11:33:23 -08:00
Mike Mason 90d2a85193 mesa: Enable GL_RGB/GL_RGBA in GLES3 glGetInternalformativ
Removes commit 7894278 changes and moves fix to _mesa_GetInternalformativ().
The original commit enabled the GL_RGB and GL_RGBA unsized internal formats
as valid for render buffers in GLES3, but this is incorrect. They should
have only been enabled for GetInternalformativ().

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88079
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-01-13 11:23:46 -08:00