Commit Graph

98046 Commits

Author SHA1 Message Date
Jason Ekstrand 8915621882 intel/blorp: Take a range of layers in blorp_ccs_resolve
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-11-27 16:22:13 -08:00
Jason Ekstrand 67b676f0c5 intel/blorp: Add initial support for indirect clear colors
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-11-27 16:22:12 -08:00
Jason Ekstrand 85aa4074a2 i965/blorp: Use a designated initializer for blorp_surf
This way uninitialized fields get automatically zeroed and it's safe to
add more fields to blorp_surf.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-11-27 16:22:12 -08:00
Jason Ekstrand 86becfd2de intel/blorp: Add fast-clear to the special case in MSAA resolves
This doesn't go all the way of avoiding the txf_ms if it's fast-cleared,
however it does at least make us only do it once.  This should improve
performance of MSAA resolves in the presence of lots of clear color.
Without the patch, enabling fast-clears in the multisampling Sascha demo
drops the framerate by about 10%.  With this patch, enabling fast-clears
increases the demo's framerate by 25%.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-11-27 16:22:11 -08:00
Jason Ekstrand dc21c3937c intel/blorp/blit: Rename blorp_nir_txf_ms_mcs
That name is already taken by one of the helpers in blorp_nir_builder.h
and, while we haven't moved the guts of blorp_blit.c there yet, we'd
like to start using some things from that header.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-11-27 16:19:38 -08:00
Rob Herring 46148be8e4 Android: disable warnings causing errors
AOSP master has changed the build default to -Werror making all the
warnings errors. Override that with -Wno-error.

Signed-off-by: Rob Herring <robh@kernel.org>
2017-11-27 17:26:45 -06:00
Timothy Arceri 3e789026ca st/glsl_to_tgsi: make use of driver_cache_blob with the disk cache
driver_cache_blob was introduced with the i965 disk cache, it allows
us to simplify the cache a little and possibly offers some minor
speed improvements since we load the GLSL metadata and TGSI from
disk in one pass.

Using driver_cache_blob should also make it straight forward to
implement binary support for ARB_get_program_binary in gallium.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-28 09:01:44 +11:00
Gwan-gyeong Mun 4cb27047c8 glsl: Fix typo nagivation -> navigation
Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-11-28 08:48:55 +11:00
Emil Velikov c7616ac069 gl_table.py: add extern C guard for the generated glapitable.h
The header can be included from C++, hence contents should have
appropriate notation.

Cc: mesa-stable@lists.freedesktop.org
Cc: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-11-27 19:23:05 +00:00
Marek Olšák 6b8909f2d1 ac: pack legacy_surf_level better
r600_texture: 1488 -> 1248 bytes

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-27 14:46:16 +01:00
Marek Olšák ec15ff78c3 ac: change legacy_surf_level::slice_size to dword units
The next commit will reduce the size even more.

v2: typecast to uint64_t manually
v3: add more typecasts, add asserts

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-27 14:44:04 +01:00
Marek Olšák 474b4a9191 ac: pack ac_surface better
r600_texture: 1736 -> 1488 bytes

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-27 14:12:38 +01:00
Marek Olšák b5444877c0 radeonsi: always initialize max_forced_staging_uploads
r600_resource is malloc'd.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103808
Fixes: 4b0dc098b2 ("gallium/u_threaded: don't map big VRAM buffers for the first upload directly")

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-27 14:12:38 +01:00
Marek Olšák 95cd74abd4 radeonsi: remove an old hack for evergreen
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-27 14:12:38 +01:00
Marek Olšák 1cb731012c radeonsi: set COMPUTE_RESOURCE_LIMITS.FORCE_SIMD_DIST when profitable
ported from Vulkan

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-27 14:12:38 +01:00
Dave Airlie 043d14db30 ac/nir: don't write tcs outputs to LDS that aren't read back.
If the TCS doesn't read back the outputs, no need to store them
to LDS in the first place. (except for tess factors).

This seems to give about 50fps (3290->3330) with tessellation demo.

I haven't tested if it impacts DoW3 at all.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-11-27 13:50:24 +10:00
Dave Airlie 33dca36f4f nir: fill outputs_read field and add patch outputs read (v2)
This is to be used for TCS optimisations on radv.

v2: don't set written on reads (nha)

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-11-27 13:50:03 +10:00
Dave Airlie fd301472bd r600/eg: dump event type in dumps
This just makes it easier to debug some things.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-11-27 12:53:18 +10:00
Tobias Klausmann 068a72fbcb nouveau/compiler: Allow to omit line numbers when printing instructions
This comes in handy when checking "NV50_PROG_DEBUG=1" outputs with diff!

V2:
 - Use environmental variable (Karol Herbst)
V3:
 - Use the already populated nv50_ir_prog_info to forward information to the
   print pass (Pierre Moreau)
V4:
 - get rid of default value in PrintPass constructor

Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-11-26 12:51:30 -05:00
Nicolai Hähnle 0fed7f83ba radeonsi: try flushing unflushed fences in si_fence_finish even when timeout == 0
Under certain conditions, waiting on a GL sync objects should act like
a flush, regardless of the timeout.

Portal 2, CS:GO, and presumably other Source engine games rely on this
behavior and hang during loading without this fix.

Fixes: bc65dcab3b ("radeonsi: avoid syncing the driver thread in si_fence_finish")

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103902
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103904
2017-11-26 16:53:00 +01:00
Ilia Mirkin 0bd83d0461 nv50/ir: move LateAlgebraicOpt to the very end
Memory loads can take offsets, but the SHLADD will often attempt to
consume the offsets too. As there may be multiple memory loads with the
same base but different offsets, those would end up in a SHLADD instead
of the offset of the memory operation.

This moves the pass after we've had a chance to attempt to propagate
immediate adds into the indirect offset.

total instructions in shared programs : 6580681 -> 6567716 (-0.20%)
total gprs used in shared programs    : 944261 -> 943375 (-0.09%)
total shared used in shared programs  : 0 -> 0 (0.00%)
total local used in shared programs   : 15328 -> 15328 (0.00%)
total bytes used in shared programs   : 60339896 -> 60221504 (-0.20%)

                local     shared        gpr       inst      bytes
    helped           0           0         555        2698        2698
      hurt           0           0         138         336         336

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-11-26 01:10:19 -05:00
Ilia Mirkin 3072bbef63 nv50/ir: when merging immediates/consts, load directly
When a MERGE operation gets its constraint moves added, it
susbstantially extends live ranges to be reusing an immediate from
earlier in the program (not to mention the silliness of loading an
immediate into a register, and then moving into another register).

We detect these scenarios and insert moves that take the immediate or
constbuf load directly into the register. If it's the last use, then we
can just move that operation to the closer location.

With SM35 (255 regs) we get these results:

total instructions in shared programs : 6583670 -> 6580681 (-0.05%)
total gprs used in shared programs    : 950818 -> 944261 (-0.69%)
total shared used in shared programs  : 0 -> 0 (0.00%)
total local used in shared programs   : 15328 -> 15328 (0.00%)
total bytes used in shared programs   : 60367456 -> 60339896 (-0.05%)

                local     shared        gpr       inst      bytes
    helped           0           0        4584        3186        3186
      hurt           0           0          55         968         968

I suspect they will be better for SM20 and SM30.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-11-26 01:10:19 -05:00
Ilia Mirkin 50e913b9c5 nv50/ir: add optimization for modulo by a non-power-of-2 value
We can still use the optimized division methods which make use of
multiplication with overflow.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
2017-11-26 01:10:03 -05:00
Ilia Mirkin 3079993727 nv50/ir: optimize signed integer modulo by pow-of-2
It's common to use signed int modulo in GLSL. As it happens, the GLSL
specs allow the result to be undefined, but that seems fairly
surprising. It's not that much more effort to get it right, at least for
positive modulo operators.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-11-25 22:48:09 -05:00
Matt Turner 676761252b util: Just give up and define PIPE_ARCH_LITTLE_ENDIAN on MSVC
MSVC doesn't support #warning?! Getting really tired of this.
2017-11-25 16:46:00 -08:00
Andres Gomez 5fa589148a docs: remove bug 103626 from fix list as per 17.2.6
Bug https://bugs.freedesktop.org/show_bug.cgi?id=103626 was
incorrectly listed as fixed.

Signed-off-by: Andres Gomez <agomez@igalia.com>
(cherry picked from commit b9b60dbf55a1307a60a333c70c3add3643243c36)
2017-11-26 02:18:08 +02:00
Matt Turner b8cbad624b util: Use preprocessor correctly
Fixes: 6a353479a7 ("util: Assume little endian in the absence of
                      platform-specific handling")
2017-11-25 15:57:37 -08:00
Andres Gomez 63d488d10c docs: update calendar, add news item and link release notes for 17.2.6
Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-11-26 01:46:25 +02:00
Andres Gomez b0049428b5 docs: add sha256 checksums for 17.2.6
Signed-off-by: Andres Gomez <agomez@igalia.com>
(cherry picked from commit 93c2beafc0a7fa2f210b006d22aba61caa71f773)
2017-11-26 01:42:16 +02:00
Andres Gomez e6acc4d528 docs: add release notes for 17.2.6
Signed-off-by: Andres Gomez <agomez@igalia.com>
(cherry picked from commit 00b52f8e99653316a090826914509a138a1c78f7)
2017-11-26 01:42:15 +02:00
Ilia Mirkin f39a91c152 freedreno/a4xx: add ARB_framebuffer_no_attachments support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2017-11-25 17:20:17 -05:00
Ilia Mirkin 4f748d12e8 freedreno/a4xx: add indirect draw support
This is a copy of the a5xx logic. Fails a few tests, but basic
functionality is there.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2017-11-25 17:20:17 -05:00
Ilia Mirkin c3c8d48725 freedreno: regenerate pm4 header, adjust code for new names
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2017-11-25 17:20:17 -05:00
Ilia Mirkin ffdcd51e66 freedreno/a4xx: add stencil texturing support
Copied from a5xx, should be identical.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2017-11-25 17:20:17 -05:00
Ilia Mirkin 86f12e9377 freedreno/ir3: add a pass to lower tg4 to txl, enable gather on a4xx
Unfortunately Adreno A4xx hardware returns incorrect results with the
GATHER4 opcodes. As a result, we have to lower to 4 individual texture
calls (txl since we have to force lod to 0). We achieve this using
offsets, including on cube maps which normally never have offsets.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2017-11-25 16:56:59 -05:00
Ilia Mirkin ab336e8b46 nir: allow texture offsets with cube maps
GL doesn't have this, but some hardware supports it. This is convenient
for lowering tg4 to plain texture calls, which is necessary on Adreno
A4xx hardware.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2017-11-25 16:56:30 -05:00
Matt Turner c690a7a8cd util: Fix disk_cache index calculation on big endian
The cache-test test program attempts to create a collision (using key_a
and key_a_collide) by making the first two bytes identical. The idea is
fine -- the shader cache wants to use the first four characters of a
SHA1 hex digest as the index.

The following program

        unsigned char array[4] = {1, 2, 3, 4};
        int *ptr = (int *)array;

        for (int i = 0; i < 4; i++) {
            printf("%02x", array[i]);
        }
        printf("\n");

        printf("%08x\n", *ptr);

prints

   01020304
   04030201

on little endian, and

   01020304
   01020304

on big endian.

On big endian platforms reading the character array back as an int (as
is done in disk_cache.c) does not yield the same results as reading the
byte array.

To get the first four characters of the SHA1 hex digest when we mask
with CACHE_INDEX_KEY_MASK, we need to byte swap the int on big endian
platforms.

Bugzilla: https://bugs.freedesktop.org/103668
Bugzilla: https://bugs.gentoo.org/637060
Bugzilla: https://bugs.gentoo.org/636326
Fixes: 87ab26b2ab ("glsl: Add initial functions to implement an
                      on-disk cache")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-11-25 12:30:46 -08:00
Matt Turner 513d7ffa23 util: Add a SHA1 unit test program
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-11-25 12:30:46 -08:00
Matt Turner 532674303a util: Fix SHA1 implementation on big endian
The code defines a macro blk0(i) based on the preprocessor condition
BYTE_ORDER == LITTLE_ENDIAN. If true, blk0(i) is defined as a byte swap
operation. Unfortunately, if the preprocessor macros used in the test
are no defined, then the comparison becomes 0 == 0 and it evaluates as
true.

Fixes: d1efa09d34 ("util: import sha1 implementation from OpenBSD")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-11-25 12:30:46 -08:00
Matt Turner 6a353479a7 util: Assume little endian in the absence of platform-specific handling 2017-11-25 12:30:46 -08:00
Marek Olšák 78942e7dbf mesa: shrink VERT_ATTRIB bitfields to 32 bits
There are only 32 vertex attribs now.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-11-25 17:18:22 +01:00
Marek Olšák 43abaf2ad0 mesa: remove unused vertex attrib WEIGHT
We don't support ARB_vertex_blend.

Note that the attribute aliasing check for ARB_vertex_program had to be
rewritten.

vbo_context: 20344 -> 20008 bytes
gl_context: 74672 -> 74616 bytes

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-11-25 17:17:52 +01:00
Marek Olšák 2116b97418 mesa: don't assign numbers to vertex attrib enums manually
I plan to remove one of them.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-11-25 17:17:52 +01:00
Marek Olšák bd57f45168 gallium/hud: add HUD sharing within a context share group
This is needed for profiling multi-context applications like Chrome.
One context can record queries and another context can draw the HUD.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-25 17:16:56 +01:00
Marek Olšák 11e25eb7f4 gallium/hud: update the HUD interface for multiple contexts
This is the boring subset of the following commit.
All new parameters are optional.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-25 17:16:56 +01:00
Marek Olšák 9c5b4eb6b4 gallium/hud: prevent a crash if the recording context is inactive
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-25 17:16:56 +01:00
Marek Olšák 37ded08321 gallium/hud: separate code for record context init/release
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-25 17:16:56 +01:00
Marek Olšák fc07acc21e gallium/hud: separate code for draw context init/release
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-25 17:16:56 +01:00
Marek Olšák 8caf7d51a9 gallium/hud: don't use hud->pipe in hud_parse_env_var
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-25 17:16:56 +01:00
Marek Olšák 65433c3fd0 gallium/hud: use cso_get_pipe_context
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-25 17:16:56 +01:00