111446 Commits

Author SHA1 Message Date
Pierre-Eric Pelloux-Prayer 28ce704bb0 mesa: factor out enum -> matrix stack lookup
Split this out from glMatrixMode since we're about to need it
independently for EXT_DSA.

Adapted from Chris Forbes commit.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-03 15:28:49 -04:00
Timothy Arceri b69584ad69 mesa: add new EXT_direct_state_access tokens
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-03 15:28:47 -04:00
Chris Forbes 028682f7f4 glapi: add EXT_direct_state_access
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-03 15:28:45 -04:00
Timothy Arceri 9c5d86af38 mesa: add a list of EXT_direct_state_access to dispatch sanity
This extension is huge and this gives us a TODO list of functions
to implement.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-03 15:28:33 -04:00
Pierre-Eric Pelloux-Prayer 4583f09caa radeonsi: init sctx->dma_copy before using it
Commit a1378639ab reordered the context function initializations but broke
sctx->b.resource_copy_region init when using AMD_DEBUG=forcedma.

In this case sctx->dma_copy was assigned a value after being used in:
   sctx->b.resource_copy_region = sctx->dma_copy;

This commit moves the FORCE_DMA special case after sctx->dma_copy initialization.

See https://bugs.freedesktop.org/show_bug.cgi?id=110422

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-06-03 15:05:30 -04:00
Axel Davy 5820ac6756 d3dadapter9: Revert to old throttling limit value
Recently PIPE_CAP_MAX_FRAMES_IN_FLIGHT was changed from 2
to 1:
20909284f2

No driver seems to overwrite the default value.

One user reports severe regressions for some games.
For now, revert to the value 2 for nine.

Cc: "19.1" mesa-stable@lists.freedesktop.org

Signed-off-by: Axel Davy <davyaxel0@gmail.com>
2019-06-03 20:37:13 +02:00
Marek Olšák 486bc1e17e ac: use amdgpu-flat-work-group-size
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-03 14:32:47 -04:00
Marek Olšák 4b11ed443b u_blitter: don't fail mipmap generation for depth formats containing stencil
Bugzilla: https://bugzilla.freedesktop.org/show_bug.cgi?id=109754

Cc: 19.0 19.1 <mesa-stable@lists.freedesktop.org>
Tested-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-06-03 14:32:47 -04:00
Christian Gmeiner 3135ca4172 etnaviv: drop a bunch of duplicated gallium PIPE_CAP default code
Now that we have the util function for the default values, we can get
rid of the boilerplate.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-06-03 16:29:59 +02:00
Samuel Pitoiset 445098916a radv: flush pending query reset caches before copying results
From the Vulkan spec 1.1.108:
   "vkCmdCopyQueryPoolResults is guaranteed to see the effect of
    previous uses of vkCmdResetQueryPool in the same queue, without any
    additional synchronization."

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-03 16:05:46 +02:00
Jonathan Marek 91672becc3 nir: copy intrinsic type when lowering load input/uniform and store output
Fixes: c1275052 "nir: add type information to load uniform/input and store output intrinsics"

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Tested-by: Erico Nunes <nunes.erico@gmail.com>
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
2019-06-03 12:46:14 +00:00
Samuel Pitoiset 6970a9a6ca ac,radv: remove the vec3 restriction with LLVM 9+
This change requires LLVM r356755.

32706 shaders in 16744 tests
Totals:
SGPRS: 1448848 -> 1455984 (0.49 %)
VGPRS: 1016684 -> 1016220 (-0.05 %)
Spilled SGPRs: 25871 -> 25815 (-0.22 %)
Spilled VGPRs: 122 -> 122 (0.00 %)
Scratch size: 11964 -> 11956 (-0.07 %) dwords per thread
Code Size: 55324500 -> 55301152 (-0.04 %) bytes
Max Waves: 235660 -> 235586 (-0.03 %)

Totals from affected shaders:
SGPRS: 293704 -> 300840 (2.43 %)
VGPRS: 246716 -> 246252 (-0.19 %)
Spilled SGPRs: 159 -> 103 (-35.22 %)
Scratch size: 188 -> 180 (-4.26 %) dwords per thread
Code Size: 8653664 -> 8630316 (-0.27 %) bytes
Max Waves: 60811 -> 60737 (-0.12 %)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-03 11:30:08 +02:00
Caio Marcelo de Oliveira Filho 75590604a9 nir: Return nir_type_invalid for non-numeric base types
Now that the type gathering function looks at instructions that might
have other types, return invalid type instead of crashing.  That
invalid type will be properly ignored later.

Fixes: c12750527b "nir: add type information to load uniform/input and store output intrinsics"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 16:27:03 -07:00
Caio Marcelo de Oliveira Filho 27497c5c02 iris: Drop unused locals from iris_clear.c to avoid warning
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2019-05-31 15:55:05 -07:00
Jonathan Marek f387c2b238 nir: remove bool lowering from lower_int_to_float
Removes the bool_to_float logic from the int_to_float pass, so that both
can be used separately. By having separate passes we have better validation
and it makes it possible to use with the lower_ftrunc option (int lowering
generates ftrunc, but lower_ftrunc generates bools, ftrunc lowering should
probably be reworked). For now we always expect lower_bool to come after
lower_int.

Also fixes f2i32 to become ftrunc and adds u2f/f2u cases.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 21:35:26 +00:00
Jonathan Marek f6579ee204 nir: fix lower_{int,bool}_to_float for new mov opcode
It is treated like the vecN instructions which also have no type.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 21:35:26 +00:00
Jonathan Marek f889180ee1 nir: add lower_bitshift option
Add a "lower_bitshift" option, which disables optimizations introducing
bitshifts and lowers ishl by constant to a multiply, so that we don't have
to deal with bitshifts in int_to_float lowering.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 21:35:26 +00:00
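
The lower_bitshift commit above replaces a left shift by a constant with a multiply. A minimal sketch of the arithmetic identity it relies on, written as plain C rather than as a NIR pass; `shl_as_mul` is a hypothetical name used only for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not Mesa code: compute "x << shift" with a multiply.
 * For unsigned 32-bit values both forms wrap modulo 2^32, so they agree. */
static uint32_t shl_as_mul(uint32_t x, unsigned shift)
{
    assert(shift < 32);                      /* larger shifts are undefined in C */
    uint32_t factor = UINT32_C(1) << shift;  /* a constant when shift is constant */
    return x * factor;
}
```

When the shift amount is a compile-time constant, the factor is one too, so a backend without integer shifts only has to handle the multiply.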
Jonathan Marek 887c2a6092 nir: fix gather_ssa_types
Consts and undefs can be used as different types (common with "0" constant)
so don't copy types from consts/undefs, only to them. It doesn't entirely
solve the problem that the type given to the const could be wrong, but
now the only realistic case is with "0" which is the same when cast to
float, so it doesn't matter for lower_int_to_float.

The other change is to get type information for load input/uniform and
store output, and use that to get correct results.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 21:35:26 +00:00
Jonathan Marek c12750527b nir: add type information to load uniform/input and store output intrinsics
This type information will be used by gather_ssa_types to get usable results

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 21:35:26 +00:00
Jonathan Marek 6016df211f nir: improvements to native_integers removal
Improvements related to the patch that removed native_integers:
 * In glsl_to_nir, special cases for i2f,u2f,etc are no longer needed
 * In prog_to_nir, use sge/slt and let lower_scmp lower it if needed

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 21:35:26 +00:00
Rob Clark 32131a9568 freedreno/a6xx: add 'type' to shader state key
We could have identical texture state for both VS and FS, which would
result in VS state getting created first, and FS state mapping to the
identical cmdstream.  The result is VS state getting emitted twice and no
FS state emitted.

Fixes:
  dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.basic_array.sampler2D_both
  dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.struct_in_array.sampler2D_samplerCube_both
  dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.array_in_struct.sampler2D_samplerCube_both
  dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.nested_structs_arrays.sampler2D_samplerCube_both
  dEQP-GLES2.functional.uniform_api.value.assigned.by_value.render.nested_structs_arrays.sampler2D_samplerCube_both
  dEQP-GLES31.functional.program_uniform.by_pointer.render.array_in_struct.sampler2D_samplerCube_both
  dEQP-GLES31.functional.program_uniform.by_pointer.render.nested_structs_arrays.sampler2D_samplerCube_both
  dEQP-GLES31.functional.program_uniform.by_value.render.nested_structs_arrays.sampler2D_samplerCube_both

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-31 12:58:47 -07:00
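
The shader state key commit above describes a cache collision between stages. The sketch below shows the general idea of including the shader stage in a lookup key so that otherwise identical vertex and fragment state no longer compare equal. Every name here (`toy_tex_key`, `toy_shader_type`, the field layout) is hypothetical and not taken from the freedreno source:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum toy_shader_type { TOY_SHADER_VERTEX, TOY_SHADER_FRAGMENT };

/* Hypothetical cache key. All fields are uint32_t, so there is no padding
 * and a bytewise memcmp() is a valid equality test. */
struct toy_tex_key {
    uint32_t sampler_ids[4];
    uint32_t view_ids[4];
    uint32_t type;   /* shader stage; without it VS and FS keys would collide */
};

static int toy_tex_key_equal(const struct toy_tex_key *a,
                             const struct toy_tex_key *b)
{
    return memcmp(a, b, sizeof(*a)) == 0;
}

/* Returns 1 if two keys that differ only in stage are treated as distinct. */
static int toy_key_demo(void)
{
    struct toy_tex_key vs, fs;

    memset(&vs, 0, sizeof(vs));
    vs.sampler_ids[0] = 7;
    vs.view_ids[0] = 3;
    vs.type = TOY_SHADER_VERTEX;

    fs = vs;                        /* same textures and samplers ... */
    fs.type = TOY_SHADER_FRAGMENT;  /* ... but a different stage */

    assert(toy_tex_key_equal(&vs, &vs));
    return !toy_tex_key_equal(&vs, &fs);
}
```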
Rob Clark 8b7bf5e07a freedreno/ir3: fix constlen versus indirect UBO
If we access the address of the UBO indirectly, and there is no higher
const emitted w/ direct access (like an immediate lowered to uniform)
the assembler won't figure out the correct constlen.

Fixes:
  dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.uniform_vertex
  dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.uniform_fragment
  dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.dynamically_uniform_vertex
  dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.dynamically_uniform_fragment

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-31 12:58:33 -07:00
Rob Clark 8eaa2d5021 freedreno/a6xx: fix GPU crash on small render targets
Fixes dEQP-GLES2.functional.multisampled_render_to_texture.readpixels

Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Anholt <eric@anholt.net>
2019-05-31 12:58:33 -07:00
Rob Clark f9fa456e1d freedreno/ir3: set more barrier bits
Blob is also setting the .l bit, and it seems to solve some intermittent
failures with a couple of deqp's:

dEQP-GLES31.functional.image_load_store.2d.qualifiers.coherent_r32i
dEQP-GLES31.functional.image_load_store.2d.qualifiers.volatile_r32f

Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Anholt <eric@anholt.net>
2019-05-31 12:58:33 -07:00
Rob Clark 5d43b806ba freedreno/ir3: set (ss) on last_input if ldlv
It seems like (ei) handling doesn't sync on (ss), so we could end up in
a situation where we release varying storage before an ldlv for flat
shaded varyings completes.  Keep track if we've done an (ss) since the
last ldlv, and if not add (ss) flag to last_input which gets (ei).

Noticed with dEQP-GLES3.functional.fragment_out.random.24 and
dEQP-GLES3.functional.fragment_out.random.27, which previously passed by
luck because ir3_sched ordered instructions in a way that resulted in a
lucky (ss).

Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Anholt <eric@anholt.net>
2019-05-31 12:58:33 -07:00
Rob Clark 73fb02c5d6 freedreno/ir3: add assert
The special handling for last_input assumes that all the varying loads
are in the first block.  Add an assert to catch if anyone breaks that
assumption.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-31 12:58:33 -07:00
Connor Abbott 8c74772edc util/hash_table: Use fast modulo computation
While we're here, copy the size table from set.c to get rid of hard tabs
in the hash_table.c version.

Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 19:14:35 +02:00
Connor Abbott 83667f7a61 util/set: Use fast modulo computation
Compilation times with my shader-db database:

Difference at 95.0% confidence
	-1.22312 +/- 0.726033
	-0.283979% +/- 0.168254%
	(Student's t, pooled s = 1.02177)

Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 19:14:30 +02:00
Connor Abbott b87817871b util: Add a helper for faster remainders
This should be at least as fast as using fast_idiv_by_const, and has the
advantage that the precomputation is simple enough to be evaluated at
Mesa-compile time for hash tables and sets which have a fixed table of
possible divisors.

Acked-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 19:14:27 +02:00
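
The remainder helper commit above mentions a precomputation simple enough to evaluate at compile time. One well-known technique that fits this description is the multiply-based remainder: precompute ceil(2^64 / d) once per divisor, then obtain n % d with two multiplies. The sketch below assumes that technique. The function names are hypothetical and the code is an illustration, not a copy of the Mesa helper:

```c
#include <assert.h>
#include <stdint.h>

/* Per-divisor constant: ceil(2^64 / d) truncated to 64 bits.
 * For d == 1 this wraps to 0, which still yields the correct remainder 0. */
static uint64_t remainder_magic(uint32_t d)
{
    assert(d != 0);
    return UINT64_MAX / d + 1;
}

/* High 32 bits (bits 64..95) of the 96-bit product x * d,
 * computed with 64-bit arithmetic only. */
static uint32_t mul_high_64x32(uint64_t x, uint32_t d)
{
    uint64_t lo = (x & 0xffffffffu) * d;
    uint64_t hi = (x >> 32) * d;
    return (uint32_t)((hi + (lo >> 32)) >> 32);
}

/* n % d without a division instruction, given magic = remainder_magic(d). */
static uint32_t fast_urem32(uint32_t n, uint32_t d, uint64_t magic)
{
    uint64_t lowbits = magic * n;  /* fractional part of n / d, scaled by 2^64 */
    return mul_high_64x32(lowbits, d);
}
```

Because the magic value depends only on the divisor, a table of allowed table sizes can store it next to each size, which matches the fixed table of possible divisors the commit message mentions.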
Connor Abbott 983b001c77 util/hash_table: Add specialized resizing add function
To keep it in sync with the set implementation.

Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 19:14:22 +02:00
Connor Abbott 6f9beb28bb util/set: Add specialized resizing add function
A significant portion of the time spent in nir_opt_cse for the Dolphin
ubershaders was in resizing the set. When resizing a hash table, we know
in advance that each new element to be inserted will be different from
every other element, so we don't have to compare them, and there will be
no tombstone elements, so we don't have to worry about caching the
first-seen tombstone. We add a specialized add function which skips
these steps entirely, speeding up resizing.

Compile-time results from my shader-db database:

Difference at 95.0% confidence
	-2.29143 +/- 0.845534
	-0.529475% +/- 0.194767%
	(Student's t, pooled s = 1.08807)

Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 19:14:16 +02:00
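
The resizing commit above explains why an insert performed during a rehash can skip key comparison and tombstone tracking. The sketch below illustrates that with a toy open-addressing set of nonzero integers using linear probing. The types, names and probing scheme are assumptions made for the example and do not mirror Mesa's `struct set`:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy set of uint32_t keys; 0 and UINT32_MAX are reserved. */
#define SLOT_EMPTY   UINT32_C(0)
#define SLOT_DELETED UINT32_MAX   /* tombstone left behind by a removal */

struct mini_set {
    uint32_t *slots;
    uint32_t size;
};

static uint32_t mini_hash(uint32_t key)
{
    return key * UINT32_C(2654435761);
}

/* Only valid while filling a fresh table with keys known to be distinct:
 * no key comparison, no tombstone bookkeeping, just find a free slot. */
static void mini_set_add_rehash(struct mini_set *s, uint32_t key)
{
    uint32_t i = mini_hash(key) % s->size;
    while (s->slots[i] != SLOT_EMPTY) {
        if (++i == s->size)
            i = 0;
    }
    s->slots[i] = key;
}

/* Copy every live key from old_set into fresh (all-empty and larger). */
static void mini_set_rehash(const struct mini_set *old_set,
                            struct mini_set *fresh)
{
    assert(fresh->size > old_set->size);
    for (uint32_t i = 0; i < old_set->size; i++) {
        uint32_t key = old_set->slots[i];
        if (key != SLOT_EMPTY && key != SLOT_DELETED)
            mini_set_add_rehash(fresh, key);
    }
}

/* Lookup in a table without tombstones, used here to check the result. */
static int mini_set_contains(const struct mini_set *s, uint32_t key)
{
    uint32_t i = mini_hash(key) % s->size;
    while (s->slots[i] != SLOT_EMPTY) {
        if (s->slots[i] == key)
            return 1;
        if (++i == s->size)
            i = 0;
    }
    return 0;
}

/* Rehash {5, tombstone, 9, empty} into 8 slots and count which of the
 * keys 5, 9 and 7 are present afterwards (7 was never inserted). */
static int mini_set_demo(void)
{
    uint32_t old_slots[4] = { 5, SLOT_DELETED, 9, SLOT_EMPTY };
    uint32_t new_slots[8] = { 0 };
    struct mini_set old_set = { old_slots, 4 };
    struct mini_set fresh = { new_slots, 8 };

    mini_set_rehash(&old_set, &fresh);
    return mini_set_contains(&fresh, 5) + mini_set_contains(&fresh, 9)
         + mini_set_contains(&fresh, 7);
}
```

The general-purpose insert would also have to compare each occupied slot against the new key and remember the first tombstone it passes. Neither check can ever succeed during a rehash, so the specialized loop drops both.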
Connor Abbott 451211741c util/hash_table: Pull out loop-invariant computations
To keep the set and hash table in sync. Note that some of this had
already been done for hash tables, in particular pulling out the
hash % ht->size computation.

Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 19:14:09 +02:00
Connor Abbott f7ff685649 util/set: Pull out loop-invariant computations
Unfortunately GCC can't do this for us, probably because we call the key
comparison function which GCC can't prove won't modify arbitrary memory.
This is a pretty hot function, so do the optimization manually to be
sure the compiler will get it right.

While we're here, make the computation of the new probe address use a
single conditional subtract instead of a modulo, since we know that it
won't ever get as big as 2 * ht->size before the modulo. Modulos tend to
be pretty expensive operations.

shader-db compile time results for my database:

Difference at 95.0% confidence
	-2.24934 +/- 0.69897
	-0.516296% +/- 0.159993%
	(Student's t, pooled s = 0.983684)

Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 19:14:04 +02:00
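
The commit above replaces the modulo in the probe-address update with a single conditional subtract. A minimal sketch of that step, assuming the current address and the step are both already smaller than the table size; `next_probe` is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

/* Advance a probe address by 'step' within a table of 'size' slots.
 * Since addr < size and step < size, the sum is below 2 * size, so one
 * conditional subtract gives the same result as (addr + step) % size. */
static uint32_t next_probe(uint32_t addr, uint32_t step, uint32_t size)
{
    assert(size <= (UINT32_C(1) << 31));  /* keeps addr + step from overflowing */
    assert(addr < size && step < size);

    addr += step;
    if (addr >= size)
        addr -= size;
    return addr;
}
```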
Connor Abbott 3bd0733011 nir/instr_set: Use _mesa_set_search_or_add()
Before this change, we were searching for each instruction twice, once
when checking if it exists and once when figuring out where to insert
it. By using the new function, we can do everything we need to do in one
operation.

Compilation time numbers for my shader-db database:

Difference at 95.0% confidence
	-4.04706 +/- 0.669508
	-0.922142% +/- 0.151948%
	(Student's t, pooled s = 0.95824)

Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 19:13:59 +02:00
Connor Abbott 8a838e172f util/set: Add a _mesa_set_search_or_add() function
Unlike _mesa_set_search_and_add(), it doesn't replace an entry if it's
found, returning it instead. This is useful for nir_instr_set, where
we have to know both the original instruction and its
equivalent.

Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 19:13:45 +02:00
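
The search-or-add commit above, together with the nir/instr_set commit that uses it, describes a single operation that either returns the existing entry or inserts a new one. The sketch below shows those semantics on a toy string set. The `toy_*` names, the fixed capacity and the `found` out-parameter are assumptions for illustration and are not presented as the signature of `_mesa_set_search_or_add()`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_CAPACITY 16u

struct toy_entry { const char *key; };

struct toy_set {
    struct toy_entry entries[TOY_CAPACITY];
    unsigned count;
};

static unsigned toy_hash(const char *s)
{
    unsigned h = 5381;
    while (*s)
        h = h * 33u + (unsigned char)*s++;
    return h;
}

/* One probe sequence does both jobs: if an equal key exists, return its
 * entry untouched; otherwise store 'key' in the first free slot. */
static struct toy_entry *toy_search_or_add(struct toy_set *set,
                                           const char *key, bool *found)
{
    assert(set->count < TOY_CAPACITY);  /* a free slot must exist */

    unsigned i = toy_hash(key) % TOY_CAPACITY;
    while (set->entries[i].key != NULL) {
        if (strcmp(set->entries[i].key, key) == 0) {
            *found = true;
            return &set->entries[i];    /* existing entry is not replaced */
        }
        i = (i + 1) % TOY_CAPACITY;
    }
    set->entries[i].key = key;
    set->count++;
    *found = false;
    return &set->entries[i];
}

/* Returns 1 if a second, equal key yields the first key's entry. */
static int toy_demo(void)
{
    static const char first[] = "fadd a, b";
    static const char second[] = "fadd a, b";  /* equal text, distinct object */
    struct toy_set set;
    bool found;

    memset(&set, 0, sizeof(set));
    struct toy_entry *e1 = toy_search_or_add(&set, first, &found);
    if (found)
        return 0;
    struct toy_entry *e2 = toy_search_or_add(&set, second, &found);
    return found && e2 == e1 && e2->key == first && set.count == 1;
}
```

Returning the existing entry gives the caller both pieces it needs after one lookup: the key it passed in and the equivalent key that was already stored.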
Jonathan Marek 1db86d8b62 freedreno/ir3: fix input ncomp for vertex shaders
ncomp is never set for vertex shaders, but a3xx and a4xx still use it.

Fixes: 831f1a05c0 freedreno/ir3: rework varying packing

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-05-31 12:21:23 -04:00
Ian Romanick 65df6122da intel/compiler: Use compare rematerialization pass
Almost all of the spill / fill benefit is in Deus Ex.

Haswell and all Gen8+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17224438 -> 17196395 (-0.16%)
instructions in affected programs: 1518658 -> 1490615 (-1.85%)
helped: 1550
HURT: 3
helped stats (abs) min: 1 max: 170 x̄: 18.11 x̃: 2
helped stats (rel) min: 0.04% max: 8.35% x̄: 1.12% x̃: 0.45%
HURT stats (abs)   min: 5 max: 10 x̄: 6.67 x̃: 5
HURT stats (rel)   min: 0.32% max: 0.41% x̄: 0.35% x̃: 0.32%
95% mean confidence interval for instructions value: -19.86 -16.26
95% mean confidence interval for instructions %-change: -1.19% -1.04%
Instructions are helped.

total cycles in shared programs: 361468455 -> 361288721 (-0.05%)
cycles in affected programs: 197367688 -> 197187954 (-0.09%)
helped: 990
HURT: 683
helped stats (abs) min: 1 max: 119045 x̄: 806.00 x̃: 16
helped stats (rel) min: <.01% max: 38.56% x̄: 1.06% x̃: 0.26%
HURT stats (abs)   min: 1 max: 12190 x̄: 905.14 x̃: 22
HURT stats (rel)   min: <.01% max: 25.18% x̄: 1.16% x̃: 0.47%
95% mean confidence interval for cycles value: -315.45 100.58
95% mean confidence interval for cycles %-change: -0.31% <.01%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 12147 -> 8948 (-26.34%)
spills in affected programs: 5433 -> 2234 (-58.88%)
helped: 343
HURT: 0

total fills in shared programs: 25262 -> 21814 (-13.65%)
fills in affected programs: 7771 -> 4323 (-44.37%)
helped: 343
HURT: 3

LOST:   0
GAINED: 17

Ivy Bridge
total instructions in shared programs: 12083517 -> 12081427 (-0.02%)
instructions in affected programs: 540744 -> 538654 (-0.39%)
helped: 786
HURT: 29
helped stats (abs) min: 1 max: 42 x̄: 2.70 x̃: 2
helped stats (rel) min: 0.06% max: 5.44% x̄: 0.55% x̃: 0.36%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.16% max: 0.95% x̄: 0.38% x̃: 0.31%
95% mean confidence interval for instructions value: -2.83 -2.30
95% mean confidence interval for instructions %-change: -0.57% -0.47%
Instructions are helped.

total cycles in shared programs: 180153463 -> 180124798 (-0.02%)
cycles in affected programs: 72597920 -> 72569255 (-0.04%)
helped: 572
HURT: 249
helped stats (abs) min: 1 max: 14830 x̄: 109.48 x̃: 13
helped stats (rel) min: <.01% max: 8.92% x̄: 0.71% x̃: 0.26%
HURT stats (abs)   min: 1 max: 11060 x̄: 136.37 x̃: 10
HURT stats (rel)   min: <.01% max: 10.85% x̄: 0.54% x̃: 0.32%
95% mean confidence interval for cycles value: -96.22 26.39
95% mean confidence interval for cycles %-change: -0.43% -0.23%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 3625 -> 3623 (-0.06%)
spills in affected programs: 46 -> 44 (-4.35%)
helped: 1
HURT: 0

total fills in shared programs: 4065 -> 4061 (-0.10%)
fills in affected programs: 104 -> 100 (-3.85%)
helped: 1
HURT: 0

LOST:   0
GAINED: 8

Sandy Bridge
total instructions in shared programs: 10879656 -> 10878699 (<.01%)
instructions in affected programs: 275167 -> 274210 (-0.35%)
helped: 544
HURT: 0
helped stats (abs) min: 1 max: 20 x̄: 1.76 x̃: 1
helped stats (rel) min: 0.06% max: 3.11% x̄: 0.39% x̃: 0.25%
95% mean confidence interval for instructions value: -1.97 -1.55
95% mean confidence interval for instructions %-change: -0.43% -0.36%
Instructions are helped.

total cycles in shared programs: 154089096 -> 154081132 (<.01%)
cycles in affected programs: 4422722 -> 4414758 (-0.18%)
helped: 459
HURT: 214
helped stats (abs) min: 1 max: 258 x̄: 26.67 x̃: 8
helped stats (rel) min: <.01% max: 5.45% x̄: 0.51% x̃: 0.14%
HURT stats (abs)   min: 1 max: 226 x̄: 19.99 x̃: 4
HURT stats (rel)   min: <.01% max: 3.15% x̄: 0.34% x̃: 0.09%
95% mean confidence interval for cycles value: -15.51 -8.15
95% mean confidence interval for cycles %-change: -0.31% -0.17%
Cycles are helped.

total spills in shared programs: 2880 -> 2876 (-0.14%)
spills in affected programs: 636 -> 632 (-0.63%)
helped: 2
HURT: 0

total fills in shared programs: 3161 -> 3157 (-0.13%)
fills in affected programs: 1519 -> 1515 (-0.26%)
helped: 2
HURT: 0

LOST:   0
GAINED: 2

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8157361 -> 8155067 (-0.03%)
instructions in affected programs: 382491 -> 380197 (-0.60%)
helped: 677
HURT: 0
helped stats (abs) min: 1 max: 43 x̄: 3.39 x̃: 2
helped stats (rel) min: 0.09% max: 5.19% x̄: 0.66% x̃: 0.42%
95% mean confidence interval for instructions value: -3.76 -3.01
95% mean confidence interval for instructions %-change: -0.72% -0.59%
Instructions are helped.

total cycles in shared programs: 188588292 -> 188583040 (<.01%)
cycles in affected programs: 3155064 -> 3149812 (-0.17%)
helped: 377
HURT: 13
helped stats (abs) min: 2 max: 180 x̄: 14.13 x̃: 6
helped stats (rel) min: <.01% max: 3.96% x̄: 0.39% x̃: 0.12%
HURT stats (abs)   min: 2 max: 8 x̄: 5.85 x̃: 6
HURT stats (rel)   min: <.01% max: 0.22% x̄: 0.06% x̃: 0.04%
95% mean confidence interval for cycles value: -15.67 -11.27
95% mean confidence interval for cycles %-change: -0.45% -0.30%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-31 08:47:03 -07:00
Ian Romanick 3ee2e84c60 nir: Rematerialize compare instructions
On some architectures, Boolean values used to control conditional
branches or conditional selection must be propagated into a flag.  This
generally means that a stored Boolean value must be compared with zero.
Rather than force the generation of extra compares with zero, re-emit
the original comparison instruction.  This can save register pressure by
not needing to store the Boolean value.

There are several possible areas for future improvement to this pass:

1. Be more conservative.  If both sources to the comparison instruction
are non-constants, it may be better for register pressure to emit the
extra compare.  The current shader-db results on Intel GPUs (next
commit) lead me to believe that this is not currently a problem.

2. Be less conservative.  Currently the pass requires that all users of
the comparison match the pattern.  The idea is that after the pass is
complete, no instruction will use the resulting Boolean value.  The only
uses will be of the flag value.  It may be beneficial to relax this
requirement in some cases.

3. Be less conservative.  Also try to rematerialize comparisons used for
discard_if intrinsics.  After changing the way the Intel compiler
generates code for discard_if (see MR!935), I tried implementing this
already.  The changes were pretty small.  Instructions were helped in 19
shaders, but, overall, cycles were hurt.  A commit "nir: Rematerialize
comparisons for nir_intrinsic_discard_if too" is on my fd.o cgit.

4. Copy the preceding ALU instruction.  If the comparison is a
comparison with zero, and it is the only user of a particular ALU
instruction (e.g., (a+b) != 0.0), it may be a further improvement to also
copy the preceding ALU instruction.  On Intel GPUs, this may enable
cmod propagation to make additional progress.

v2: Use much simpler method to get the prev_block for an if-statement.
Suggested by Tim.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-31 08:47:03 -07:00
Ian Romanick 336eab0630 nir: Add a shallow clone function for nir_alu_instr
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Suggested-by: Matt Turner <mattst88@gmail.com>
2019-05-31 08:47:03 -07:00
Tomeu Vizoso 0e1c5cc78f panfrost: Remove link stage for jobs
And instead, link them as they are added.

Makes things a bit clearer and prepares future work such as FB reload
jobs.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-31 14:37:10 +02:00
Tomeu Vizoso da9f7ab6d4 panfrost: ci: Switch to kernel 5.2-rc2
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-31 13:51:51 +02:00
Tomeu Vizoso 77f5663cf3 panfrost: ci: Update expectations
A bunch of tests have been fixed, but some regressions have appeared on
T760.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
2019-05-31 13:51:43 +02:00
Connor Abbott 78f33620e8 radeonsi/nir: Remove hack for builtins
We now bounds check properly in the uniform loading fast path, so
there's no need to disable it by pretending there are other UBO bindings
in use. The way this looks at the variable name was causing problems
when two piglit shaders, one with a name that triggered the hack and one
that didn't, got hashed to the same thing after stripping out the names.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-05-31 11:03:05 +02:00
Connor Abbott fca1a35163 radeonsi/nir: Use correct location for uniform access bound
location is the API-level location, but driver_location is the actual
location the uniform gets passed to the driver. This apparently only
caused failures with builtins, where the location is 0 because it's
represented via the state tokens instead.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-05-31 11:02:57 +02:00
Connor Abbott 6571032af1 radeonsi/nir: Correctly handle double TCS/TES varyings
ac expands the store to 32-bit components for us, but we still have to
deal with storing up to 8 components, and when a varying is split across
two vec4 slots we have to calculate the address again for the second
slot, since they aren't adjacent in memory. I didn't do this on the ac
level because we should generate better indexing arithmetic for the lds
store, where slots are contiguous.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-05-31 11:02:11 +02:00
Christian Gmeiner ca19f7639a etnaviv: blt: s/TRUE/true && s/FALSE/false
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-05-31 10:04:49 +02:00
Christian Gmeiner 9e6463e62a etnaviv: rs: s/TRUE/true && s/FALSE/false
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-05-31 10:04:49 +02:00
Bas Nieuwenhuizen e24a7840f6 nir: Actually propagate progress in nir_opt_move_load_ubo.
Found with Jason's new metadata rework (https://gitlab.freedesktop.org/mesa/mesa/merge_requests/950).

Fixes: af355aaa07 "nir: add nir_opt_move_load_ubo() optimization pass"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-05-31 07:45:43 +00:00
Samuel Pitoiset 9178076a46 radv: use RADV_CMD_DIRTY_DYNAMIC_* when restoring viewport/scissor
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-31 08:50:16 +02:00
Samuel Pitoiset 0e7b029d00 radv: use CmdPushConstants when restoring constants after meta operations
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-31 08:50:13 +02:00