Commit Graph

470 Commits

Author SHA1 Message Date
Jason Ekstrand c9a4793de8 v3d: Use the correct opcodes for signed image min/max
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-21 17:19:55 +00:00
Jason Ekstrand 951cf94521 nir: Add explicit signs to image min/max intrinsics
This better matches all the other atomic intrinsics such as those for
SSBOs and shared variables where the sign is part of the intrinsic
opcode.  Both generators (GLSL and SPIR-V) know the sign from the type
of the image variable or handle.  In SPIR-V, signed min/max are separate
opcodes from unsigned.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-21 17:19:55 +00:00
Iago Toral Quiroga 3539bd63dd v3d: clamp gl_PointSize to a minimum of 1.0
The OpenGL ES spec requires that the value of gl_PointSize is clamped
to an implementation-dependent range matching what is advertised by
GL_ALIASED_POINT_SIZE_RANGE. For V3D this is [1.0, 512.0], but the
hardware won't clamp to the minimum side of the range and won't render
points with a size strictly smaller than 1.0 either, so we need to
clamp manually. For points larger than the maximum size of the range
the hardware clamps automatically.

Fixes piglit test:
spec/!opengl 2.0/vs-point_size-zero

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-13 09:44:54 +02:00
Iago Toral Quiroga 62e0ca3064 v3d: line length style fixes
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-13 08:38:19 +02:00
Iago Toral Quiroga 99e9809cab v3d: honor the write mask on store operations
v2:
  - Fix incremental update of the const offset when we need to emit a sequence
    with more than one write because of the writemask.
  - Do not move the tmu write emission to a separate helper.

v3:
  - Get the store writemask before the loop, use ffs to get the first component
    to write and clear writemask bits as we process the components (Eric).
  - Simplified the code that figured out the number of components for the TMU
    config based on the number of tmu writes for stores and atomics.

v4:
  - Code clean-ups (Eric).

Fixes:
KHR-GLES31.core.shader_image_load_store.advanced-cast-cs
KHR-GLES31.core.shader_image_load_store.advanced-cast-fs
KHR-GLES31.core.shader_storage_buffer_object.advanced-switchBuffers-cs
KHR-GLES31.core.shader_storage_buffer_object.advanced-switchPrograms-cs
KHR-GLES31.core.shader_storage_buffer_object.basic-operations-case1-cs

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-13 08:38:19 +02:00
Iago Toral Quiroga 3d65d2a488 v3d: refactor ntq_emit_tmu_general() slightly
When we implement write masks on store operations we might need to
emit multiple write sequences for a given store intrinsic. To make
that easier, let's split the emission of the tmud instructions to
their own block after we are done with the code that only needs to
run once no matter how many write sequences we need to emit.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-13 08:38:19 +02:00
Rhys Perry 7740149852 nir: merge and extend nir_opt_move_comparisons and nir_opt_move_load_ubo
v2: add to series
v3: update Makefile.sources
v4: don't remove a comment and break statement
v4: use nir_can_move_instr

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-12 22:01:30 +00:00
Iago Toral Quiroga 0f2d1dfe65 v3d: use the GPU to record primitives written to transform feedback
We can use the PRIMITIVE_COUNTS_FEEDBACK packet to write various primitive
counts to a buffer, including the number of primives written to transform
feedback buffers, which will handle buffer overflow correctly.

There are a couple of caveats with this:

Primitive counters are reset when we emit a 'Tile Binning Mode Configuration'
packet, which can happen in the middle of a primitives query, so we need to
read the buffer when we submit a job and accumulate the counts in the context
so we don't lose them.

We also need to do the same when we switch primitive type during transform
feedback so we can compute the correct number of recorded vertices from
the number of primitives. This is necessary so we can provide an accurate
vertex count for draw from transform feedback.

v2:
 - When computing the number of vertices for a primitive, pass in the base
   primitive, since that is what the hardware will count.
 - No need to update primitive counts when switching primitive types if
   the base primitives are the same.
 - Log perf warning when mapping the primitive counts BO for readback (Eric).
 - Only emit the primitive counts packet once at job end (Eric).
 - Use u_upload mechanism for the primitive counts buffer (Eric).
 - Use the XML to generate indices into the primitive counters buffer (Eric).

Fixes piglit tests:
spec/ext_transform_feedback/overflow-edge-cases
spec/ext_transform_feedback/query-primitives_written-bufferrange
spec/ext_transform_feedback/query-primitives_written-bufferrange-discard
spec/ext_transform_feedback/change-size base-shrink
spec/ext_transform_feedback/change-size base-grow
spec/ext_transform_feedback/change-size offset-shrink
spec/ext_transform_feedback/change-size offset-grow
spec/ext_transform_feedback/change-size range-shrink
spec/ext_transform_feedback/change-size range-grow
spec/ext_transform_feedback/intervening-read prims-written

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-08 08:36:52 +02:00
Iago Toral Quiroga 5ffb8b1716 v3d: add header guards in v3d_packet_helpers.h
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-08 08:36:52 +02:00
Eric Engestrom d2d85b950d meson: replace libmesa_util with idep_mesautil
This automates the include_directories and dependencies tracking so that
all users of libmesa_util don't need to add them manually.

Next commit will remove the ones that were only added for that reason.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Eric Anholt <eric@anholt.net>
Tested-by: Vinson Lee <vlee@freedesktop.org>
2019-08-03 00:08:37 +00:00
Eric Engestrom abc226cf41 tree-wide: replace MAYBE_UNUSED with ASSERTED
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-31 09:41:05 +01:00
Eric Anholt 82bf1979d7 v3d: Introduce a DRM shim for calling out to the simulator.
The goal is to enable testing of parts of drivers without depending on any
particular kernel version or hardware being present.

Simply set LD_PRELOAD=$PREFIX/lib/libv3d_drm_shim.so in your environment,
and we'll fake a /dev/dri/renderD128 (or whatever the next available node
is) using v3dv3.  That node can then be used with the surfaceless or gbm
EGL platforms.

Acked-by: Iago Toral Quiroga <itoral@igalia.com>
2019-07-25 08:56:19 -07:00
Jose Maria Casanova Crespo 9bf0bdf776 v3d: Avoid scheduling an instruction that stalls waiting for SFU retval
If we detect that a scheduling candidate will stall because having a
register source that is the written by the SFU unit in the previous
instruction we reduce its priority so any non stalling operation would
be chosen.

The latency of SFU operations is defined as 2. So they would be scheduled
earlier if other candidates have the same priority.

Finally we won't merge instructions that stall to a previously chosen one.
As the result of the previous one would be waiting for an extra cycle.

Although shader-db result show that instruction are hurt with an increase
of 0.35% the sum of instructions + stalls is reduced a 0.52%. And
the total of sfu-stalls is reduced a 63.51%. It implies also a small
increase in the max-temps metric because of scheduling earlier SFU
operations.

total instructions in shared programs: 9102719 -> 9117851 (0.17%)
instructions in affected programs: 4324628 -> 4339760 (0.35%)
helped: 4162
HURT: 12128
helped stats (abs) min: 1 max: 10 x̄: 1.28 x̃: 1
helped stats (rel) min: 0.09% max: 4.76% x̄: 0.66% x̃: 0.51%
HURT stats (abs)   min: 1 max: 27 x̄: 1.69 x̃: 1
HURT stats (rel)   min: 0.05% max: 7.69% x̄: 0.87% x̃: 0.68%
95% mean confidence interval for instructions value: 0.90 0.96
95% mean confidence interval for instructions %-change: 0.47% 0.50%
Instructions are HURT.

total max-temps in shared programs: 1327728 -> 1327812 (<.01%)
max-temps in affected programs: 4730 -> 4814 (1.78%)
helped: 61
HURT: 134
helped stats (abs) min: 1 max: 2 x̄: 1.08 x̃: 1
helped stats (rel) min: 2.70% max: 13.33% x̄: 4.89% x̃: 4.17%
HURT stats (abs)   min: 1 max: 3 x̄: 1.12 x̃: 1
HURT stats (rel)   min: 1.54% max: 20.00% x̄: 6.10% x̃: 5.26%
95% mean confidence interval for max-temps value: 0.28 0.58
95% mean confidence interval for max-temps %-change: 1.80% 3.52%
Max-temps are HURT.

total sfu-stalls in shared programs: 99551 -> 36324 (-63.51%)
sfu-stalls in affected programs: 95029 -> 31802 (-66.53%)
helped: 25882
HURT: 0
helped stats (abs) min: 1 max: 27 x̄: 2.44 x̃: 2
helped stats (rel) min: 5.26% max: 100.00% x̄: 79.86% x̃: 100.00%
95% mean confidence interval for sfu-stalls value: -2.47 -2.42
95% mean confidence interval for sfu-stalls %-change: -80.18% -79.54%
Sfu-stalls are helped.

total inst-and-stalls in shared programs: 9202270 -> 9154175 (-0.52%)
inst-and-stalls in affected programs: 5618516 -> 5570421 (-0.86%)
helped: 22728
HURT: 855
helped stats (abs) min: 1 max: 31 x̄: 2.16 x̃: 1
helped stats (rel) min: 0.07% max: 16.67% x̄: 1.14% x̃: 0.92%
HURT stats (abs)   min: 1 max: 5 x̄: 1.25 x̃: 1
HURT stats (rel)   min: 0.12% max: 5.26% x̄: 1.24% x̃: 0.86%
95% mean confidence interval for inst-and-stalls value: -2.07 -2.01
95% mean confidence interval for inst-and-stalls %-change: -1.07% -1.05%
Inst-and-stalls are helped.

v2: Rename v3d_qpu_generates_sfu_stalls to v3d_qpu_instr_is_sfu (Eric)

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-22 03:00:50 +02:00
Jose Maria Casanova Crespo c341ab7ffb v3d: add shader-db stat to count SFU stalls
SFU operations have a latency of 2 cicles, so if their results
are used in the following cycle to a SFU instruction, the GPU
stalls for an extra cycle until the result is available.

This adds the number of stalls to the shader-db debug mode and
sum of instruction + stalls to evaluate optimizations to schedule
instructions that avoid generating sfu-stalls.

v2: Rename v3d_qpu_generates_sfu_stalls to v3d_qpu_instr_is_sfu (Eric)

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-22 03:00:50 +02:00
Eric Anholt cdc359c58e v3d: Use nir_shader_lower_instructions() for txf_ms lowering.
Cuts out a bunch of boilerplate.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-07-18 11:28:56 -07:00
Eric Anholt 40e7609603 v3d: Fix assertion failures in debug builds.
nir_lower_io leaves around deref_var instructions after lowering away
deref intrinsics.  This ends up breaking validation after v3d_nir_lower_io
removes variables not actually being stored by the shader's
store_output()s.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-07-18 11:28:56 -07:00
Iago Toral Quiroga c23fa1ca07 v3d: emit correct lowering for logic operations with MSAA render targets
v2:
 - Drop the writemask from the per-sample color intrinsic (Eric)

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-18 08:59:35 +02:00
Iago Toral Quiroga 93d05c1c1f v3d: handle nir_intrinsic_store_tlb_sample_color_v3d
v2:
 - Move handling of output intrinsics to ntq_emit_intrinsic() (Eric).

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-18 08:59:35 +02:00
Iago Toral Quiroga ba520b00c4 v3d: implement per-sample tlb color writes
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-18 08:59:35 +02:00
Iago Toral Quiroga b96c2219ca v3d: refactor the tlb color write code
We want to split the tlb specifier setup from the color writes, because when
we implement per-sample color writes we want to do the latter for all the
samples, but the former only once.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-18 08:59:35 +02:00
Iago Toral Quiroga fd3ec6f55d v3d: move tlb color write emission to a helper function
We will soon be adding per-sample color writes which means additional
complexity and more indentation (we will need another loop to emit
the writes for each individual sample), so this will help keeping
things simple and a bit more readable.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-18 08:59:35 +02:00
Iago Toral Quiroga 0c9919710e v3d: implement per-sample tlb color reads
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-18 08:59:35 +02:00
Andreas Bergmeier f92290a8d9 broadcom: Move v3d_get_device_info to common
In common we can use implementation for Vulkan.
2019-07-17 20:02:34 +00:00
Alejandro Piñeiro 85b78f96a6 v3d: use inc/dec tmu operation with image atomic sub/add of 1
This allows to remove a mov of 1/-1, as it is implicit with the
operation.

As with atomic inc/dec/add, usual shader-db set doesn't include any
GLES shader using it. So using as workaround vk-gl-cts shaders, we get
this:

total instructions in shared programs: 1217013 -> 1217006 (<.01%)
instructions in affected programs: 53 -> 46 (-13.21%)
helped: 2
HURT: 0

One of the helped shader went from 40 to 34 instructions.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 11:51:22 +02:00
Alejandro Piñeiro 2e22879115 v3d: refactor some code from v3d40_vir_emit_image_load_store
And moved to new auxiliar method v3d40_image_load_store_tmu_op,
equivalent to the nir_to_nir v3d_general_tmu_op, to clean-up a little.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 11:49:29 +02:00
Alejandro Piñeiro 934ce48db8 v3d: use inc/dec tmu operation with atomic sub/add of 1
Among other things, this avoid the need of loading 1/-1 constants (so
one less operation).

The removed comment suggest the option of adding support on NIR for
inc/dec. Intel just uses an auxiliar method to get which hw operation
is needed, so no lowering is needed. And at the same time, being so
small, seems unreasonable to try to add a general one on NIR
itself. It is more easy to just adapt the method here (that is what
the patch does right now).

It is worth to note that we are not getting any change on shader-db
stats because all those methods are used on the usual shader-db set
with shaders needing GLSL > 4.2. In general there aren't too many GLSL
ES 3.1 tests.

As an alternative, we captured the GLES3/GLSL31/GLS32 used on
vk-gl-cts, even if that is not a real life usage of shaders. With
those we get the following:

total instructions in shared programs: 1217022 -> 1217013 (<.01%)
instructions in affected programs: 117 -> 108 (-7.69%)
helped: 6
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.50 x̃: 1
helped stats (rel) min: 3.57% max: 10.00% x̄: 8.09% x̃: 9.09%
95% mean confidence interval for instructions value: -2.07 -0.93
95% mean confidence interval for instructions %-change: -10.54% -5.64%
Instructions are helped.

Note that the shaders helped are really low because most of the
vk-gl-cts tests using AtomicInc/Dec/Add are mostly used on compute
shaders. Although right now there is a branch around with CS support,
the usual is doing the stats against master.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 11:48:40 +02:00
Alejandro Piñeiro 3912a32a79 v3d: remove redefinition of tmu operations on nir_to_vir
They are already defined, although is a slightly different format on
the generated packet headers, so it was needed to change how it is
used on nir_to_vir.

In addition to allow to remove some duplicated headers, it will allow
to define just one get_op_for_atomic_add aux method later to support
using inc/dec instead of add of 1/-1.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 11:48:17 +02:00
Alejandro Piñeiro c2ff38d2df v3d: tweak initial comment on pack generator script
As the files it mentions to use as reference has slightly different
names.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 11:48:09 +02:00
Iago Toral Quiroga 10d50f2904 v3d: remove unused definitions
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga 8e50a9f6cf v3d: move implementation of some intrinsics to separate helpers
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga d69184204e v3d: emit correct lowering for logic ops with RGB10A2 render targets
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga 7bf3676845 v3d: emit correct lowering for logic ops with integer render targets
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga e540775f0c v3d: add lowering for OpenGL logic operations
This implements support for OpenGL logic operations by emitting code to read
from the TLB if needed and blending the fragment output accordingly. It is
similar to VC4's blend lowering pass, but exclusive to logic operations, since
blending is otherwise supported in hardware.

The pass doesn't handle MSAA targets yet.

Fixes the following piglit tests:
spec/!opengl 1.0/gl-1.0-logicop/*
spec/!opengl 1.1/gl-1.1-xor
spec/!opengl 1.1/gl-1.1-xor-copypixels

It also fixes text cursor rendering in Libreoffice with the GTK+2 theme, which
is rendered via glamor using the XOR logic operation.

v2: fix checks for allowed variable location and maximum render target (Eric)

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga 7c1d708911 v3d: acquire scoreboard lock before first tlb read
Until now we have always been emitting our scoreboard locks on the last thread
switch to improve parallelism. We did this by emitting our last thread switch
right before our tlb writes at the very end of the program, where we know that
we are outside control flow.

Unfortunately, this strategy is not valid when we have tlb color reads too, as
these will happen before this point in the program and can happen inside
control flow.

To fix this we always emit a thread switch before the first tlb load and if we
see additional thread switches after that point, we change the strategy to lock
on the first thread switch.

v2: change the solution so it is expected to work in more scenarios (Eric).

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga 47d7c80dc7 v3d: implement tile buffer color read intrinsic
We will be emitting this intrinsic to signal TLB color loads when we implement
OpenGL logic operations, where we need to blend the fragment shader color
output with the existing color in the render target.

Per-sample TLB reads are not supported yet.

v2: fix the offset into the color_reads array (Eric).

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga 6af1bdefa9 v3d: fix size of color_reads and sample_colors arrays
We need to scale the size of these arrays to consider up to
V3D_MAX_DRAW_BUFFERS render targets and 4 components per color.

v2: we want to store each color component separately, so scale by 4 too.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga 0279ac6e51 v3d: add color formats and swizzles to the fragment shader key
We are going to need these very soon to emit correct reads from the tlb
to implement logic operations.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga d26b35ba44 v3d: add helpers to emit ldtlb and ldtlbu signals
The ldtlbu version will read an implicit uniform with the TLB read
specifier and should be used for the first read in a sequence
of TLB reads (unless the default configuration is valid, in which
case we can use ldtlb). The ldtlb version is used for any subsequent
TLB read in the sequence.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga aff8885cf9 v3d: handle tlb read dependency tracking as if they were writes
Tile buffer reads are emitted as ordered sequences and cannot be reordered.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga 4793e2c888 v3d: instructions with the ldtlb and ldtlbu signals are tlb instructions
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga 83a66e10de v3d: tlb loads cannot be removed
Loads from the tile buffer are emitted in ordered sequences so
we cannot eliminate or reorder any of them.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga 08f4dc3adc v3d: the ldtlbu signal reads an implicit uniform
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga 271bc8acfb v3d: handle ldtlb and ldtlbu signals during disassembly
We already have code to print these signals but the early return in the code
that checks if any signals are present present was missing the checks for them,
so it would skip printing them unless they were paired with other signals.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Sagar Ghuge 456557a837 nir: Add lower_rotate flag and set to true in all drivers
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Suggested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-01 10:14:22 -07:00
Daniel Schürmann 165b7f3a44 nir: define behavior of nir_op_bfm and nir_op_u/ibfe according to SM5 spec.
That is: the five least significant bits provide the values of
'bits' and 'offset' which is the case for all hardware currently
supported by NIR and using the bfm/bfe instructions.
This patch also changes the lowering of bitfield_insert/extract
using shifts to not use bfm and removes the flag 'lower_bfm'.

Tested-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-24 18:42:20 +02:00
Iago Toral Quiroga 79a30543ee v3d: implement simultaneous peripheral access exceptions for V3D 4.1+
Shader-db results:

total instructions in shared programs: 9117550 -> 9102719 (-0.16%)
instructions in affected programs: 1752873 -> 1738042 (-0.85%)
helped: 7076
HURT: 478
helped stats (abs) min: 1 max: 22 x̄: 2.19 x̃: 2
helped stats (rel) min: 0.07% max: 13.89% x̄: 1.70% x̃: 1.07%
HURT stats (abs)   min: 1 max: 7 x̄: 1.41 x̃: 1
HURT stats (rel)   min: 0.09% max: 10.17% x̄: 0.86% x̃: 0.54%
95% mean confidence interval for instructions value: -2.00 -1.92
95% mean confidence interval for instructions %-change: -1.58% -1.50%
Instructions are helped.

total max-temps in shared programs: 1327774 -> 1327728 (<.01%)
max-temps in affected programs: 1025 -> 979 (-4.49%)
helped: 47
HURT: 2
helped stats (abs) min: 1 max: 2 x̄: 1.02 x̃: 1
helped stats (rel) min: 2.63% max: 20.00% x̄: 7.67% x̃: 5.26%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 4.17% max: 4.17% x̄: 4.17% x̃: 4.17%
95% mean confidence interval for max-temps value: -1.06 -0.82
95% mean confidence interval for max-temps %-change: -8.89% -5.49%
Max-temps are helped.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-18 08:09:03 +02:00
Iago Toral Quiroga 360b832c58 v3d: do not setup execute flags for else block in uniform control flow
Either all channels executed the 'then' block, in which case all
channels will directly jump to the 'endif' block at the end of the
'then' block, or all channels execute the 'else' block (so no
execution masking is necessary).

Shader-db results:

total instructions in shared programs: 9119238 -> 9117550 (-0.02%)
instructions in affected programs: 401252 -> 399564 (-0.42%)
helped: 855
HURT: 77

total uniforms in shared programs: 3022622 -> 3022605 (<.01%)
uniforms in affected programs: 3566 -> 3549 (-0.48%)
helped: 17
HURT: 0

total max-temps in shared programs: 1327762 -> 1327774 (<.01%)
max-temps in affected programs: 619 -> 631 (1.94%)
helped: 2
HURT: 15

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-14 08:00:52 +02:00
Alejandro Piñeiro 17c2c9cd67 v3d: fix checking twice auf flag
Seems a C&P error, and should check for auf/muf.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110902
Fixes: 8f065596d2 "v3d: Add an optimization pass for redundant flags updates."

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-13 11:45:18 +02:00
Iago Toral Quiroga 9b96ae69bc v3d: don't emit point coordinates varyings if the FS doesn't read them
We still need to emit them in V3D 3.x since there there is no mechanism to
disable them.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-07 08:29:42 +02:00
Iago Toral Quiroga 5e26e55e72 v3d: add a helper to track variables that need point coordinates
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-07 08:26:52 +02:00