Reduces vertex shader instruction counts in DOTA2 by 6.42%, L4D2 by
4.61%, and CS:GO by 5.71%.
total instructions in shared programs: 1500153 -> 1498191 (-0.13%)
instructions in affected programs: 59919 -> 57957 (-3.27%)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Only implemented for ir_swizzles currently, but perhaps will be useful
for other IR types in the future.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This flag was really just a proxy for determining whether the backend
was vector (AOS) or scalar (SOA). It will be used to apply a future
optimization only for vector backends.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Dumping the number of live registers at each IP allows us to see
register pressure and identify any local maxima. This should
aid in debugging passes designed to reduce register pressure, as
well as optimizations that suddenly trigger spilling.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Calling it after value numbering (added in the next commit) prevents
some instruction count regressions.
total instructions in shared programs: 1524387 -> 1523905 (-0.03%)
instructions in affected programs: 13112 -> 12630 (-3.68%)
GAINED: 0
LOST: 3
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Previously we simply considered two registers whose live ranges
overlapped to interfere. Cases such as
set A ------
... |
mov B, A -- |
... | B | A
use B -- |
... |
use A ------
would be considered to interfere, even though B is an unmodified copy of
A whose live range fit wholly inside that of A.
If no writes to A or B occur between the mov B, A and the use of B then
we can safely coalesce them.
Instead of removing MOV instructions, we make them NOPs and remove them
at once after the main pass is finished in order to avoid recomputing
live intervals (which are needed to perform the previous step).
total instructions in shared programs: 1543768 -> 1513077 (-1.99%)
instructions in affected programs: 951563 -> 920872 (-3.23%)
GAINED: 46
LOST: 22
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
mov takes only a single source argument. Example instruction
inexplicably changed from add to mov in commit f10f5e49.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Unnamed record types are assigned to separate types per stage, e.g. if
uniform struct { ... } a;
is defined in both vertex and fragment shader, two separate types will
result with different names. When linking the shader, this results in a
type conflict. However, there is no reason why this should not be
allowed according to GLSL specifications. Compare and match record types
when linking shader stages to avoid this conflict.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes several colorbuffer tests, including piglit "fbo-drawbuffers-none"
for "gl_FragColor" and "glDrawPixels" cases.
v2: rework patch to only avoid creating extra shader variants when
TGSI_PROPERTY_FS_COLOR0_WRITES_ALL_CBUFS is not specified. Per Jose.
Use a write_color0_to_n_cbufs key field to replicate color0 to N
color buffers only when N > 0 and WRITES_ALL_CBUFS is set.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The new TYPE_DOUBLEN_2 type was added in 0e60d850 but the code to
return values of that type wasn't completed.
Fixes conform's default state test. glGetFloatv(GL_DEPTH_RANGE)
wasn't returning anything.
v2: remove stray 'break' statements.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
These messages are in code that is shared between the VS and GS
back-ends, so use the terminology "vec4" to avoid confusion.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Even with depth clipping disabled, vertices which have negative w coords
must be discarded. And since we don't have a proper guardband implementation
yet (relying on driver to handle all values except infs/nans in rasterization
for such points) we need to kill them off manually (as they can end up with
coordinates inside viewport otherwise).
v2: use 0.0f instead of 0 (spotted by Brian).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: Also increment ir->offset in the GS visitor, rather than at the
final assembly generation stage (requested by Paul).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
On Broadwell, PIPE_CONTROL needs an extra DWord to accomodate the
48-bit addressing.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Now that we have a helper function that handles the PIPE_CONTROL
variations between the various platforms, these are basically the same.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
There are a lot of places that use PIPE_CONTROL to write a value to a
buffer (either an immediate write, TIMESTAMP, or PS_DEPTH_COUNT).
Creating a single function to do this seems convenient.
As part of this refactor, we now set the PPGTT/GTT selection bit
correctly on Gen7+. Previously, we set bit 2 of DW2 on all platforms.
This is correct for Sandybridge, but actually part of the address on
Ivybridge and later!
Broadwell will also increase the length of these packets by 1; with the
refactoring, we should have to adjust that in substantially fewer
places, giving us confidence that we've hit them all.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
I believe that PIPE_CONTROL uses the length field to decide whether to
do 32-bit or 64-bit writes. A length of 4 would do a 32-bit write,
while a length of 5 would do a 64-bit write. (I haven't verified this,
though.)
For workaround writes, we don't care what value gets written, or how
much data. We're only writing something because hardware bugs mandate
that do so. So using a 64-bit write should be fine.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The PIPE_CONTROL packet actually has 5 DWords on Gen6+:
1. Header
2. Flags
3. Address
4. Immediate Data: Lower DWord
5. Immediate Data: Upper DWord
We just never emitted the last one. While it appears to work, it's
probably safer to emit the entire thing.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
These days, we need to emit PIPE_CONTROL flushes all over the place.
Being able to do that via a single function call seems convenient.
Broadwell will also increase the length of these packets by 1; with the
refactoring, we should have to do this in substantially fewer places.
v2: Add back forgotten intel_emit_post_sync_nonzero_flush (caught by
Eric Anholt). Drop unlikely() from BLT_RING check.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Broadwell uses 48-bit addresses. The first DWord is the low 32 bits,
and the second DWord is the high 16 bits.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
libdrm 2.4.52 introduces a new 'uint64_t offset64' field, intended to
replace the old 'unsigned long offset' field. To preserve ABI, libdrm
continues to store the presumed offset in both locations.
On Broadwell, a 64-bit kernel may place BOs at "high" (> 4G) addresses.
However, with a 32-bit userspace, the 'unsigned long offset' field will
only be 32-bit, which is not large enough to hold this value. We need
to use a proper uint64_t (like the kernel does).
Technically, a lot of this code doesn't affect Broadwell, so we could
leave it using the old field. But it makes sense to just switch to the
new, properly typed field.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Nothing in i965 uses it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
intel_buffer_objects.c: In function 'old_intel_bufferobj_buffer':
intel_buffer_objects.c:471:17: warning: unused parameter 'flag' [-Wunused-parameter]
The parameter hasn't been used since the i915 and i965 drivers had their
breakup. i965 got the flags, and i915 got to cry itself to sleep.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Not actually tested, but the changes are identical to the i965 changes
that are tested.
v2: Remove MAX2(64, ...). Suggested by Ken (in the i965 version of this
patch).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: Siavash Eliasi <siavashserver@gmail.com>
No piglit regressions on IVB.
With minor tweaks to the arb_map_buffer_alignment-map-invalidate-range
test (disable the extension check, set alignment to 64 instead of
querying), the i965 driver would fail the test without this patch (as
predicted by Eric). With this patch, it passes.
v2: Remove MAX2(64, ...). Suggested by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: Siavash Eliasi <siavashserver@gmail.com>
v2 (idr): Only enable the extension on GEN7+ w/core profile because it
requires geometry shaders.
v3 (idr): Add some casting to fix setting of ViewportBounds.Min.
Negating an unsigned value, then casting to float doesn't do what you
might think it does.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
noop_scissor (correctly) only examines the scissor rectangle for
viewport 0. Therefore, it should only be called when that scissor
rectangle is enabled.
v2: Remove spurious change to radeon code. Noticed by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Currently MaxViewports is still 1, so this won't affect any change.
v2: Minor code reformatting suggested by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Drivers that currently use _Xmin and friends to set their scissor
rectangle will need to use this code directly once they are updated for
GL_ARB_viewport_array.
v2: Use different bit-test idiom and fix mixed tabs and spaces. Both
were suggested by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Currently MaxViewports is still 1, so this won't affect any change.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>