There is nothing inherent about these opcodes that requires them to only
take scalars. It's very convenient if we let them take vectors as well.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Found by inspection. This doesn't help much now but we'll see this
pattern with images if you load UNORM and then store UNORM.
Shader-db results on Kaby Lake:
total instructions in shared programs: 15166916 -> 15166910 (<.01%)
instructions in affected programs: 761 -> 755 (-0.79%)
helped: 6
HURT: 0
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This adds the "(a << N) >> M" family of mask or sign-extensions. Not a
huge win right now but this pattern will soon be generated by NIR format
lowering code.
Shader-db results on Kaby Lake:
total instructions in shared programs: 15166918 -> 15166916 (<.01%)
instructions in affected programs: 36 -> 34 (-5.56%)
helped: 2
HURT: 0
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If it's not the right bit-size, it may not actually be the correct
extraction. For now, we'll only worry about 32-bit versions.
Fixes: 905ff86198 "nir: Recognize open-coded extract_u16"
Fixes: 76289fbfa8 "nir: Recognize open-coded extract_u8"
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Blending isn't valid for integer formats. Rather than having drivers
worry about this, just disable blending in this case. This hopefully
will increase hits in the CSO cache as well, by eliminating most of the
meaningless fields in this case.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This doesn't seem to make any difference in testing, but it fixes a
failed assertion when dumping sm3 shaders.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Setting GL_POINT_SPRITE_COORD_ORIGIN to GL_LOWER_LEFT did not work for
vgpu9. We can use the rasterizer sprite_coord_enable bitfield as-is.
We need to index into it using the TGSI semantic index, not the
register index.
This fixes the Piglit fbo-gl_pointcoord and glsl-fs-pointcoord tests.
Testing done: Piglit, Mesa sprite demos
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
The flag_rect and flag_buffer fields didn't sufficiently capture
the state changes needed for those resource types. For example,
if a texture binding was changed from a 500x500 rect texture to a
400x400 rect texture we didn't set SVGA_NEW_TEXTURE_CONSTS. But
we need to do that to emit the new texcoord scale factors to the
constant buffers. Rather than track the sizes of all bound
resources, just set the flag if the resource is a rect. Same
story with texture buffers.
Also, since rect/buffer textures are usable with VS/GS shaders,
add SVGA_NEW_TEXTURE_CONSTS to the flags we check for emitting
VS/GS constants.
This seems to help with XFCE / xfwm4 desktop scaling.
VMware issue 2156696.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Add const qualifiers. Add 'f' suffix on floats to avoid double
promotion.
Remove unneeded shader type assertion since the switch statement
handled it already.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Move this now-unused function into the existing comment block, which was its only prior use.
../../../../../src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp:2645:1: warning:
unused function 'partitionLoadStore' [-Wunused-function]
partitionLoadStore(uint8_t comp[2], uint8_t size[2], uint8_t mask)
Fixes: ("86e4440361 nouveau: codegen: Disable more old resource handling code")
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
During intra stage linking some out variables can be dropped because
it is not used in a shader with the main function. But these out vars
can be referenced on later stages which can lead to further linking
errors.
Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105731
Newer blit tests are enabling depth&stencils blits. We currently don't
support it but can do by iterating over the aspects masks (copy some
logic from the CopyImage function).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 9f44745eca ("anv: Use blorp to implement VkBlitImage")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
OpenGL ES spec states:
"For normalized fixed-point rendering surfaces, the combination format
RGBA and type UNSIGNED_BYTE is accepted."
This fixes following failing VK-GL-CTS tests:
KHR-GLES3.packed_pixels.pbo_rectangle.rgba8_snorm
KHR-GLES3.packed_pixels.rectangle.rgba8_snorm
KHR-GLES3.packed_pixels.varied_rectangle.rgba8_snorm
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
https://bugs.freedesktop.org/show_bug.cgi?id=107658
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Andres Gomez <agomez@igalia.com>
This adds support for unrolling the classic
do {
// ...
} while (false)
that is used to wrap multi-line macros. GLSL IR also wraps switch
statements in a loop like this.
shader-db results IVB:
total loops in shared programs: 2515 -> 2512 (-0.12%)
loops in affected programs: 33 -> 30 (-9.09%)
helped: 3
HURT: 0
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Now that SSA values can be derefs and they have special rules, we have
to be a bit more careful about our LCSSA phis. In particular, we need
to clean up in case LCSSA ended up creating a phi node for a deref.
This avoids validation issues with some CTS tests with the following
patch, but its possible this we could also see the same problem with
the existing unrolling passes.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
In order to be sure loop_terminator_list is an accurate
representation of all the jumps in the loop we need to be sure we
didn't encounter any other complex behaviour such as continues,
nested breaks, etc during analysis.
This will be used in the following patch.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This will help later patches with unrolling loops that end with a
break i.e. loops the always exit on their first interation.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
No shader-db changes on any Intel platform.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This greatly simplifies the next patch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
All of the other brw_*_desc functions take a devinfo parameter, and all
of the others at least have an assert that uses it. Keep the parameter,
but mark it as unused.
Silences 37 warnings like:
In file included from src/intel/common/gen_disasm.c:27:0:
src/intel/compiler/brw_eu.h: In function ‘brw_pixel_interp_desc’:
src/intel/compiler/brw_eu.h:377:53: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
brw_pixel_interp_desc(const struct gen_device_info *devinfo,
^~~~~~~
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Gen >= 9 have ability to control clamping of depth values separately at
near and far plane.
z_w is clamped to the range [min(n,f), 0] if clamping at near plane is
enabled, [0, max(n,f)] if clamping at far plane is enabled and [min(n,f)
max(n,f)] if clamping at both plane is enabled.
v2: 1) Use better coding style (Ian Romanick)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
_mesa_set_enable() and _mesa_IsEnabled() extended to accept new two
tokens GL_DEPTH_CLAMP_NEAR_AMD and GL_DEPTH_CLAMP_FAR_AMD.
v2: Remove unnecessary parentheses (Marek Olsak)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Enable _mesa_PushAttrib() and _mesa_PopAttrib() to handle
GL_DEPTH_CLAMP_NEAR_AMD and GL_DEPTH_CLAMP_FAR_AMD tokens.
Remove DepthClamp, because DepthClampNear + DepthClampFar replaces it,
as suggested by Marek Olsak.
Driver that enables AMD_depth_clamp_separate will only ever look at
DepthClampNear and DepthClampFar, as suggested by Ian Romanick.
v2: 1) Remove unnecessary parentheses (Marek Olsak)
2) if AMD_depth_clamp_separate is unsupported, TEST_AND_UPDATE
GL_DEPTH_CLAMP only (Marek Olsak)
3) Clamp against near and far plane separately (Marek Olsak)
4) Clip point separately for near and far Z clipping plane (Marek
Olsak)
v3: Clamp raster position zw to the range [min(n,f), 0] for near plane
and [0, max(n,f)] for far plane (Marek Olsak)
v4: Use MIN2 and MAX2 instead of CLAMP (Marek Olsak)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Add some basic types and storage for the AMD_depth_clamp_separate
extension.
v2: 1) Drop unnecessary definition (Marek Olsak)
2) Expose extension in compatibility profile (Marek Olsak)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Currently we run the script but don't actually load any files, even in a
tarball where they exist.
Fixes: 3218056e0e
("meson: Build i965 and dri stack")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Adds suppport for INTEL_fragment_shader_ordering. We achieve
the fragment ordering by using the same instruction as for
beginInvocationInterlockARB() which is by issuing a memory
fence via sendc.
Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
This extension provides new GLSL built-in function
beginFragmentShaderOrderingIntel() that guarantees
(taking wording of GL_INTEL_fragment_shader_ordering
extension) that any memory transactions issued by
shader invocations from previous primitives mapped to
same xy window coordinates (and same sample when
per-sample shading is active), complete and are visible
to the shader invocation that called
beginFragmentShaderOrderingINTEL().
One advantage of INTEL_fragment_shader_ordering over
ARB_fragment_shader_interlock is that it provides a
function that operates as a memory barrie (instead
of a defining a critcial section) that can be called
under arbitary control flow from any function (in
contrast the begin/end of ARB_fragment_shader_interlock
may only be called once, from main(), under no control
flow.
Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
When the SVBI Payload Enable is false I guess the register R1.4
which contains the Maximum Streamed Vertex Buffer Index is filled by zero
and GS stops to write transform feedback when the transform feedback
is not active.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107579
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This is quite useful for debugging shader-transpiling issues in
virglrenderer.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
This adds an environment-varaible that can be used for driver-specific
flags, as well as a flag for it to enable verbose output.
While we're at it, quiet some overly chatty debug-output by default.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
This is the only direct call-site for fprintf in virgl; all other
call-sites call debug_printf instead. So let's follow in style here.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
This is just debug-cruft left over. Let's just get rid of it.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
INTEL_DEBUG=hex prints 32 bit hex value and due to endianness of CPU
byte order is reversed. In order to disassemble binary files, print
each byte instead of 32 bit hex value.
v2: Print blank spaces in order to vertically align output of compacted
instructions hex value with uncompacted instructions hex value.
(Matt Turner)
v3: Fix line wrap at correct length
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Gen7.5 has a BLEND_STATE of size 0 which includes a variable length
group. We did not deal with that very well, leading to an endless
loop.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107544
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Gives a +7.79% increase in FPS with Hitman on lowest quality settings on
my GTX 1060.
total instructions in shared programs : 5787979 -> 5748677 (-0.68%)
total gprs used in shared programs : 669901 -> 669373 (-0.08%)
total shared used in shared programs : 548832 -> 548832 (0.00%)
total local used in shared programs : 21068 -> 21064 (-0.02%)
local shared gpr inst bytes
helped 1 0 152 274 274
hurt 0 0 0 0 0
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>