The fall-back does not work correctly in SIMD16 mode and the register
allocator should ensure that we never hit this case anyway.
Reviewed-by: Matt Turner <mattst88@gmail.com>
v2: fix whitespace and indentation
r332881 added an extra parameter to the emit function.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106619
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-By: Aaron Watry <awatry@gmail.com>
Tested-By: Aaron Watry <awatry@gmail.com>
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Commit 92f01fc5f9 ("i965: Emit VF cache invalidates for 48-bit
addressing bugs with softpin.") tried to only emit the VF invalidate if
the high bits changed, but it accidentally always set need_invalidate to
true; causing it to emit unconditionally emit the pipe control before
every primitive.
Fixes: 92f01fc5f9 ("i965: Emit VF cache invalidates for 48-bit addressing bugs with softpin.")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106708
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The modifiers array hasn't been initialised by then, much less with data
that would need freeing.
Move the label after the loop to fix this.
Fixes: c80c08e226 ("vulkan/wsi/x11: Add support for DRI3 v1.2")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
---
v2: rebased on top of 432df741e0 "dri_util: Add
R10G10B10{A,X}2 translation between DRI and mesa_format."
0 is not a valid value for the __DRI_IMAGE_FORMAT_* enum.
It is, however, the value of MESA_FORMAT_NONE, which two of the callers
(i915 & i965) checked for.
The other callers (that check for errors, ie. st/dri) already check for
__DRI_IMAGE_FORMAT_NONE.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Just like we do in the autotools build.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Cannot happen since, props to the autodetection further up.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Resources created with modifiers are treated as scanout because there is
no way for applications to specify the usage (though that capability may
be useful to have in the future). Currently all the resources created by
applications with modifiers are for scanout, so make sure they have bind
flags set accordingly.
This is necessary in order to properly export buffers for such resources
so that they can be shared with scanout hardware.
Tested-by: Daniel Kolesa <daniel@octaforge.org>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Thierry Reding <treding@nvidia.com>
Resources created for scanout but without modifiers need to be treated
as pitch-linear. This is because applications that don't use modifiers
to create resources must be assumed to not understand modifiers and in
turn won't be able to create a DRM framebuffer and passing along which
modifiers were picked by the implementation.
Tested-by: Daniel Kolesa <daniel@octaforge.org>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Thierry Reding <treding@nvidia.com>
This code path is no longer required with framebuffer modifier support.
Tested-by: Daniel Kolesa <daniel@octaforge.org>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Thierry Reding <treding@nvidia.com>
favicon.png is just gears.png resized to 64x64, and favicon.ico is
generated using this command, adapted from the ImageMagick example [1]:
$ convert favicon.png -background black \
\( -clone 0 -resize 16x16 \) \
\( -clone 0 -resize 32x32 \) \
\( -clone 0 -resize 48x48 \) \
\( -clone 0 -resize 64x64 \) \
-delete 0 -alpha off -colors 256 favicon.ico
We could edit every html page to add `<link rel="icon" href="favicon.ico" />`,
but there's not much point as pretty much every browser will pick it up
automatically if the file is named `favicon.ico` and is in the root folder.
[1] http://www.imagemagick.org/Usage/thumbnails/#favicon
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
v2 (Jose Maria Casanova Crespo <jmcasanova@igalia.com>): add float16 support
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
This reduces the number of SET_SH_REG packets which are emitted
for applications that use more than one descriptor set per stage.
We should be able to emit more SET_SH_REG packets consecutively
(like push constants and vertex buffers for the vertex stage),
but this will be improved later.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This will allow to emit consecutive shader pointers for
reducing the number of emitted SET_SH_REG packets, which
is recommended.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Previously, findFirstUse() only considered reads "uses". This fixes that
by making it check both an instruction's sources and definitions. It
also shortens both findFistUse() and findFirstDef() along the way.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Literally the same as the AMD ext.
Passes *indirect_draw_count* CTS tests.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This improves dota2 performance for me by 11% when I force the
GPU DPM level to low (otherwise dota2 is CPU limited for 4k on my
threadripper), which should be a large part of the radv-amdvlk gap.
(For me with that was radv 60.3 -> 66.6, while AMDVLK does about 68
fps)
It looks like dota2 rendered the GUI with a bunch of draws with
a SetScissors before almost each draw, causing a lot of pipeline
stalls.
I'm not really happy with the duplication of code, but overriding
radeon_set_context_reg would also be messy since we have the
pre-recorded pipelines and a bunch of si_cmd_buffer code, as well
as some memory->context reg loads for which things would be more
complicated.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This reverts commit 5c33e8c772. It broke
fixed function vertex programs on vc4 and v3d, and apparently caused
trouble for radeonsi's NIR paths as well.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
https://bugs.freedesktop.org/show_bug.cgi?id=106673
A later patch will make use of this in other places. Also, remove
dependency on undefined behavior of left-shifting a signed value.
v2: - move function into a separate header (Chris)
v3: (by Ken) Add new header to the various build systems.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Make sure only those components are written to that are specified in the
write mask.
Fixes:
dEQP-GLES2.functional.shaders.operator.common_functions.sign.lowp_float_vertex
dEQP-GLES2.functional.shaders.operator.common_functions.sign.lowp_float_fragment
dEQP-GLES2.functional.shaders.operator.common_functions.sign.mediump_float_vertex
dEQP-GLES2.functional.shaders.operator.common_functions.sign.mediump_float_fragment
dEQP-GLES2.functional.shaders.operator.common_functions.sign.highp_float_vertex
dEQP-GLES2.functional.shaders.operator.common_functions.sign.highp_float_fragment
dEQP-GLES2.functional.shaders.operator.common_functions.sign.lowp_vec3_vertex
dEQP-GLES2.functional.shaders.operator.common_functions.sign.lowp_vec3_fragment
dEQP-GLES2.functional.shaders.operator.common_functions.sign.mediump_vec3_vertex
dEQP-GLES2.functional.shaders.operator.common_functions.sign.mediump_vec3_fragment
dEQP-GLES2.functional.shaders.operator.common_functions.sign.highp_vec3_vertex
dEQP-GLES2.functional.shaders.operator.common_functions.sign.highp_vec3_fragment
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
In cases like
IDIV TEMP[0].xy TEMP[0].xx TEMP[1].yy
the result will be written to the same register that is also a source register.
Since the components are evaluated one by one, this may result in overwriting
the source value for a later operation. Work around this by adding another
temporary to store the result if the destination temporary index is equal to
one of the source temporary indices.
Fixes:
dEQP-GLES2.functional.shaders.operator.binary_operator.div.*
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This reverts commit 79fe00efb4.
This reverts commit f5e8b13f78.
This reverts commit d21c086d81.
They broke the Android build and I'd rather not leave it broken
for the long holiday weekend.
Rename the (un)map_gtt functions to (un)map_map (map by
returning a map) and add new functions (un)map_tiled_memcpy that
return a shadow buffer populated with the intel_tiled_memcpy
functions.
Tiling/detiling with the cpu will be the only way to handle Yf/Ys
tiling, when support is added for those formats.
v2: Compute extents properly in the x|y-rounded-down case (Chris Wilson)
v3: Add units to parameter names of tile_extents (Nanley Chery)
Use _mesa_align_malloc for the shadow copy (Nanley)
Continue using gtt maps on gen4 (Nanley)
v4: Use streaming_load_memcpy when detiling
v5: (edited by Ken) Move map_tiled_memcpy above map_movntdqa, so it
takes precedence. Add intel_miptree_access_raw, needed after
rebasing on commit b499b85b0f.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We stream from a tiled and aligned source into an unaligned user buffer,
so we need to use _mm_storeu_si128.
Fixes: d21c086d81 (i965/tiled_memcpy: inline movntdqa loads in tiled_to_linear)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For certain EGLImage cases, we represent a single slice or LOD of an
image with a byte offset to a tile and X/Y intratile offsets to the
given slice. Most of i965 is fine with this but it breaks blorp. This
is a terrible way to represent slices of a surface in EGL and we should
stop some day but that's a very scary and thorny path. This gets blorp
to start working with those surfaces and fixes some dEQP EGL test bugs.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106629
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
GS is tested, tessellation is untested.
Have outputs_written_before_ps for HW VS and outputs_written for other
stages. The reason is that COLOR and BCOLOR alias for HW VS, which
drives elimination of VS outputs based on PS inputs.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This fixes shader images where we always bind stObj->pt and not individual
gl_texture_images.
Roughly based on i965 commit 845ad2667a
which does a similar thing but for a different reason.
This fixes GL CTS assertion failures introduced by Ilia.
Cc: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The reference for MOVNTDQA says:
For WC memory type, the nontemporal hint may be implemented by
loading a temporary internal buffer with the equivalent of an
aligned cache line without filling this data to the cache.
[...] Subsequent MOVNTDQA reads to unread portions of the WC
cache line will receive data from the temporary internal
buffer if data is available.
This hidden cache line sized temporary buffer can improve the
read performance from wc maps.
v2: Add mfence at start of tiled_to_linear for streaming loads (Chris)
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Optimize AVX-512 PA Assemble (PA_STATE_OPT). Reduced generated code by
about 4x, MSVC compiler was going crazy making temporaries and
split-loading inputs onto the stack unless explicit AVX-512 load ops
were added
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Added two new files for a wrapper function for initialization
v2: added missing include for single architecture builds
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
It's recommended by the instruction combining pass, and
RadeonSI also runs it.
This pass used to segfault with one shader of F12017 in the
past, but it no longer crashes. Maybe the LLVM IR generated
by RADV has changed.
Polaris10:
Totals from affected shaders:
SGPRS: 441352 -> 441648 (0.07 %)
VGPRS: 310888 -> 300784 (-3.25 %)
Spilled SGPRs: 13576 -> 12983 (-4.37 %)
Code Size: 22560328 -> 22420544 (-0.62 %) bytes
Max Waves: 40755 -> 41366 (1.50 %)
Vega10:
Totals from affected shaders:
SGPRS: 442848 -> 442000 (-0.19 %)
VGPRS: 310396 -> 300460 (-3.20 %)
Spilled SGPRs: 13708 -> 12906 (-5.85 %)
Code Size: 22479428 -> 22336216 (-0.64 %) bytes
Max Waves: 45783 -> 46506 (1.58 %)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>