If view mask has only one bit set, view index is effectively a
constant, so doesn't need to be passed to the next stages, just always
set it.
Part of this was in the original patch that added
anv_nir_lower_multiview.c but disabled.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The view_index is encoded in the remainder of dividing instance id by
the number of views in the view mask (n). In the general case (handled
by the else clause), there is a need to map from 0..n-1 into the
number of the view being masked. For that a map is encoded.
In the case only the first n bits in the mask are set, the mapping is
trivial, 0..n-1 already represent what view is being referred to.
That case was in the original patch that added
anv_nir_lower_multiview.c but disabled.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Unfortunately TGSI doesn't record the type of the FS output like GLSL
does, but VC5's TLB writes depend on the output's base type. Just record
the type in the key at variant compile time when we've got a TGSI input
and then fix it up.
Fixes KHR-GLES3.packed_pixels.pbo_rectangle.rgba32i/ui and apparently a
GPU hang that breaks most tests that come after it.
The implementation is inspired by
lower_instructions_visitor::dfrexp_sig_to_arith.
This has been tested against the arb_gpu_shader_fp64/fs-frexp-dvec4
test using the ARB_gl_spirv branch.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
ring_name is "<class_name> + <instance_id>" (e.g. rcs0). So we need to
first compare the class name only, then get the instance id.
Without this, INSTDONE is not being decoded.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Shader-db runtime change avarage of five runs:
Before 125,77 seconds (+/- 0,09%)
After 124,48 seconds (+/- 0,07%)
Tested-by: Dieter Nützel <Dieter at nuetzel-hh.de>
Reviewed-by: Eric Anholt <eric at anholt.net>
Make a simple worklist by basically just wrapping u_vector.
This is intended used in nir_opt_dce to reduce the number of calls
to ralloc, as we are currenlty spamming ralloc quite bad. It should
also give better cache locality and much lower memory usage.
Tested-by: Dieter Nützel <Dieter at nuetzel-hh.de>
Reviewed-by: Eric Anholt <eric at anholt.net>
Different registers are used for execlist submission in gen11, so
also watch those. This code only watches element zero of the
submit queue, which is all aubdump currently writes.
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
The coordinate shaders may now have side effects in the form of transform
feedback.
Part of fixing
GTF-GLES3.gtf.GL3Tests.transform_feedback.transform_feedback_misc
This is the equivalent of commit 5770e1d89e for
android.
v2: fix xml files path and file given to --header
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Fixes: 2d2b15fbca ("i965: fix autotools/android build")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105634
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Title is about 17.3.5, when it must be about 17.3.6.
CC: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Generalize the code for remove dead loops to also remove dead if
nodes. The conditions are the same in both cases, if the node (and
it's children) don't have side-effects AND the nodes after it don't
use the values produced by the node.
The only difference is when evaluating side effects: loops consider
only return jumps as a side-effect -- they can stop execution of nodes
after it; 'if' nodes outside loops should consider all kinds of
jumps (return, break, continue) since all of them can cause execution
of nodes after it to be skipped.
After this patch, empty ifs (those which both then and else blocks are
empty) will be removed by nir_opt_dead_cf.
It caused no change to shader-db, in part because the removal of empty
ifs is currently covered by nir_opt_peephole_select.
v2: Improve the identification of cases where break/continue can cause
side-effects. (Jason)
v3: Move code comment changes to a different patch. (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
On the CI family, firmware requires the destory command have to be the
last command in the IB, moving feedback command after destroy is causing
issues on CI cards, so we have to keep the previous logic that moves
destroy back to the last command.
But as the original issue fixed previously, with the newer family like Vega10,
feedback command have to be included inside of the task info command along
with destroy command.
Fixes: 6d74cb25("radeon/vce: move destroy command before feedback command")
Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Cc: mesa-stable@lists.freedesktop.org
Added in Khronos in 2b6bb4ee45cc46c89d4a "EGL_MESA_drm_image: add
EGL_DRM_BUFFER_USE_CURSOR_MESA to egl.xml" [1] as part of PR #36 [2].
[1] 2b6bb4ee45
[2] https://github.com/KhronosGroup/EGL-Registry/pull/36
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This pulls in commit a93f559e9c11fa53fb5f1cc255b8f75433f85d2a "Add Ozone
section to eglplatform.h" from Khronos [1] added by Brian Anderson [2]
a few months ago.
[1] a93f559e9c
[2] https://github.com/KhronosGroup/EGL-Registry/pull/26
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Use get_language_version to calculate default cl standard based on
device capabilities and -cl-std specified in build options.
v5; move dev_clc_version declaration from an earlier patch
v4: Squash the __OPENCL_VERSION__ and CLC language version patches
v3: (Jan) Allow device_version up to 2.2 while device_clc_version
only goes to 2.0
Use get_cl_version to calculate version instead
v2: Split out from the previous patch (Pierre)
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
CC: Jan Vesely <jan.vesely@rutgers.edu>
Used to calculate the default CLC language version based on the --cl-std in build args
and the device capabilities.
According to section 5.8.4.5 of the 2.0 spec, the CL C version is chosen by:
1) If you have -cl-std=CL1.1+ use the version specified
2) If not, use the highest 1.x version that the device supports
Curiously, there is no valid value for -cl-std=CL1.0
Validates requested cl-std against device_clc_version
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
v7: (Pierre) Split cl/clc versions into separate lists and
make more references const.
v6: (Pierre) Add more const and fix some whitespace
v5: (Aaron) Use a collection of cl versions instead of switch cases
Consolidates the string, numeric version, and clc langstandard::kind
v4: (Pierre) Split get_language_version addition and use into separate patches
Squash patches that add the helpers and validate the language standard
v3: Change device_version to device_clc_version
v2: (Pierre) Move create_compiler_instance changes to correct patch
to prevent temporary build breakage.
Convert version_str into unsigned and use it to find language version
Add build_error for unknown language version string
Whitespace fixes
Mesa 18.0 series has not been released yet, so let's extend 17.3 lifetime.
v2: add 17.3.9 in the calendar (Andres Gomez)
CC: Andres Gomez <agomez@igalia.com>
CC: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The compiler doesn't notice that the condition for num_layers to be
undefined already defined it above (as our assert checked in a debug
build).
v2: Move the pair of assignments to one outside of the block.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This extension removes the restrictions on minDepth/maxDepth,
minDepthBounds/maxDepthBounds and VkClearDepthStencilValue::depth.
The following CTS tests now pass:
dEQP-VK.glsl.builtin_var.fragdepth.line_list_d32_sfloat_large_depth
dEQP-VK.glsl.builtin_var.fragdepth.point_list_d32_sfloat_large_depth
dEQP-VK.glsl.builtin_var.fragdepth.triangle_list_d32_sfloat_large_depth
dEQP-VK.draw.inverted_depth_ranges.nodepthclamp_depth_range_unrestricted
dEQP-VK.draw.inverted_depth_ranges.depthclamp_depth_range_unrestricted
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It's a 32-bit integer like the layer.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This was previously ignored.
Along with the virglrenderer patch, this fixes ~100 dEQP tests:
dEQP-GLES3.functional.texture.filtering.cube.*
Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
As of earlier commit, the --header was made a hard requirement when
using --code.
Hence - annotate both as required and drop a few no longer needed
checks.
Fixes: 035cc7a12d ("i965: perf: reduce i965 binary size")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Autotools/android builds generate the header & code files in 2 steps,
but the code generation requires the name of the header file to
include it.
This change generates both files in one command.
Fixes: 035cc7a12d ("i965: perf: reduce i965 binary size")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The have-new-DRI3 codepaths would never actually properly trigger, since
there was a typo in configure.ac which broke the version check. This
went unnoticed but for an error in config.log if you looked closely
enough.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reported-by: Lukas F. Hartmann <lukas@mntmn.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Fixes: 7aeef2d4ef ("dri3: allow building against older xcb (v3)")
Cc: Dave Airlie <airlied@redhat.com>
VMware has no (published) support for Arm-architecture guests.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reported-by: Dylan Baker <dylan@pnwbakers.com>
On all Arm architectures (ARMv7 and below as 'arm', ARMv8 and above as
'aarch64'), only build swrast for DRI drivers. The only classic drivers
which could be used are r200 and NV20 cards, which seems unlikely enough
that it shouldn't be the default.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reported-by: Javier Jardón <jjardon@gnome.org>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Stay consistent with the rest of the codebase, effectively fixing the
autotools build.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105621
Fixes: ffa4bbe466 ("st/nir/radeonsi: move nir_lower_uniforms_to_ubo()
to the state tracker")
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Loop was accessing one more than bindingCount elements from
pBindings, accessing uninitialized memory.
Fixes: ddc4069122 ("anv: Implement VK_KHR_maintenance3")
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Performance metric numbers are calculated the following way :
- out of the 256 bytes long OA reports, we accumulate the deltas
into an array of uint64_t
- the equations' generated code reads the accumulated uint64_t
deltas and normalizes them for a particular platform
Our hardware is such that a number of counters in the OA reports
always return the same values (i.e. they're not programmable), and
they return the same values even across generations, and as a result a
number of equations are identical in different metric sets across
different generations.
Up to now we've kept the generated code of the equations separated in
different files (per generation/GT), and didn't apply any
factorization of the common equations. We could have make some
improvement by reusing equations within a given metrics file, but we
can go even further and reuse across generations (i.e. all files).
This change changes the code generation to emit a single file in which
we reuse equations emitted code based on the hash of equations'
strings.
Here are the savings in a meson build :
Before(.old)/after :
$ du -h ./build/src/mesa/drivers/dri/libmesa_dri_drivers.so ./build/src/mesa/drivers/dri/libmesa_dri_drivers.so.old
43M ./build/src/mesa/drivers/dri/libmesa_dri_drivers.so
47M ./build/src/mesa/drivers/dri/libmesa_dri_drivers.so.old
$ size build/src/mesa/drivers/dri/libmesa_dri_drivers.so build/src/mesa/drivers/dri/libmesa_dri_drivers.so.old
text data bss dec hex filename
13054002 409424 671856 14135282 d7aff2 build/src/mesa/drivers/dri/libmesa_dri_drivers.so
14550386 409552 671856 15631794 ee85b2 build/src/mesa/drivers/dri/libmesa_dri_drivers.so.old
As a side comment here is the size of the drivers if we remove all of
the metrics from the build :
$ du -sh build/src/mesa/drivers/dri/libmesa_dri_drivers.so
40M build/src/mesa/drivers/dri/libmesa_dri_drivers.so
v2: Fix an issue with hashing of counter equations (Lionel)
Build system rework (Emil)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (build system part)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The equation code computes a float (percentage) yet the return type
was an uint64_t.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
==15115== 48 bytes in 1 blocks are definitely lost in loss record 16 of 66
==15115== at 0x4C2EC15: realloc (vg_replace_malloc.c:785)
==15115== by 0x8602C3E: _mesa_reserve_parameter_storage (prog_parameter.c:212)
==15115== by 0x8602D1E: _mesa_add_parameter (prog_parameter.c:252)
==15115== by 0x86032C4: _mesa_add_sized_state_reference (prog_parameter.c:384)
==15115== by 0x8603324: _mesa_add_state_reference (prog_parameter.c:409)
Fixes: edded12376 "mesa: rework ParameterList to allow packing"
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The previous commit to make DRI3 modifier support optional, breaks with
an updated server and old client.
Make sure we never set multibuffers_available unless we also support it
locally. Make sure we don't call stubs of new-DRI3 functions (or empty
branches) which will never succeed.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fixes: 7aeef2d4ef ("dri3: allow building against older xcb (v3)")