Commit 259fc50545 added linker error for
mismatching uniform precision, as required by GLES 3.0 specification and
conformance test-suite.
Several Android applications, including Forge of Empires, have shaders
which violate this rule, on a dead varying that will be eliminated.
The problem affects a big number of applications using Cocos2D engine
and other GLES implementations accept this, this poses a serious
application compatibility issue.
Starting from GLSL ES 3.0, declarations with conflicting precision
qualifiers are explicitly prohibited. However GLSL ES 1.00 does not
clearly specify the behavior, except that
"Uniforms are defined to behave as if they are using the same storage in
the vertex and fragment processors and may be implemented this way.
If uniforms are used in both the vertex and fragment shaders, developers
should be warned if the precisions are different. Conversion of
precision should never be implicit."
The word "used" is not clear in this context and might refer to
1) declared (same as GLES 3.x)
2) referred after post-processing, or
3) linked after all optimizations are done.
Looking at existing applications, 2) or 3) seems to be widely adopted.
To avoid compatibility issues, turn the error into a warning if GLSL ES
version is lower than 3.0 and the data is dead in at least one of the
shaders.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97532
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We can avoid adding the buffer in the non-local case, this will
avoid all the overhead of the indirect call.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The next patch will try and avoid calling the indirect function.
v2: add a missing conversion.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The function that calls us has just added the buffer to the
list already, no need to try and add it again.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
There's no point recalculating these the whole time on descriptor
emission, just store them at pipeline creation.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Neil Roberts <neil@linux.intel.com>
Hilariously this is a fairly big win. Neil's multi-context-test
improves from ~24 to ~36 fps with llvmpipe on a Core i5-3317U. softpipe
also improves, from about 2.25 to 3.09 fps (when it's that slow, you're
allowed to be that precise).
I'd have added it to swrast classic, but the testcase wants GL 3.0 and
shaders, and that's not a thing classic has, so I figured making it work
on softpipe was crime enough.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Neil Roberts <neil@linux.intel.com>
This advertises that the driver can accept a new context attribute
__DRI_CTX_ATTRIB_RELEASE_BEHAVIOR.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Neil Roberts <neil@linux.intel.com>
Previously the CreateContext method of __DriverApiRec took a set of
arguments to describe the attribute values from the window system API's
CreateContextAttribs function. As more attributes get added this could
quickly get unworkable and every new attribute needs a modification for
every driver.
To fix that, pass the attribute values in a struct instead. The struct
has a bitmask to specify which members are used. The first three members
(two for the GL version and one for the flags) are always set. If the
bit is not set in the attribute mask then it can be assumed the
attribute has the default value. Drivers will error if unknown bits in
the mask are set.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Neil Roberts <neil@linux.intel.com>
It shouldn't be necessary to flush the context within the driver
implementation because the old context is explicitly flushed in
_mesa_make_current which is called a little further on. It is useful to
only have a single place that flushes when switching contexts to make it
easier to later implement the GL_KHR_context_flush_control extension.
The flush in intelMakeCurrent was added in commit 5505865 to implement
the GLX semantics that the context should be flushed when it is
released. When the commit was made there was no flush in
_mesa_make_current because it was only added later in 93102b4c. I think
that later commit effectively makes the first commit redundant.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Neil Roberts <neil@linux.intel.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
HALIGN_FOUR/SIXTEEN has no meaning for compressed textures, and we can't
render to them anyway. So use the tightest possible packing. This
avoids bugs with non-power-of-two block sizes.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Add ASTC texture support for hardware that supports this
(currently only GC3000 on i.MX6qp is known to have this).
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Updated as of etnav_viv commit 3b4a8ec.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
It appears the latest dota2 vulkan uses this,
and we get a hang in VR mode without it.
v2: remove finishme I left in after finishing.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
Cc: "17.2 17.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Uploaded data must start at (stride * start), because we can't modify
start in all cases. If it's the first allocation, it's also the amount
of memory wasted. If the starting offset is larger than the size of
the upload buffer, the buffer is re-created, used for 1 upload, and then
thrown away. If the upload is small, most of the buffer space is unused
and wasted. Keep doing that and the OOM killer comes. It's actually
pretty quick.
With signed VB offsets, we can set min_out_offset = 0
in u_upload_alloc/u_upload_data.
This fixes OOM situations with SPECviewperf.
Fixes:
make[2]: Leaving directory '/home/local/mesa/mesa-17.4.0-devel/_build/sub/src'
make[2]: *** No rule to make target '../../../src/git_sha1.h.in', needed by 'git_sha1.h'. Stop.
Makefile:660: recipe for target 'all-recursive' failed
Fixes: 16be271c6e "git_sha1_gen: use git_sha1.h.in on all build systems"
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Instead of storing all the pointers and zeroing them all out,
just store a valid bitmask in the state. This also moves
the CmdBindPipeline path down the cpu usage path for the
multithreading demo as it no longer has to traverse MAX_SETS
to find the active descriptor sets.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This isn't required to be cleared, since buffers are only linked
by vertex elements, so if elements are clear then no buffers
should be referenced.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just removes a hole in the cmd_state and packs some bools
together.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
If we allocate attachments in the begin command buffer due to the
render pass continue bit, we were leaking them.
Since renderpasses inside a cmd buffer malloc/free these properly,
and set to NULL, we just need to call free at end.
Fixes a memory leak with multithreading demo.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2 17.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
uint32_t data[MAX_SETS * 2] = {}; was getting executed before
the exit and took significant amounts of time. By having the
check outside the function, we skip the execution of the clear.
Reviewed-by: Dave Airlie <airlied@redhat.com>
The vram_list linked list resulted in lots of pointer chasing.
Replacing this with an array instead improves descriptor set
allocation CPU usage by 3x at least (when also considering the free),
because it had to iterate through 300-400 sets on average.
Not a huge improvement as the pre-improvement CPU usage was only
about 2.3% in the busiest thread.
Reviewed-by: Dave Airlie <airlied@redhat.com>
In OpenCL/CUDA kernels, shared memory usage can be defined within the
kernel code. Those usage will only be picked up while parsing the
SPIR-V, during the translation phase of the program.
Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
This workaround doesn't fix any of the piglit hangs we've seen
on CNL. But it might be fixing something we haven't tested yet.
V2: Remove the bits enabling Float blend optimization. It is
enabled through CACHE_MODE_SS register.
Update the comment.
Move gen10 if block on top of gen9 if block.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This optimization is enabled for previous generations too.
See Mesa commit c17e214a6b
On CNL this bit has been moved to CACHE_MODE_SS register.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This workaround doesn't fix any of the piglit hangs we've seen
on CNL. But it might be fixing something we haven't tested yet.
V2: Add the check for Post Sync Operation.
Update the workaround comment.
Use braces around if-else.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
There are few other (duplicate) workarounds which have similar recommendations:
WaFlushHangWhenNonPipelineStateAndMarkerStalled
WaCSStallBefore3DSamplePattern
WaPipeControlBefore3DStateSamplePattern
WaPipeControlBefore3DStateSamplePattern has some extra recommendations if
driver is using mid batch context restore. Ignoring it for now because We're
not doing mid-batch context restore in Mesa.
This workaround doesn't fix any of the piglit hangs we've seen
on CNL. But it might be fixing something we haven't tested yet.
V2: Use brw_load_register_imm32() to program CACHE_MODE_0.
Get rid of brw_flush_gpu_caches().
V3: Make the workaround helper functions static.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by :Nanley Chery <nanley.g.chery@intel.com>
Fixes reverted patch f03b7c9 by doing VMID reservation per
process and not per context.
Also updates required amdgpu libdrm version since the change
involved interface updates in amdgpu libdrm.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
This will be used in the next commit to build up register programming.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The dynamic index of a vector (not array!) is lowered to a sequence of
conditional assignments. However, the interpolate_at_* expressions
require that the interpolant is an l-value of a shader input.
So instead of doing conditional assignments of parts of the shader input
and then interpolating that (which is nonsensical), we interpolate the
entire shader input and then do conditional assignments of the interpolated
result.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The intended rule has been clarified in GLSL 4.60, Section 8.13.2
(Interpolation Functions):
"For all of the interpolation functions, interpolant must be an l-value
from an in declaration; this can include a variable, a block or
structure member, an array element, or some combination of these.
Component selection operators (e.g., .xy) may be used when specifying
interpolant."
For members of interface blocks, var->data.must_be_shader_input must be
determined on-the-fly after lowering interface blocks, since we don't want
to disable varying packing for an entire block just because one input in it
is used in interpolateAt*.
v2: keep setting must_be_shader_input in ast_function (Ian)
v3: follow the relaxed rule of GLSL 4.60
v4: only apply the relaxed rules to desktop GL
(the ES WG decided that the relaxed rules may apply in a future version
but not retroactively; see also
dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_centroid.negative.*)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101378
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>