The NIR story on conversion opcodes is a mess. We've had way too many
of them, naming is inconsistent, and which ones have explicit sizes was
sort-of random. This commit re-organizes things and makes them all
consistent:
- All non-bool conversion opcodes now have the explicit size in the
destination and are named <src_type>2<dst_type><size>.
- Integer <-> integer conversion opcodes now only come in i2i and u2u
forms (i2u and u2i have been removed) since the only difference
between the different integer conversions is whether or not they
sign-extend when up-converting.
- Boolean conversion opcodes all have the explicit size on the bool and
are named <src_type>2<dst_type>.
Making things consistent also allows nir_type_conversion_op to be moved
to nir_opcodes.c and auto-generated using mako. This will make adding
int8, int16, and float16 versions much easier when the time comes.
Reviewed-by: Eric Anholt <eric@anholt.net>
The original version was very convoluted and tried way too hard to not
just have the nested switch statement that it needs. Let's just write
the obvious code and then we know it's correct. This fixes a bunch of
missing cases particularly with int64.
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
The original bit-size validation wasn't capable of properly dealing with
instructions with variable bit sizes. An attempt was made to handle it
by looking at source and destinations but, because the validation was
done in validate_alu_(src|dest), it didn't really have the needed
information. The new validation code is much more straightforward and
should be more correct.
Reviewed-by: Eric Anholt <eric@anholt.net>
We've always required bit sizes to match but the rules for number of
components have been a bit loose. You've never been allowed to source
from something with less components than you consume, but more has
always been fine. This changes the validator to require that they match
exactly. The fact that they don't always match has been a source of
confusion in NIR for quite some time and it's time we got rid of it.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Using coord_components of the source texture is correct for everything
except cube maps where it's off by one.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Some SPIR-V texturing instructions pack more than the texture coordinate
into the coordinate source. We need to mask off the unused channels.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
In the near future we are going to require that the num_components in a
src dereference match the num_components of the SSA value being
dereferenced. To do that, we need copy_prop to not remove our MOVs from
a larger SSA value into an instruction that uses fewer channels.
Because we suddenly have to know how many components each source has,
this makes the pass a bit more complicated. Fortunately, copy
propagation is the only pass that cares about the number of components
are read by any given source so it's fairly contained.
Shader-db results on Sky Lake:
total instructions in shared programs: 13318947 -> 13320265 (0.01%)
instructions in affected programs: 260633 -> 261951 (0.51%)
helped: 324
HURT: 1027
Looking through the hurt programs, about a dozen are hurt by 3
instructions and the rest are all hurt by 2 instructions. From a
spot-check of the shaders, the story is always the same: They get a
vec4 from somewhere (frequently an input) and use the first two or three
components as a texture coordinate. Because of the vector component
mismatch, we have a mov or, more likely, a vecN sitting between the
texture instruction and the input. This means that the back-end inserts
a bunch of MOVs and split_virtual_grfs() goes to town. Because the
texture coordinate is also used by some other calculation, register
coalesce can't combine them back together and we end up with an extra 2
MOV instructions in our shader.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
For render passes with multiple subpasses on gen7, we only fast-clear at
the top but an input attachment use can cause us to do a resolve in the
middle of the render pass. Once we've done so, we are no longer have a
fast-cleared surface so we can just set aux_usage to NONE.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
otherwise generated entrypoint headers are not found during build
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
fixes build error when brw_nir.h not found in the generated file
brw_nir_trig_workarounds.c.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This patch adds missing error-checking and fixes resource leak in
allocation failure path on anv_CreateDevice()
v2: Fixes from Jason Ekstrand's review
a) Add missing destructors for all of the state pools on allocation
failure path
b) Add missing destructor for batch bo pools on allocation failure path
v3: Fixes from Emil Velikov's review
Add missing destructor for queue and scratch_pool on allocation failure
path
Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Ported from radeonsi, pointed out by Tom.
"This prevents LLVM from using sext instructions for local memory
offsets and allows the backend to fold immediate offsets into the
instruction. This also prevents some incorrect code generation for
ptrtoint and inttoptr instructions."
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tom Stellard <tstellar@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This must be set to ICD_LOADER_MAGIC by vkAllocateCommandBuffers, which
was being done when allocating a new buffer but not when reusing an
existing one in the cache. This would hit an assertion and crash in
debug builds of the Vulkan loader.
Fixes: 682248db45 ("radv: Cache command buffers in command pool.")
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This should prevent cases when a buffer was incorrectly mapped without
synchronization just because this wasn't done.
Cc: 13.0 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
No intended change in behavior. Just a refactor.
v2: Replace vk_outarray_is_incomplete() with vk_outarray_status(). For
Jason.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This is a wrapper for a Vulkan output array. A Vulkan output array is
one that follows the convention of the parameters to
vkGetPhysicalDeviceQueueFamilyProperties().
v2: Replace vk_outarray_is_incomplete() with vk_outarray_status(). For
Jason.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes the following segmentation fault:
radeon_drm_cs_add_buffer (bo=0x0) at radeon_drm_cs.c
-> if (!bo->handle)
(gdb) bt
0 radeon_drm_cs_add_buffer (bo=0x0) at radeon_drm_cs.c
1 0x00007fffe73575de in radeon_cs_create_fence radeon_drm_cs.c
2 0x00007fffe7358c48 in radeon_drm_cs_flush radeon_drm_cs.c
Signed-off-by: Julien Isorce <jisorce@oblong.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
wayland-drm-client-protocol.h is generated in builddir, so when
builddir != srcdir the header is not found, and compilation of
wsi_common_wayland.c will fail.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
We have a performance problem with dynamic buffer descriptors. Because
we are currently implementing them by pushing an offset into the shader
and adding that offset onto the already existing offset for the UBO/SSBO
operation, all UBO/SSBO operations on dynamic descriptors are indirect.
The back-end compiler implements indirect pull constant loads using what
basically amounts to a texelFetch instruction. For pull constant loads
with constant offsets, however, we use an oword block read message which
goes through the constant cache and reads a whole cache line at a time.
Because of these two things, direct pull constant loads are much faster
than indirect pull constant loads. Because all loads from dynamically
bound buffers are indirect, the user takes a substantial performance
penalty when using this "performance" feature.
There are two potential solutions I have seen for this problem. The
alternate solution is to continue pushing offsets into the shader but
wire things up in the back-end compiler so that we use the oword block
read messages anyway. The only reason we can do this because we know a
priori that the dynamic offsets are uniform and 16-byte aligned.
Unfortunately, thanks to the 16-byte alignment requirement of the oword
messages, we can't do some general "if the indirect offset is uniform,
use an oword message" sort of thing.
This solution, however, is recommended for a few of reasons:
1. Surface states are relatively cheap. We've been using on-the-fly
surface state setup for some time in GL and it works well. Also,
dynamic offsets with on-the-fly surface state should still be
cheaper than allocating new descriptor sets every time you want to
change a buffer offset which is really the only requirement of the
dynamic offsets feature.
2. This requires substantially less compiler plumbing. Not only can we
delete the entire apply_dynamic_offsets pass but we can also avoid
having to add architecture for passing dynamic offsets to the back-
end compiler in such a way that it can continue using oword messages.
3. We get robust buffer access range-checking for free. Because the
offset and range are baked into the surface state, we no longer need
to pass ranges around and do bounds-checking in the shader.
4. Once we finally get UBO pushing implemented, it will be much easier
to handle pushing chunks of dynamic descriptors if the compiler
remains blissfully unaware of dynamic descriptors.
This commit improves performance of The Talos Principle on ULTRA
settings by around 50% and brings it nicely into line with OpenGL
performance.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
During initial CCS bring-up, I discovered that you have to do a full CS
stall prior to doing a CCS resolve as well as afterwards. It appears
that the same is needed for fast-clears as well. This fixes rendering
corruptions on The Talos Principle on Sky Lake GT4. The issue hasn't
been demonstrated on any other hardware however, given that this appears
to be a "too many things in the pipe" problem, having it be easier to
reproduce on a system with more EUs makes sense. The issues with
resolves is demonstrable on a GT3 or GT2 so this is probably also a
problem on all GTs.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
The number of dynamic descriptors is limited by both the number of
descriptors and the total number of dynamic things. Because there isn't
a single "maximum dynamic things" limit, we need to divide by two so
that they can create the maximum of both UBOs and SSBOs.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
Rely on nir for optimization, to reduce compile times. Very minimal impact
on shader-db:
total instructions in shared programs: 104170 -> 104199 (0.03%)
total dwords in shared programs: 209664 -> 209728 (0.03%)
total full registers used in shared programs: 7156 -> 7161 (0.07%)
total half registers used in shader programs: 109 -> 109 (0.00%)
total const registers used in shared programs: 24222 -> 24224 (0.01%)
half full const instr dwords
helped 12 107 103 112 98
hurt 11 104 105 115 102
But shader db runtime dropped from ~29.3s user to ~20.4s user.
Signed-off-by: Rob Clark <robdclark@gmail.com>
This reduces the size of the aubinator binary from ~1.4Mb to ~700Kb.
With can now drop the checks on xxd in configure.
v2: Fix incorrect makefile dependency (Lionel)
v3: use $(PYTHON2) (Emil)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
compiler/brw_vec4_gs_visitor.cpp:744:39: error:
‘GEN7_MAX_GS_OUTPUT_VERTEX_SIZE_BYTES’ was not declared in this scope
output_vertex_size_bytes <= GEN7_MAX_GS_OUTPUT_VERTEX_SIZE_BYTES);
Fixes: d0d4a5f43b ("i965: split EU defines to brw_eu_defines.h")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The project is a thing only for BSD platforms. Or in other words - for
any other platforms building/installing pthread-stubs results only in a
pthread-stub.pc file.
And even where it provides a DSO, there's a fundamental design issue
with it - see the pthread-stubs mailing list for the specifics.
v2: Update comment above the switch statement (Jon Turney).
Reviewed-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
Acked-by: Gary Wong <gtw@gnu.org>
Tested-by: Eric Engestrom <eric.engestrom@imgtec.com>
Acked-by: Randy Fishel <randy.fishel@oracle.com>
Cc: Niveditha Rau <niveditha.rau@oracle.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
As of last few commits we have the two split, thus we no longer require
the i965 in order to have the ANV driver.
Even though ANV does not link against libdrm nor libdrm_intel, we still
require those as dependencies due to the headers they provide.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2 [Emil Velikov]
- Various fixes and initial stab at the Android build.
- Keep the generation rules/EXTRA_DIST outside the conditional
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
At the moment all the tests but test_eu_compact are actual C++ gtests.
To simplify things, we can move the gtest.la to the common TEST_LIBS.
As we're here, we can rename change the test extension [to .cpp] to
avoid using the confusing dummy.cpp.
Add a nice comment in the makefile for posterity.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The test/binary was removed back in 2012. With that one gone, we can
drop the .gitignore file all together.
Cc: Eric Anholt <eric@anholt.net>
Fixes: c885039442 ("i965: Drop the missing symbols link test.")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>