Rename the (un)map_gtt functions to (un)map_map (map by
returning a map) and add new functions (un)map_tiled_memcpy that
return a shadow buffer populated with the intel_tiled_memcpy
functions.
Tiling/detiling with the cpu will be the only way to handle Yf/Ys
tiling, when support is added for those formats.
v2: Compute extents properly in the x|y-rounded-down case (Chris Wilson)
v3: Add units to parameter names of tile_extents (Nanley Chery)
Use _mesa_align_malloc for the shadow copy (Nanley)
Continue using gtt maps on gen4 (Nanley)
v4: Use streaming_load_memcpy when detiling
v5: (edited by Ken) Move map_tiled_memcpy above map_movntdqa, so it
takes precedence. Add intel_miptree_access_raw, needed after
rebasing on commit b499b85b0f.
v6: refactor to changes done for sse41 separation (Tapani)
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v5)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
The reference for MOVNTDQA says:
For WC memory type, the nontemporal hint may be implemented by
loading a temporary internal buffer with the equivalent of an
aligned cache line without filling this data to the cache.
[...] Subsequent MOVNTDQA reads to unread portions of the WC
cache line will receive data from the temporary internal
buffer if data is available.
This hidden cache line sized temporary buffer can improve the
read performance from wc maps.
v2: Add mfence at start of tiled_to_linear for streaming loads (Chris)
v3: add Android build support (Tapani)
v4: squash 'fix i915: Fix streaming loads for intel_tiled_memcpy'
separate sse41 to own static library (Tapani)
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v2)
Reviewed-by: Matt Turner <mattst88@gmail.com> (v2)
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
There is currently no use of returned memcpy functions outside
intel_tiled_memcpy. Patch changes intel_get_memcpy to return memcpy
type instead of actual function. This makes it easier later to separate
streaming load copy in to own static library.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes errors thrown by GCC's Undefined Behaviour sanitizer (ubsan) every
time this macro is used.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes: a4c4efad89 "radv: Rework guard band calculation"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
`platforms` is no longer a comma-separated string, and some of our
option descriptions are way too long already. Just drop the incorrect
bit.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
They're not required to be the same as the access flag on the image
unit. For hardware that does shader image lowering based on the
qualifier (Intel), it may be required for state setup.
v2: (by Kenneth Graunke, incorporating feedback from Marek Olšák)
- Reduce both access and shader_access to uint16_t to avoid making
the pipe_image_view structure larger.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Previously, we would fail if a variable had an assigned but unknown bit
size X and we tried to assign it an actual bit size. However, this is
ok because, at the time we do the search, the variable does have an
actual bit size and it will match X because of the NIR rules.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
We rename some local variables in validate() to be more readable and
plumb the var through to get/set_var_bit_class instead of the var index.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
STATE_BASE_ADDRESS only modifies various bases if the "modify" bit is
set. Otherwise, we want to keep the existing base address.
Iris uses this for updating Surface State Base Address while leaving the
others as-is.
v2: Also update aubinator_viewer_decoder (caught by Lionel)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This not only makes them safe for more bit sizes but it also fixes a bug
in is_zero_to_one where it would return true for constant NaN.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
On scalar ISAs, nir_lower_io_to_scalar_early enables significant
optimizations. However, on vector ISAs, it is counterproductive and
impedes optimal codegen. This patch only calls
nir_lower_io_to_scalar_early for scalar ISAs. It appears that at present
there are no upstreamed drivers using Gallium, NIR, and a vector ISA, so
for existing code, this should be a no-op. However, this patch is
necessary for the upcoming Panfrost (Midgard) and Lima (Utgard)
compilers, which are vector.
With this patch, Panfrost is able to consume NIR directly, rather than
TGSI with the TGSI->NIR conversion.
For how this affects Lima, see
https://www.mail-archive.com/mesa-dev@lists.freedesktop.org/msg189216.html
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
r600 doesn't have a hard requirement on LLVM, and therefore doesn't have
a hard requirement on libelf. Currently the logic doesn't allow that
however.
Distro-bug: https://bugs.gentoo.org/669058
Fixes: 5060c51b6f
("meson: build r600 driver")
Reviewed-by: Matt Turner <mattst88@gmail.com>
This patch exposes support for the following two extensions:
* VK_GOOGLE_decorate_string
* VK_GOOGLE_hlsl_functionality1
There's nothing for the driver to do; it's all handled in spirv_to_nir.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107971
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This extension adds two new decorations which carry meaning only for
HLSL shaders. They are expected to be handled by higher level layers
and can be ignored by implementations. However, it does save the client
a bit of work if the implementation safely ignores them instead of the
client having to strip them out of the SPIR-V in order for it to be
valid.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The comment was wrong, since the loop above casts to a type with the
correct bitsize already.
Fixes: 7e7ee82698 ("ac: add support for 16bit buffer loads")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
And implement ac_bulid_expand_to_vec4() on top of it.
Fixes: 7e7ee82698 ("ac: add support for 16bit buffer loads")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
`nir_intrinsic_image_deref_size` is not being considered during scan for
driver constants, so image constants are not emitted if a shader
only ever query the size of an image (no load, store, atomic op, etc).
This is unlikely, but possible.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Fixes: 2f52925f5c
"nv50/ir: move a * b -> a << log2(b) code into createMul()"
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
This warning detects non-void functions with a missing return statement,
return statements with a value in void functions, and functions with an
bogus return type that ends up defaulting to int. It's already enabled
by default with -Wall. Generally, these are fairly serious bugs in the
code, which developers would like to notice and fix immediately. This
patch promotes it from a warning to an error, to help developers catch
such mistakes early.
I would not expect this warning to change much based on the compiler
version, so hopefully it won't become a problem for packagers/builders.
See the GCC documentation or 'man gcc' for more details:
https://gcc.gnu.org/onlinedocs/gcc-7.3.0/gcc/Warning-Options.html#index-Wreturn-type
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Instead of having weak references to the anv functions and separate
trampoline functions with their own dispatch table, just make the
trampoline functions weak. This gets rid of a dispatch table and
potentially lets the compiler delete the unused weak function. The
end result is a reduction in the .text section of 5.7K and a reduction
in the .data section of 1.4K.
Before:
text data bss dec hex filename
3190329 282232 8960 3481521 351fb1 _install/lib64/libvulkan_intel.so
After:
text data bss dec hex filename
3184548 280792 8960 3474300 35037c _install/lib64/libvulkan_intel.so
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 86b4bd52dc ("docs: update calendar, add news item and link
release notes for 18.2.3")
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
It's broken, and WGL state tracker is always built with GLES support
noawadays.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Trying to access the bus info before it is initialized is not going
to work.
Fixes: baa38c144f "vulkan/wsi: Use VK_EXT_pci_bus_info for DRM fd matching"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108491
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
This lets us avoid passing the DRM fd around all over the place and gets
us closer to layer utopia.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
In that case, we have to wait for the fence to synchronize with the
corresponding drawing we triggered in the X server.
Fixes incorrect display with the i965 driver and some applications, e.g.
solvespace.
Bugzilla: https://bugs.freedesktop.org/108097
Fixes: aefac10fec "loader/dri3: Only wait for back buffer fences in
dri3_get_buffer"
Tested-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
We don't need weak references to instance entrypoints because we never
have more than one of each so we don't need the NULL fall-back. This
also helps us avoid forgetting things because we now get link errors for
missing instance entrypoints.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This got missed during 1.1 enabling because it was defined as an
interaction between device groups and WSI and it wasn't obvious it was
in the delta.
The idea behind it is that it's supposed to provide a hint to the
application in a multi-GPU setup to indicate which regions of the screen
are being scanned out by which GPU so a multi-device split-screen
rendering application can render each part of the screen on the GPU that
will be presenting it and avoid extra bus traffic between GPUs. On a
single-GPU setup or one which doesn't support this present mode, we need
to do something. We choose to return the window size (or a max-size
rect) if the compositor, X server, or crtc is associated with the given
physical device and zero rectangles otherwise.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>