This allows a more generic mechanism for passing user configurations
into drivers by accessing the dri options directly.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
If the queue is full, util_queue_add_job will wait while bo_fence_lock is
held.
It pb_slab wants to reuse a buffer, it will lock the pb_slab mutex and
try to check BO fence busyness, but it has to wait for bo_fence_lock to get
released. Both bo_fence_lock and pb_slab mutex are locked now.
When the CS thread unreferences and releases a suballocated buffer,
it will try to lock the pb_slab mutex and has to wait. The CS thread
can't finish its job in order to free a queue slot and unblock
util_queue_add_job ==> deadlock.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Rather than using 64k, use what addrlib returns as the base
alignment for vulkan allocations.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This decreases the number of BOs, but might also increase memory usage.
It's better for small textures.
The gameplay is on the far right:
https://people.freedesktop.org/~mareko/suballoc.svg
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
First this happens:
1) amdgpu_cs_flush (lock bo_fence_lock)
-> amdgpu_add_fence_dependency
-> os_wait_until_zero (wait for submission_in_progress) - WAITING
2) amdgpu_bo_create
-> pb_cache_reclaim_buffer (lock pb_cache::mutex)
-> pb_cache_is_buffer_compat
-> amdgpu_bo_wait (lock bo_fence_lock) - WAITING
So both bo_fence_lock and pb_cache::mutex are held. amdgpu_bo_create can't
continue. amdgpu_cs_flush is waiting for the CS ioctl to finish the job,
but the CS ioctl is trying to release a buffer:
3) amdgpu_cs_submit_ib (CS thread - job entrypoint)
-> amdgpu_cs_context_cleanup
-> pb_reference
-> pb_destroy
-> amdgpu_bo_destroy_or_cache
-> pb_cache_add_buffer (lock pb_cache::mutex) - DEADLOCK
The simple solution is not to wait for submission_in_progress, which we
need in order to create the list of dependencies for the CS ioctl. Instead
of building the list of dependencies as a direct input to the CS ioctl,
build the list of dependencies as a list of fences, and make the final list
of dependencies in the CS thread itself.
Therefore, amdgpu_cs_flush doesn't have to wait and can continue.
Then, amdgpu_bo_create can continue and return. And then amdgpu_cs_submit_ib
can continue.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101294
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
v2: Add a func pointer to radeon_winsys to support radeon later.
Change-Id: I614ea71424f9e5c97e4ae68654315d28c89eaa5f
Signed-off-by: Samuel Li <Samuel.Li@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
BOs larger than the minimum fragment size should have their VA
alignet to at least the fragment size for optimal performance.
v2: drop unused leftover from initial implementation
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Currently, building with "mmma external/mesa3d" which builds all targets
and dependencies is broken for targets that require LLVM. This is due to
the build settings depending on MESA_ENABLE_LLVM. Instead of using a
conditional in the global Android.common.mk, make all the components that
need LLVM explicitly include the necessary build settings.
GALLIVM_CPP_SOURCES doesn't exist anymore, so remove that as well.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
src/gallium/targets/dri/Android.mk contains lots of conditional for
individual drivers. Let's move these details into the individual driver
makefiles.
In the process, align the make driver conditionals with automake
(i.e. HAVE_GALLIUM_*).
Signed-off-by: Rob Herring <robh@kernel.org>
[Emil Velikov: add the radeon winsys for radeonsi]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Add exported include paths rather than explicitly adding the includes
in each user of the common AMD libs.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Only the Radeon kernel driver exposed the GPU temperature and
the shader/memory clocks, this implements the same functionality
for the AMDGPU kernel driver.
These queries will return 0 if the DRM version is less than 3.10,
I don't explicitely check the version here because the query
codepath is already a bit messy.
v2: - rebase on top of master
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We never add fences to backing buffers during submit. When we free a
backing buffer, it must inherit the sparse buffer's fences, so that it
doesn't get re-used prematurely via the cache.
v2:
- remove pipe_mutex_*
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
... and implement the corresponding fence handling.
v2:
- add missing bit in amdgpu_bo_is_referenced_by_cs_with_usage
- remove pipe_mutex_*
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is the bulk of the buffer allocation logic. It is fairly simple and
stupid. We'll probably want to use e.g. interval trees at some point to
keep track of commitments, but Mesa doesn't have an implementation of those
yet.
v2:
- remove pipe_mutex_*
- fix total_backing_pages accounting
- simplify by using the new VA_OP_CLEAR/REPLACE kernel interface
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This probably has only minor performance effects, but it simplifies some
subsequent code slightly.
Ideally, it could also be used to simplify the handling of slab buffers
in the same way, but unfortunately that's not possible as long as we need
indices for relocations.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is now exposed with libdrm_amdgpu 2.4.76.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Analogous to previous commit
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98502
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
pipe_mutex_unlock() was made unnecessary with fd33a6bcd7.
Replaced using:
find ./src -type f -exec sed -i -- \
's:pipe_mutex_unlock(\([^)]*\)):mtx_unlock(\&\1):g' {} \;
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
replace pipe_mutex_lock() was made unnecessary with fd33a6bcd7.
Replaced using:
find ./src -type f -exec sed -i -- \
's:pipe_mutex_lock(\([^)]*\)):mtx_lock(\&\1):g' {} \;
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
pipe_mutex_destroy() was made unnecessary with fd33a6bcd7.
Replace was done with:
find ./src -type f -exec sed -i -- \
's:pipe_mutex_destroy(\([^)]*\)):mtx_destroy(\&\1):g' {} \;
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
pipe_mutex_init() was made unnecessary with fd33a6bcd7.
Replace was done using:
find ./src -type f -exec sed -i -- \
's:pipe_mutex_init(\([^)]*\)):(void) mtx_init(\&\1, mtx_plain):g' {} \;
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Allocating huge buffers in VRAM is not a problem, but when those buffers
start being migrated, the kernel runs into errors because it cannot split
those buffer up for moving through GTT.
This should fix intermittent failures of
GL45-CTS.texture_buffer.texture_buffer_max_size
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
cs can be NULL when it comes from r600_buffer_map_sync_with_rings()
to avoid doing the same checks. It was checked for write mappings
but not for read mappings.
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The perf difference is very small, 3.25->2.84% in amdgpu_cs_flush()
in the DXMD benchmark.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This new query returns the current visible usage of VRAM accessed
by the CPU. It will return 0 on radeon because it's unimplemented.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
R600_DEBUG="info" can be used to display that size, as well as
the total amount of VRAM/GTT.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Useful when debugging applications which map a ton of buffers
and also because we used to run into Linux's limit on the number
of simultaneous mmap() calls.
v2: - update the commit message
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The CS thread is needed to ensure proper ordering of operations and can't
be disabled (without complicating the code).
Discovered by Nine CSMT, which ended up in a deadlock.
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
v2: use gfxip names for llvm 4.0+
v3: use tonga for llvm <= 3.8, drop gfxip name,
we can just change that we change the other asics.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Junwei Zhang <Jerry.Zhang@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
FORCE_TILING should disable it. It has no effect now, but that may change
soon.
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
GCN can use a completely different tile mode for FMASK.
FMASK allocation now skips one unrelated amdgpu_surface_init codepath as
hinted by the assertion.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
I expect no change in behavior, because r600_texture.c forces the same
tile mode as the base texture has.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This removes input-only parameters from the radeon_surf structure.
Some of the translation logic from pipe_resource to radeon_surf is moved to
winsys/radeon.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Maybe this is why SDMA has been broken for many amdgpu users?
SDMA is the only block which is used with imported textures and relies
on this variable. DB also uses it, but it doesn't get imported textures,
so it's unaffected.
I do get SDMA failures on Tonga before this patch if R600_DEBUG=testdma
is changed to use imported textures.
Cc: 11.2 12.0 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
so that decompress blits aren't needed and depth texturing needs less
memory bandwidth.
Z16 and Z24 are promoted to Z32_FLOAT by the driver, because TC-compatible
HTILE only supports Z32_FLOAT. This doubles memory footprint for Z16.
The format promotion is not visible to state trackers.
This is part of TC-compatible renderbuffer compression, which has 3 parts:
DCC, HTILE, FMASK. Only TC-compatible FMASK compression is missing now.
I don't see a measurable increase in performance though.
(I tested Talos Principle and DiRT: Showdown, the latter is improved by
0.5%, which is almost noise, and it originally used layered Z16,
so at least we know that Z16 promoted to Z32F isn't slower now)
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
which is returned for GL_MAX_TEXTURE_BUFFER_SIZE.
It doesn't have any other use at the moment.
Bigger allocations are not rejected.
This fixes GL45-CTS.texture_buffer.texture_buffer_max_size on Bonaire.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Small buffers are now handled via the slabs code, so separate buckets in
pb_cache have become redundant.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
When a buffer is added to a CS without the SYNCHRONIZED usage flag, we now
no longer add a dependency on the buffer's fence(s).
However, we still need to add a fence to the buffer during flush, so that
cache reclaim works correctly (and in the hypothetical case that the buffer
is later added to a CS _with_ the SYNCHRONIZED flag).
It is now possible that the submissions refererring to a buffer are no longer
linearly ordered, and so we may have to keep multiple fences around. We keep
the fences in a FIFO. It should usually stay quite short (# of contexts * 2,
for gfx + dma rings).
While we're at it, extract amdgpu_add_fence_dependency for a single buffer,
which will make adding the distinction between real buffer and slab cases
easier.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
When passed to winsys->buffer_create, this flag will indicate that we require
a buffer that maps 1:1 with a kernel buffer handle.
This is currently set for all textures, since textures can potentially be
exported to other processes. This is not a huge loss, since the main purpose
of this patch series is to deal with applications that allocate many small
buffers.
A hypothetical application with tons of tiny textures might still benefit
from not setting this flag, but that's not a use case I'm worried about
just now.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Android porting of the following commits:
f1f1ba3 "radeonsi: move sid.h/r600d_common.h to a common place."
69fca64 "amd/addrlib: move addrlib from amdgpu winsys to common code"
This patch fixes android building errors
Reviewed-by: Dave Airlie <airlied@redhat.com>
The radeonsi driver doesn't and shouldn't care about the buffer index.
Only the virtual addresses matter.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The fence that is added to the BO during flush is guaranteed to be
signaled after all the fences that were in the fences array of the BO
before the flush, because those fences are added as dependencies for the
submission (and all this happens atomically under the bo_fence_lock).
Therefore, keeping only the last fence around is sufficient.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The idea is to have matching init/deinit functions so that deinit can be
re-used for cleanup in the error path of amdgpu_winsys_create.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Step one to merging radv would be to move some files around.
This only adds the include path to r600/radeonsi, because later
we want to avoid having to add it to the generic target paths.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The radeon kernel module doesn't have the firmware query interface, so the
corresponding values will remain 0.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Adds a second optional cleanup callback, called after the fence is
signaled. This is needed if, for example, the queue has the last
reference to the object that embeds the util_queue_fence. In this
case we cannot drop the ref in the main callback, since that would
result in the fence being destroyed before it is signaled.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
If a depth/stencil texture has no mipmaps, we can always get a layout that is
compatible with DB and TC.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This fixes a rare bug with stencil texturing -- seen on Polaris and Tonga,
though it's basically a function of the memory configuration so could affect
other parts as well.
Fixes piglit "unaligned-blit * stencil downsample" and various
"fbo-depth-array *stencil*" tests.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: sonjiang <sonny.jiang@amd.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Fix warnings like these due to HAVE_LIBDRM being inconsistently defined:
external/libdrm/include/drm/drm.h:839:30: warning: redefinition of typedef 'drm_clip_rect_t' is a C11 feature [-Wtypedef-redefinition]
typedef struct drm_clip_rect drm_clip_rect_t;
HAVE_LIBDRM needs to be set project wide to fix this. This change also
harmlessly links libdrm with everything, but simplifies the makefiles a
bit.
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Also add dcc_fast_clear_size for clearing only the necessary subset
of DCC. For no AA, it's equal to the size of the whole DCC level.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
R9G9B9E5 is the only uncompressed one hopefully.
This fixes incorrect rendering not discovered (due to a lack of tests)
until DCC mipmapping was enabled.
Cc: 11.1 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
We will chain multiple chunks together and will keep pointers to the older
chunks to support IB dumping.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This avoids allocating giant IBs from the outset, especially for CE and DMA.
Since we now limit max_dw only by the size that the buffer happens to be
(which, due to the buffer cache, can be even larger than the rounded-up size
we request), the new function amdgpu_ib_max_submit_dwords controls when we
submit an IB.
With this change, we effectively never flush prematurely due to the CE IB,
after an initial warm-up phase.
v2:
- clean up buffer_size calculation
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Ported from the initial amdgpu winsys from the private AMD branch.
The thread creates the buffer list, submits IBs, and cleans up
the submission context, which can also destroy buffers.
3-5% reduction in CPU overhead is expected for apps submitting a lot
of IBs per frame. This is most visible with DMA IBs.
v2: use a semaphore instead of a busy loop in amdgpu_ws_queue_cs
add another amdgpu_cs_sync_flush call into amdgpu_bo_map
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
It's for the buffer cache.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This hasn't been needed, but I think we should set it.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Jakob Sinclair <sinclair.jakob@openmailbox.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
The missing break caused the IB size to be overwritten with
the size of IB_CONST.
This was introduced in: 7201230582
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Necessary to prevent performance regressions due to extra flushing.
Probably should enlarge it even further when also updating
uniforms through the CE, but this seems large enough for now.
v2: Add preamble IB.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This makes Tonga with vramlimit=128 2x faster in Heaven.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
v2: Polaris chips should be defined after Stoney
Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> (v1)
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> (v1)
Signed-off-by: Leo Liu <leo.liu@amd.com> (v2 diff)
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> (v2 diff)
v2: fix indentation as noted by Michel
Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Cons:
- it was only integrated in r600g
- it doesn't work with GPUVM
- it records buffer contents at the end of IBs instead of at the beginning,
so the replay isn't exact
- it lacks an IB parser and user-friendliness
A better solution is apitrace in combination with gallium/ddebug, which
has a complete IB parser and can pinpoint hanging CP packets.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Add layer support to export individual array layers.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Add offset support to handle NV12 offsets as well.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We are going to need this for EGL_EXT_image_dma_buf_import.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This will be queried by the OpenCL stack using an interop call.
I have tested that the values match lspci.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This was needed for DRM < 2.12.0 where the kernel was rewriting tiling flags
in IBs.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Enlarge the buffer hashlist to prevent large numbers of misses
due to adding more buffers than can be cached in the hashlist.
The game I tested had CS's with up to 1500 buffers and the overhead
of amdgpu_lookup_buffer for various sizes was:
4096 1.97% (new value)
2048 4.37%
1024 6.92%
512 9.47% (old value)
(percentage of CPU usage in render thread as determined by perf)
The time spent in amdgpu_add_buffer self is ~4.2% in all cases and
for 4096 the time needed to clear the hashlist is still < 0.10%,
so I am not expecting significant regressions.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
In particular, AMDGPU_GEM_CREATE_CPU_GTT_USWC can affect even BOs created
in VRAM if they get evicted to GTT. In general there's no need to
restrict any of the flags to any particular domains.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Failing to do this was resulting in the kernel driver unnecessarily
leaving open the possibility of CPU access to tiled BOs.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93862
(This change shouldn't be backported to stable branches, because
released versions of xf86-video-amdgpu unnecessarily try to map the
front buffer)
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
The whole point of AMD_pinned_memory is that applications don't have to map
buffers via OpenGL - but they're still allowed to, so make sure we don't break
the link between buffer object and user memory unless explicitly instructed
to.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
"radeon_winsys_cs_handle *cs_buf" is now equivalent to "pb_buffer *buf".
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This is a prerequisite for the removal of radeon_winsys_cs_handle.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
As the alignment requirements can be 32 KiB or more, also adding
an aligned buffer creation function.
DCC is disabled for textures that can be shared as sharing the
DCC buffers has not been implemented yet.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
v2 (agd): rebase on mesa master, split pci ids to
separate commit
v3 (agd): use carrizo for llvm processor name for
llvm 3.7 and older
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Samuel Li <samuel.li@amd.com>
Cc: mesa-stable@lists.freedesktop.org
The files are not referenced in any other place in whole of
mesa. They are likely remnants of the early development stage.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
If the 32-bit types overflowed, the driver could submit an IB that uses much
more memory than is available.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
v2: incorporate comments from Marek
v3: add missing fiji case in winsys init
use tonga raster config (double check this)
v4: rebase on harvest patch
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v3)
Reviewed-by: Christian König <christian.koenig@amd.com> (v3)
Reviewed-by: David Zhang <david1.zhang@amd.com> (v3)
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
v2: fix tonga chip check
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: David Zhang <david1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This is an internal project that Catalyst uses and now open source will do
too.
v2: squashed these commits in:
- winsys/amdgpu: fix warnings in addrlib
- winsys/amdgpu: set PIPE_CONFIG and NUM_BANKS in tiling_flags
v2: - lots of changes according to Emil Velikov's comments
- implemented radeon_winsys::read_registers
v3: - a lot of new work, many of them adapt to libdrm interface changes
Squashed patches:
winsys/amdgpu: implement radeon_winsys context support
winsys/amdgpu: add reference counting for contexts
winsys/amdgpu: add userptr support
winsys/amdgpu: allocate IBs like normal buffers
winsys/amdgpu: add IBs to the buffer list, adapt to interface changes
winsys/amdgpu: don't use KMS handles as reloc hash keys
winsys/amdgpu: sync buffer accesses to different rings
winsys/amdgpu: use dependencies instead of waiting for last fence v2
gallium/radeon: unify buffer_wait and buffer_is_busy in the winsys interface (amdgpu part)
winsys/amdgpu: track fences per ring and be thread-safe
winsys/amdgpu: simplify waiting on a variable in amdgpu_fence_wait
gallium/radeon: allow the winsys to choose the IB size (amdgpu part)
winsys/amdgpu: switch to new amdgpu_cs_query_fence_status interface
winsys/amdgpu: handle fence and dependencies merge
winsys/amdgpu follow libdrm change to move user fence into UMD
winsys/amdgpu: use amdgpu_bo_va_op for va map/unmap v2
winsys/amdgpu: use the new tiling flags
winsys/amdgpu: switch to new GTT_USWC definition
winsys/amdgpu: expose amdgpu_cs_query_reset_state to drivers
winsys/amdgpu: fix valgrind warnings
winsys/amdgpu: don't use VRAM with APUs that don't have much of it
winsys/amdgpu: require LLVM 3.6.1 for VI because of bug fixes there
winsys/amdgpu: remove amdgpu_winsys::num_cpus
winsys/amdgpu: align BO size to page size
winsys/amdgpu: reduce BO cache timeout
winsys/amdgpu: remove useless flushing and waiting in amdgpu_bo_set_tiling
winsys/amdgpu: use amdgpu_device_handle as a unique device ID instead of fd
winsys/amdgpu: use safer access to amdgpu_fence_wait::signalled
winsys/amdgpu: allow maximum IB size of 4 MB
winsys/amdgpu: add ip_instance into amdgpu_fence
gallium/radeon: add RING_COMPUTE instead of RADEON_FLUSH_COMPUTE
winsys/amdgpu: set the ring type at CS initilization
winsys/amdgpu: query the GART page size from the kernel
winsys/amdgpu: correctly wait for shared buffers to become idle
winsys/amdgpu: set the amdgpu_cs_fence structure only once at fence creation
winsys/amdgpu: add a specific error message for cs_submit -> -ENOMEM
winsys/amdgpu: check num_active_ioctls before calling amdgpu_bo_wait_for_idle
winsys/amdgpu: clear user fence BO after allocating it
winsys/amdgpu: fix user fences
winsys/amdgpu: make amdgpu_winsys_create public
winsys/amdgpu: remove thread offloading
winsys/amdgpu: flatten the amdgpu_cs_context structure and simplify more
v4: require libdrm 2.4.63