Invalidated buffers don't have to go through it.
Split r600_init_resource into r600_init_resource_fields and
r600_alloc_resource.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
radeonsi needs to do some operations (DCC decompression) for OpenGL-OpenCL
interop and this is the only way to make it coherent with the current
context. It can optionally be set to NULL.
Reviewed-by: Brian Paul <brianp@vmware.com>
This is just a workaround. The problem is described in the code.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96541
v2: say that it's only between the current context and aux_context
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
Some ideas copied from Jakob Sinclair's implementation, but the color
clearing is completely different.
v2: remove leftover code, disable conditional rendering
disable render condition cleanly
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
to reduce the call indirections with u_resource_vtbl.
The worst call tree you could get was:
- u_transfer_inline_write_vtbl
- u_default_transfer_inline_write
- u_transfer_map_vtbl
- driver_transfer_map
- u_transfer_unmap_vtbl
- driver_transfer_unmap
That's 6 indirect calls. Some drivers only had 5. The goal is to have
1 indirect call for drivers that care. The resource type can be determined
statically at most call sites.
The new interface is:
pipe_context::buffer_subdata(ctx, resource, usage, offset, size, data)
pipe_context::texture_subdata(ctx, resource, level, usage, box, data,
stride, layer_stride)
v2: fix whitespace, correct ilo's behavior
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Acked-by: Roland Scheidegger <sroland@vmware.com>
This fixes a rare bug with stencil texturing -- seen on Polaris and Tonga,
though it's basically a function of the memory configuration so could affect
other parts as well.
Fixes piglit "unaligned-blit * stencil downsample" and various
"fbo-depth-array *stencil*" tests.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is a left-over of when I considered generalizing the separate stencil
support. I do prefer the new name since it emphasizes what flushing vs.
non-flushing means from a functional point-of-view, namely special handling
of the texture format.
v2: adjust r600_init_color_surface as well
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
DRI3:
- Only slows clears can enable it for the first frame.
- A good PS/draw ratio can enable it for other frames.
DRI2:
- Only slows clears can enable it for a frame.
- Page-flipped color buffers are unref'd at the end of each frame,
so it can't be enabled in any other way.
- Relying on slow clears is sufficient for our synthetic benchmarks.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
DCC for displayable surfaces is allocated in a separate buffer and is
enabled or disabled based on PS invocations from 2 frames ago (to let
queries go idle) and the number of slow clears from the current frame.
At least an equivalent of 5 fullscreen draws or slow clears must be done
to enable DCC. (PS invocations / (width * height) + num_slow_clears >= 5)
Pipeline statistic queries are always active if a color buffer that can
have separate DCC is bound, even if separate DCC is disabled. That means
the window color buffer is always monitored and DCC is enabled only when
the situation is right.
The tracking of per-texture queries in r600_common_context is quite ugly,
but I don't see a better way.
The first fast clear always enables DCC. DCC decompression can disable it.
A later fast clear can enable it again. Enable/disable typically happens
only once per frame.
The impact is expected to be negligible because games usually don't have
a high level of overdraw. DCC usually activates when too much blending
is happening (smoke rendering) or when testing glClear performance and
CMASK isn't supported (Stoney).
v2: rename stuff, add assertions
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
We could also do MSAA resolve in a compute shader like Vulkan and remove
these workarounds.
v2: comment the magic numbers
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Allocating it has no effect, but it adds overhead (useless DCC clear).
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Also add dcc_fast_clear_size for clearing only the necessary subset
of DCC. For no AA, it's equal to the size of the whole DCC level.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
We want to keep DCC enabled to save bandwidth. It was a bad idea to disable
it here.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This will get more complicated with mipmapped DCC or when DCC is enabled
after allocation.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
R9G9B9E5 is the only uncompressed one hopefully.
This fixes incorrect rendering not discovered (due to a lack of tests)
until DCC mipmapping was enabled.
Cc: 11.1 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
We don't import textures with DCC now, but soon we will.
v2: if we can't disable DCC for image writes, at least decompress DCC
at bind time
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This improves throughput by keeping TTM overhead down.
Some piglit tests such as texelFetch and streaming-texture-leak will
use less memory now.
v2: use gart_size / 4 as the threshold
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Next commits will add other things around this.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
to allow reallocating the texture storage with different parameters
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Just for consistency. This doesn't fix anything, because DCC is not
supported with non-mipmapped textures.
v1.1: fix the comment about DCC
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
this is more robust and probably fixes some bugs already
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
a staging cube texture with array_size % 6 != 0 doesn't work very well
just use 2D_ARRAY or 2D for all staging textures
Cc: 11.1 11.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
this is a leftover from the days when depth-stencil buffers were
allocated by the DDX
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This hasn't been needed, but I think we should set it.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Changes:
- don't flush DB for fast color clears
- don't flush any caches for initial clears
- remove the flag from si_copy_buffer, always assume shader coherency
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
For some formats we need to take "do_endian_swap" into account when
configuring swapping for color buffers.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Because r600 GPUs can't do swap in their DB unit, we need to disable
endianess swapping for textures that are handled by DB.
There are four format translation functions in r600g driver:
- r600_translate_texformat
- r600_colorformat_endian_swap
- r600_translate_colorformat
- r600_translate_colorswap
This patch adds a new parameters to those functions, called
"do_endian_swap". When running in a big-endian machine, the calling
functions will check whether the texture/color is handled by DB -
"rtex->is_depth && !rtex->is_flushing_texture" - and if so, they will
send FALSE through this parameter. Otherwise, they will send TRUE.
The translation functions, in specific cases, will look at this parameter
and configure the swapping accordingly.
v4:
evergreen_init_color_surface_rat() is only used by compute and don't
handle DB surfaces, so just sent hard-coded FALSE to translation
functions when called by it.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Use PIPE_SWIZZLE_* everywhere.
Use X/Y/Z/W/0/1 instead of RED, GREEN, BLUE, ALPHA, ZERO, ONE.
The new enum is called pipe_swizzle.
Acked-by: Jose Fonseca <jfonseca@vmware.com>
Just for consistency. This is actually not a problem, because both addrlib
and radeon check and fix this.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This is a remnant of the times when the DDX was allocating depth-stencil
buffers for windows. Now, st/dri allocates them and doesn't share them.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Instead of failing an assertion, disable DCC and CMASK on the first export
that needs it, and merge the external usage flags.
v2: clear the EXPLICIT_FLUSH flag if it's not set; whitespace fixes
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Commit `d4e847ea` introduced a warning about making an
integer from a pointer without a cast, fix it here.
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Add layer support to export individual array layers.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Add offset support to handle NV12 offsets as well.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Clear DCC flags if necessary when binding a new sampler view.
v2: Do not reset DCC flags of bound sampler views.
v3: Check that we have a real texture (Nicolai)
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
There is an annoying corner case that I stumbled across while looking into
piglit's arb_shader_image_load_store/execution/load-from-cleared-image.shader_test
(which can be easily adapted to demonstrate the bug without the
ARB_shader_image_load_store extension)
When we bind a texture and then clear it using glClear (by attaching it
to the current framebuffer) for the first time, we allocate a separate
cmask for the texture to do fast clear, but the corresponding bit in
compressed_colortex_mask is not set. Subsequent rendering will use
incorrect data.
Conversely, when a currently bound texture with an existing cmask is
exported leading to that cmask being disabled, the compressed_colortex_mask
bit will remain set, leading to an assertion later on in debug builds.
Since iterating through all contexts and/or remembering where every
texture is bound would be costly, and cmask enable/disable should be
rare, we will maintain a global counter to signal contexts that they
must update their compressed_colortex_masks.
This patch introduces the global counter, and subsequent patches will
do the mask update.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This should be okay except that sampler views and images are not re-set.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This was needed for DRM < 2.12.0 where the kernel was rewriting tiling flags
in IBs.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This will allow drivers to make better decisions about texture sharing
for DRI2, DRI3, Wayland, and OpenCL.
v2: add read/write flags, take advantage of __DRI_IMAGE_USE_BACKBUFFER
Reviewed-by: Axel Davy <axel.davy@ens.fr>
This function is currently broken for BE. I assume it's because of
util_pack_color(). Until I fix this path, I prefer to disable it so users
would be able to see correct colors on their desktop and applications.
Together with the two following patches:
- gallium/r600: Don't let h/w do endian swap for colorformat
- gallium/radeon: remove separate BE path in r600_translate_colorswap
it fixes BZ#72877 and BZ#92039
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
After further testing, it appears there is no need for
separate BE path in r600_translate_colorswap()
The only fix remaining is the change of the last if statement, in the 4
channels case. Originally, it contained an invalid swizzle configuration
that never got hit, in LE or BE. So the fix is relevant for both systems.
This patch adds an additional 120 available visuals for LE and BE,
as seen in glxinfo
v2:
Tested for regressions by running piglit gpu.py with CAICOS (r600g) on
x86-64 machine. No regressions found.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Because I changed the swizzle check, I also need to adapt the return
values for each check.
It's basically almost the same as before, we just cross between STD and
STD_REV, and cross between ALT and ALT_REV
This fixes the rgba test in gl-1.0-readpixsanity (piglit) and also
fixes tri-flat (mesa demos).
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The current code in r600_translate_colorswap uses the swizzle information
to determine which colorswap to use.
This works for BE & LE when the nr_channels is <4, but when nr_channels==4
(e.g. PIPE_FORMAT_A8R8G8B8_UNORM), this method can not be used for both BE
and LE, because the swizzle info is the same for both of them.
As a result, r600g doesn't support 24bit color formats, only 16bit, which
forces the user to choose 16bit color in X server.
This patch fixes this bug by separating the checks for LE and BE and
adapting the swizzle conditions in the BE part of the checks.
Tested on an Evergreen GPU (Cedar GL FirePro 2270) running inside POWER7
Big-Endian Machine.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
CC: "11.2" "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fixes compute since 7dd31b81fe
gallium/radeon: support PIPE_CAP_SURFACE_REINTERPRET_BLOCKS
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
This is already used internally in si_resource_copy_region for compressed
textures, so the only real change here is the adjusted surface size
computation.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>