Commit Graph

2960 Commits

Author SHA1 Message Date
Samuel Pitoiset 39a35eb0c1 radeonsi: try to re-use previously deleted bindless descriptor slots
Currently, when the array is full it is resized but it can grow
over and over because we don't try to re-use descriptor slots.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-22 11:34:37 +02:00
Samuel Pitoiset c2dfa9b111 radeonsi: use slot indexes for bindless handles
Using VRAM address as bindless handles is not a good idea because
we have to use LLVMIntToPTr and the LLVM CSE pass can't optimize
because it has no information about the pointer.

Instead, use slots indexes like the existing descriptors. Note
that we use fixed 16-dword slots for both samplers and images.
This doesn't really matter because no real apps use image handles.

This improves performance with DOW3 by +7%.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-22 11:34:29 +02:00
Samuel Pitoiset 50349f404d radeonsi: add si_emit_global_shader_pointers() helper
To share common code between rw buffers and bindless descriptors.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-22 11:34:24 +02:00
Samuel Pitoiset a5ff4a8e2e radeonsi: only initialize dirty_mask when CE is used
Looks like it's useless to initialize that field when CE is
unused. This will also allow to declare more than 64 elements
for the array of bindless descriptors.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-22 11:34:23 +02:00
Samuel Pitoiset a29ef75565 radeonsi: make some si_descriptors fields 32-bit
The number of bindless descriptors is dynamic and we definitely
have to support more than 256 slots.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-22 11:34:21 +02:00
Samuel Pitoiset 781a13c475 radeonsi: declare new user SGPR indices for bindless samplers/images
A new pair of user SGPR is needed for loading the bindless
descriptors from shaders. Because the descriptors are global for
all stages, there is no need to add separate indices for GFX9.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-22 11:34:15 +02:00
Nicolai Hähnle 472c906d9f radeonsi/gfx9: add performance counters
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-22 09:55:16 +02:00
Nicolai Hähnle e271607668 radeonsi: extract common code of si_upload_{graphics,compute}_shader_descriptors
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-22 09:55:05 +02:00
Nicolai Hähnle a6e7693882 gallium: remove unused PIPE_DUMP_* defines
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-22 09:53:35 +02:00
Nicolai Hähnle f4c1d5a76d radeonsi: emit string markers to log context
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-22 09:53:34 +02:00
Nicolai Hähnle 0c3f8aca7f radeonsi: log decompress blits
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-22 09:53:34 +02:00
Nicolai Hähnle 420c438589 radeonsi: log draw and compute state into log context
Also add missing trace emits and CS logging for compute launches.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-22 09:53:34 +02:00
Nicolai Hähnle 4c3f36ec6b radeonsi: print saved CS to the log context
Use the auto logger facility, so that CS chunks will be interleaved
with other log info.

v2:
- fix some crashes when not using CE
- fix skipping "previous" chunks of current (unflushed) IB
- fix error handling in si_begin_cs_debug

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-22 09:53:14 +02:00
Nicolai Hähnle bc93339799 radeonsi: start using u_log_context for debugging
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-22 09:51:00 +02:00
Nicolai Hähnle ad33f2ddd8 radeonsi: re-order debug state dumping
Keep together the parts that won't use the deferred logging mechanism.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-22 09:50:57 +02:00
Nicolai Hähnle 40697e8678 radeonsi: make si_shader_selector_reference globally visible
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-22 09:50:55 +02:00
Nicolai Hähnle 4bbf6ded20 radeonsi: add reference count to si_compute
To allow keep-alive for deferred logging.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-22 09:50:53 +02:00
Nicolai Hähnle bbaad18c04 radeonsi: implement pipe_context::set_log_context
We'll add radeonsi-specific code to set_log_context in later patches,
but we may want to log from common code. Hence keep the log pointer
in r600_common_context.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-22 09:50:48 +02:00
Marek Olšák db039d67aa radeonsi: don't prefetch VBO descriptors if vertex elements == NULL
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-21 23:06:42 +02:00
Rob Herring 4734bfc02a Android: Fix LLVM duplicated symbols linking for N and M
Both statically linking libLLVMCore and dynamically linking libLLVM causes
duplicated symbols in gallium_dri.so and it fails to dlopen. We don't
really need to link libLLVMCore, but just need generated headers to be
built first. Dynamically linking to libLLVM instead is enough to do
that. Thanks to Qiang Yu for finding the root cause.

With this change, we can align all versions and just have libLLVM as a
shared lib dependency.

This also requires changes in the M and N versions of LLVM to export the
include paths for libLLVM. AOSP master is okay.

Fixes: 26aee6f4d5 ("Android: rework LLVM build support")
Reported-by: Mauro Rossi <issor.oruam@gmail.com>
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Qiang Yu <Qiang.Yu@amd.com>
Signed-off-by: Rob Herring <robh@kernel.org>
2017-08-21 10:46:21 -05:00
Leo Liu 7319ff8787 radeon/uvd: add YUYV format support for target buffer
Make chroma plane optional for YUYV support

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-08-21 10:09:09 -04:00
Samuel Pitoiset 2843c5d15c radeonsi: update non-resident bindless descriptors if needed
Only resident bindless descriptors are currently updated and
re-uploaded, this makes sure that the non-resident ones are
also updated.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-21 15:23:56 +02:00
Marek Olšák 57fb1bb585 gallium/radeon: remove old_fence parameter from r600_gfx_write_event_eop
just use the new scratch buffer.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-18 16:06:21 +02:00
Marek Olšák 41e053954d radeonsi/gfx9: prevent a GPU hang after a timestamp event
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-18 16:06:18 +02:00
Marek Olšák 13aa8d3da9 radeonsi: don't use CLEAR_STATE on SI
This fixes random hangs with Unigine Valley.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102201

Fixes: 064550238e ("radeonsi: use CLEAR_STATE to initialize some registers")
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-18 15:59:22 +02:00
Marek Olšák 1ab7fed707 radeonsi: disable CE by default
It makes performance worse by a very small (hard to measure) amount.
We've done extensive profiling of this feature internally.

Cc: 17.1 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Christian König <christian.koenig@amd.com>
2017-08-15 15:03:43 +02:00
Marek Olšák d1285a7103 radeonsi/gfx9: fix the scissor bug workaround
otherwise there is corruption in most apps.

Fixes: 0fe0320 radeonsi: use optimal packet order when doing a pipeline sync

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-11 20:38:29 +02:00
Marek Olšák 27fef5d52d radeonsi/gfx9: use the VI codepath for clamping Z
This fixes corrupted shadows in Unigine Valley.
The corruption disappeared when I stopped setting IMG_DATA_FORMAT_24_8
for depth.

Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-11 20:38:29 +02:00
Marek Olšák 4630ede102 ac: fail shader compilation if libelf is replaced by an incompatible version
UE4Editor has this issue.

This commit prevents hangs (release build) or assertion failures (debug
build). It doesn't fix the editor, but catastrophic scenarios are
prevented.

Cc: 17.1 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-08-10 13:24:23 +02:00
Samuel Pitoiset bbfad34606 radeonsi: drop two unused variables in create_function()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-09 12:56:00 +02:00
Marek Olšák a2703fc119 radeonsi: fix a compile failure due to disabled asserts 2017-08-07 22:51:45 +02:00
Marek Olšák 0fe0320dc0 radeonsi: use optimal packet order when doing a pipeline sync
Process most new SET packets in parallel with previous draw calls, then
flush caches and wait, start the draw, and do L2 prefetches last.

This decreases the [CP busy / SPI busy] ratio (verified with GRBM perf
counters). In other words, the time window when shaders are idle (between
(the wait and the draw) is much shorter now.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-07 21:12:24 +02:00
Marek Olšák 895de1d03d radeonsi: expose the number of decompress calls to the HUD
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-07 21:12:24 +02:00
Marek Olšák c093821cee radeonsi: rename shader_userdata -> shader_pointers where appropriate
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-07 21:12:24 +02:00
Marek Olšák c441999b7a radeonsi: prefetch VBO descriptors after the first VGT shader
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-07 21:12:24 +02:00
Marek Olšák e887c68bd2 radeonsi: add a separate dirty mask for prefetches
so that we don't rely on si_pm4_state_enabled_and_changed, allowing us
to move prefetches after draw calls.

v2: ckear the dirty mask after unbinding shaders

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> (v1)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
2017-08-07 21:12:24 +02:00
Marek Olšák a7b0014d1a radeonsi: add and use si_pm4_state_enabled_and_changed
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-07 21:12:24 +02:00
Marek Olšák 58d062b87d radeonsi: de-atomize L2 prefetch
I'd like to be able to move the prefetch call site around.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-07 21:12:24 +02:00
Marek Olšák 4e629ca7c7 radeonsi: align all CE dumps to L2 cache line size
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-07 21:12:24 +02:00
Marek Olšák 01fed67608 radeonsi: remove a tautology sctx->framebuffer.nr_samples >= 1
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-07 21:12:24 +02:00
Andres Rodriguez 7fe5fa0013 radeonsi: enable support for EXT_memory_object
v2: fix an indentation error
v3: don't enable for r600

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-08-06 12:42:07 +10:00
Andres Rodriguez 68623933a0 radeonsi: hook up device/driver UUID queries
v2: move from r600_common to radeonsi

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-06 12:42:07 +10:00
Mauro Rossi f99a733e38 android: radeonsi: add nir include paths
Android build changes to avoid the following building error:

target  C: libmesa_pipe_radeonsi <= external/mesa/src/gallium/drivers/radeonsi/si_pipe.c
...
In file included from external/mesa/src/gallium/drivers/radeonsi/si_pipe.c:38:
external/mesa/src/compiler/nir/nir.h:48:10: fatal error: 'nir_opcodes.h' file not found
         ^
1 error generated.

Fixes: da62a31c5b "radeonsi: add nir include paths"
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-08-04 14:58:50 +01:00
Nicolai Hähnle 12ce39d3de radeonsi: set drirc compiler options before calling common screen init
Also, access the options directly, allowing us to get rid of the
PIPE_SCREEN_xxx flags.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-04 10:46:01 +02:00
Juan A. Suarez Romero 5ff4c5aef4 radeonsi: Makefile.sources: include driinfo_radeonsi.h
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-08-04 09:54:46 +02:00
Marek Olšák da942a4b81 radeonsi: program tile swizzle for color and FMASK surfaces for GFX & SDMA
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-04 02:10:04 +02:00
Marek Olšák ae5d86e94d radeonsi: if FMASK is disabled, set CB_COLORi_FMASK = CB_COLORi_BASE properly
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-04 02:10:04 +02:00
Mauro Rossi 5baed8f0e6 android: radeonsi: prepare for driver-specific driconf options
Android build changes to avoid the following building error:

In file included from external/mesa/src/gallium/targets/dri/target.c:1:
external/mesa/src/gallium/auxiliary/target-helpers/drm_helper.h:185:10:
fatal error: 'radeonsi/si_driinfo.h' file not found
         ^
1 error generated.

Fixes: 0f8c5de869 "radeonsi: prepare for driver-specific driconf options"
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-08-03 10:55:29 +01:00
Timothy Arceri 4e4042df6b gallium: introduce PIPE_CAP_MEMOBJ
This can be used to guard support for EXT_memory_object and related
extensions.

v2: update gallium docs

v3 (Timothy Arceri):
 - add cap to nv50

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-08-03 13:57:16 +10:00
Nicolai Hähnle 53485c2d0e radeonsi: add enable_sisched driconf option
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-02 09:50:59 +02:00
Nicolai Hähnle 0f8c5de869 radeonsi: prepare for driver-specific driconf options
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-02 09:50:58 +02:00
Nicolai Hähnle bc7f41e11d gallium: add pipe_screen_config to screen_create functions
This allows a more generic mechanism for passing user configurations
into drivers by accessing the dri options directly.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-02 09:50:57 +02:00
Nicolai Hähnle 78476cfe07 radeonsi: enable ARB_transform_feedback_overflow_query
v2: update for new cap name

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-02 09:49:09 +02:00
Nicolai Hähnle a677799e51 gallium: add PIPE_QUERY_SO_OVERFLOW_ANY_PREDICATE and corresponding cap
v2: rename cap to PIPE_CAP_QUERY_SO_OVERFLOW and be a bit more explicit
    in the documentation

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-02 09:37:10 +02:00
Dave Airlie cb6f16dce9 radeon/ac: use ds_swizzle for derivs on si/cik.
This looks like it's supported since llvm 3.9 at least,
so switch over radeonsi and radv to using it, -pro also
uses this. We can now drop creating lds for these operations
as the ds_swizzle operation doesn't actually write to lds at all.

Acked-by: Marek Olšák <marek.olsak@amd.com>
(stable requested due to fixing radv CIK conformance tests)
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-08-02 00:12:01 +01:00
Marek Olšák 1aeafb59e6 radeonsi: print CE IBs into ddebug reports
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-01 17:06:38 +02:00
Marek Olšák 1482861abe radeonsi: fix printing vertex buffer descriptors into ddebug reports
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-01 17:06:38 +02:00
Marek Olšák 404f524fe2 radeonsi: don't flush sL1 conditionally in WAIT_ON_CE_COUNTER
I don't know the condition for the flush, but we better turn this off.
The sL1 flush is used when CE dumps stuff into a ring buffer and the ring
buffer wraps.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-01 17:06:38 +02:00
Marek Olšák 94965b8219 radeonsi: set up HTILE in descriptors only when level 0 is accessible
Compression isn't enabled with non-zero levels.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-01 17:06:38 +02:00
Marek Olšák b9fc9d3f24 radeonsi: fix various CLEAR_STATE issues
Fixes: 064550238e ("radeonsi: use CLEAR_STATE to initialize some
                      registers")
Bugzilla: https://bugs.freedesktop.org/101969
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-01 17:06:38 +02:00
Nicolai Hähnle 2879a602dd radeonsi: ensure that temp array allocas are in the entry block
Otherwise, code generation fails. This has become necessary since some
shaders are wrapped in control flow.

Fixes: 081ac6e5c6 ("radeonsi/gfx9: always wrap GS and TCS in an if-block (v2)")
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 15:00:22 +02:00
Nicolai Hähnle dfe237aef9 radeonsi: enable R600_DEBUG=nir for vertex and fragment shaders
Also, disable geometry and tessellation shaders. Mixing and matching NIR
and TGSI shaders should work (and I've tested it for the VS/PS interface),
but geometry and tessellation requires VS-as-ES/LS, which isn't implemented
yet for NIR.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 14:55:45 +02:00
Nicolai Hähnle 3b4f481c60 radeonsi: VS as ES/LS are not yet supported with R600_DEBUG=nir
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 14:55:44 +02:00
Nicolai Hähnle 3997b10f74 radeonsi/nir: lower uniforms to UBO loads
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 14:55:44 +02:00
Nicolai Hähnle d5741489d3 radeonsi/nir: lower txp instructions
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 14:55:44 +02:00
Nicolai Hähnle 1c64637c26 ac/nir,radeonsi: add and use ac_shader_abi::frag_pos
v2: update for LLVMValueRefs in ac_shader_abi

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 14:55:43 +02:00
Nicolai Hähnle f03c54e05a ac/nir,radeonsi: add and use ac_shader_abi::{ancillary,sample_coverage}
v2: update for LLVMValueRefs in ac_shader_abi

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 14:55:43 +02:00
Nicolai Hähnle 25ff22e390 radeonsi: tweak next-shader assumptions when streamout is used
VS with streamout is always a HW VS.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 14:55:43 +02:00
Nicolai Hähnle a69afb68c9 radeonsi: use new function ac_build_umin for edgeflag clamping
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 14:55:42 +02:00
Nicolai Hähnle e247357240 ac/nir,radeonsi: add ac_shader_abi::front_face
v2: update for LLVMValueRefs in ac_shader_abi

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 14:55:42 +02:00
Nicolai Hähnle a0af3daf9c radeonsi: implement and use ac_shader_abi::load_ssbo
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 14:55:41 +02:00
Nicolai Hähnle d46018a4d7 radeonsi: make get_indirect_index globally visible
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 14:55:41 +02:00
Nicolai Hähnle 41d4016e06 radeonsi/nir: perform radeonsi-specific lowering and optimization passes
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 14:55:40 +02:00
Nicolai Hähnle b49c2c9fa3 radeonsi/nir: perform lowering of input/output driver locations
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 14:55:40 +02:00
Nicolai Hähnle 8d23575c96 radeonsi/nir: add image descriptor loading
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 14:55:39 +02:00
Nicolai Hähnle f37f9aed84 ac/nir: add image and write parameter to ac_shader_abi::load_sampler_desc
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 14:55:38 +02:00
Nicolai Hähnle 677bd47cb9 radeonsi/nir: set si_shader_context::num_{sampler,images}
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 14:55:38 +02:00
Nicolai Hähnle 7c27ef182c radeonsi/nir: implement ac_shader_abi::load_sampler_desc
v2: remove enum desc_type from radeonsi (Marek)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 14:55:38 +02:00
Nicolai Hähnle 7763c7b2ba ac/nir,radeonsi: add ac_shader_abi::chip_class
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 14:55:37 +02:00
Nicolai Hähnle a6f597536d radeonsi/nir: emit FS outputs
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 14:55:37 +02:00
Nicolai Hähnle c41a8e2ad9 radeonsi/nir: load FS inputs
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 14:55:36 +02:00
Nicolai Hähnle 8643d41622 radeonsi/nir: load VS inputs
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 14:55:36 +02:00
Nicolai Hähnle d007919d99 ac/nir,radeonsi: add ac_shader_abi::load_ubo
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 14:55:36 +02:00
Nicolai Hähnle 0c3b6a4bd9 ac,radeonsi: add ac_shader_abi::emit_outputs for hardware VS shaders
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 14:55:34 +02:00
Nicolai Hähnle 1ea972e08a radeonsi: pass si_shader_context to get_primitive_id
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 14:55:33 +02:00
Nicolai Hähnle 9df23db13d radeonsi: translate NIR to LLVM
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 14:55:33 +02:00
Nicolai Hähnle d77526ee30 radeonsi: dump NIR instead of TGSI when appropriate
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 14:55:33 +02:00
Nicolai Hähnle c5f70a5174 radeonsi: bypass the shader cache for NIR shaders
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 14:55:33 +02:00
Nicolai Hähnle 29d7bdd179 radeonsi: scan NIR shaders to obtain required info
v2: set num_instruction to 2, i.e. 1 + END (Marek)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 14:55:32 +02:00
Nicolai Hähnle 90b3ba8970 radeonsi: add si_shader_selector::nir
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 14:55:32 +02:00
Nicolai Hähnle acd09389cb radeonsi: implement pipe_screen::get_compiler_options for NIR
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 14:55:31 +02:00
Nicolai Hähnle da62a31c5b radeonsi: add nir include paths
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 14:55:31 +02:00
Nicolai Hähnle 61ad2f13c3 ac,radeonsi: move some VS input descriptions to ac_shader_abi
v2: use LLVM values instead of function parameter indices

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 14:55:31 +02:00
Nicolai Hähnle c7e9ebb3ab radeonsi: store shader function arguments in a structure
Aligns the code a bit more with ac/nir, and simplifies the setup of
ac_shader_abi.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 14:55:31 +02:00
Nicolai Hähnle 00476907fc gallium/targets: link against NIR when building radeonsi
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 14:55:30 +02:00
Nicolai Hähnle 01f1598a40 gallium: add PIPE_CAP_NIR_SAMPLERS_AS_DEREF
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 14:55:29 +02:00
Marek Olšák 5d8359ff4d radeonsi: expose MRT-draw-calls to HUD
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-31 12:46:43 +02:00
Marek Olšák f4d095cc65 radeonsi: update dirty_level_mask only when flushing or unbinding framebuffer
This fixes corruption with bindless textures in Dawn Of War 3.

The do_update_surf_dirtiness mechanism was complicated and dirty_level_mask
was only updated after the first draw call. The problem is bindless textures
are checked for decompression every draw call and we would only decompress
after the first draw call. The solution is to set dirtiness after the last
draw call to the framebuffer, so the (unconditional) decompression of
bindless textures happens at the right time.

Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-07-28 16:34:24 +02:00
Marek Olšák 28c7fbbe0f radeonsi: rely on CLEAR_STATE for clearing UCP and blend color registers
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-28 08:03:24 +02:00
Marek Olšák 7c721b28f6 radeonsi: rely on CLEAR_STATE for resetting the framebuffer and sample mask
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-28 08:03:24 +02:00
Marek Olšák 064550238e radeonsi: use CLEAR_STATE to initialize some registers
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-28 08:03:24 +02:00
Nicolai Hähnle 06e20c4b8c radeonsi: bail out instead of crashing if the main shader part failed to compile
Reviewed: Marek Olšák <marek.olsak@amd.com>
2017-07-27 21:16:45 +02:00
Nicolai Hähnle 4dd86631f4 radeonsi: update a comment for merged shaders
Reviewed: Marek Olšák <marek.olsak@amd.com>
2017-07-27 21:16:45 +02:00
Nicolai Hähnle 4738dd9546 radeonsi/gfx9: dump previous stage LLVM IR for merged shaders
Reviewed: Marek Olšák <marek.olsak@amd.com>
2017-07-27 21:16:45 +02:00
Nicolai Hähnle 760876a7b1 radeonsi: make sure TCS main output VGPRs don't alias inputs
Avoids an unnecessary move introduce by "radeonsi/gfx9: always wrap GS
and TCS in an if-block (v2)"

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-27 21:16:42 +02:00
Nicolai Hähnle 081ac6e5c6 radeonsi/gfx9: always wrap GS and TCS in an if-block (v2)
With merged ESGS shaders, the GS part of a wave may be empty, and the
hardware gets confused if any GS messages are sent from that wave. Since
S_SENDMSG is executed even when EXEC = 0, we have to wrap even
non-monolithic GS shaders in an if-block, so that the entire shader and
hence the S_SENDMSG instructions are skipped in empty waves.

This change is not required for TCS/HS, but applying it there as well
simplifies the logic a bit.

Fixes GL45-CTS.geometry_shader.rendering.rendering.*

v2: ensure that the TCS epilog doesn't run for non-existing patches

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-27 21:16:32 +02:00
Nicolai Hähnle 873789002f radeonsi/gfx9: fix vertex idx in ES with multiple waves per threadgroup
Cc: mesa-stable@lists.freedesktop.org
Reviewed: Marek Olšák <marek.olsak@amd.com>
2017-07-27 21:16:32 +02:00
Marek Olšák ed2b3f5c81 radeonsi: decrease the number of compiler threads
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-26 19:53:26 +02:00
Marek Olšák 433f6f7ac9 gallium/radeon: make S_FIXED function signed and move it to shared code
This fixes a bug uncovered by:
    2412c4c81e
    util: Make CLAMP turn NaN into MIN.

Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-26 19:53:26 +02:00
Nicolai Hähnle 65fbaab0b7 radeonsi: fix detection of DRAW_INDIRECT_MULTI on SI
The firmware version numbers for SI were wrong. The new numbers are probably
too conservative (we don't have a definitive answer by the firmware team),
but DRAW_INDIRECT_MULTI has been confirmed to work with these versions on
Tahiti (by Gustaw) and on Verde (by myself).

While this is technically adding a feature, it's a feature we thought we had
for a long time. The change is small enough and we're early enough in the 17.2
release cycle that it should still go in.

Reported-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-26 11:48:32 +02:00
Marek Olšák ecec21add2 radeonsi: add back the USE_MININUM_PRIORITY flag to the low-prio compiler queue
Accidentally removed in 9f320e0a38.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-18 13:13:34 -04:00
Marek Olšák c62809171c radeonsi/gfx9: add VM fault dmesg parser support
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 10:57:34 -04:00
Marek Olšák 9f320e0a38 radeonsi: automatically resize shader compiler thread queues when they are full
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 10:57:29 -04:00
Marek Olšák 465bb47d6f radeonsi: expose ARB_timer_query unconditionally
clock_crystal_freq is always non-zero now.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 10:57:17 -04:00
Marek Olšák 5fb80a1e84 radeonsi: prevent a crash with DBG_CHECK_VM and u_threaded_context
by setting PIPE_CONTEXT_DEBUG in the caller

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 10:56:51 -04:00
Marek Olšák ffa7ec9e22 radeonsi: simplify computation of tessellation offchip buffers
This is overly cautious, but better safe than sorry.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 10:55:07 -04:00
Marek Olšák facfab28fe radeonsi/gfx9: add workarounds to avoid VGPR indexing completely
For inputs and outputs, indirect indexing is lowered by the GLSL compiler.
For temporaries, use alloca and disable the "promote-alloca" pass.

In the future, we could switch all codepaths to alloca permanently and
just rely on the "promote-alloca" pass.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 10:50:39 -04:00
Marek Olšák 93391ac478 radeonsi: emit param exports after position exports
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 10:50:39 -04:00
Marek Olšák 9d9ffc8475 radeonsi: move building parameter exports into a separate function
Both loops now look simple.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 10:50:39 -04:00
Marek Olšák 4e30fb4ecc radeonsi: don't use info.num_inputs when it's unused
For clarity. It's only used by color interpolation.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 10:50:39 -04:00
Marek Olšák f8d6dd9b3d radeonsi: add si_build_fs_interp helper
This is much simpler.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 10:50:39 -04:00
Marek Olšák 4560f2b90a radeonsi: merge si_llvm_get_amdgpu_target into ac_get_llvm_target
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 10:50:39 -04:00
Marek Olšák ece0c0439f radeonsi: don't call gallivm_init_llvm_targets
It's for initializing the native (x86) target.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 10:50:39 -04:00
Nicolai Hähnle c22e3c5373 radeonsi/gfx9: fix crash building monolithic merged ES-GS shader
Forwarding from the ES prolog to the ES just barely exceeds the current
maximum array size when 16 vertex attributes are used. Give it a decent
bump to account for merged shaders having up to 32 user SGPRs.

Fixes a crash in GL45-CTS.multi_bind.draw_bind_vertex_buffers.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-13 13:01:15 +02:00
Marek Olšák aaee0d1bbf gallium: use "ull" number suffix to keep the QtCreator parser happy
It can't parse "llu".

Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-07-10 22:44:48 +02:00
Samuel Pitoiset a584a12308 radeonsi: fix invalidating bindless buffer descriptors
The VA is stored at [4:5], not [0:1]. This invalidated all
texture buffer descriptors when they were made resident in
the current context.

This removes few partial flushes and cache invalidations which
are needed when updating a bindless descriptor on the fly with
a WRITE_DATA packet.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-07 09:09:39 +02:00
Marek Olšák ccfac28835 radeonsi: set COMPUTE_DISPATCH_INITIATOR.ORDER_MODE = 1
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-29 16:19:35 +02:00
Marek Olšák af52e61935 radeonsi: use the DISPATCH packets to force COMPUTE_START_X/Y/Z = 0
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-29 16:19:35 +02:00
Rob Herring a3d98ca62f Android: use symlinks for driver loading
Instead of having special driver loading logic for Android, create
symlinks to gallium_dri.so so we can use the standard loading logic.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Rob Herring <robh@kernel.org>
2017-06-29 09:09:49 -05:00
Marek Olšák 4a10d6154e radeonsi: move instance divisors into a constant buffer
Shader key size: 107 -> 47

Divisors of 0 and 1 are encoded in the shader key. Greater instance divisors
are loaded from a constant buffer.

The shader code doing the division is huge. Is it something we need to
worry about? Does any app use instance divisors >= 2?

VS prolog disassembly:
    s_load_dwordx4 s[12:15], s[0:1], 0x80  ; C00A0300 00000080
    s_nop 0                                ; BF800000
    s_waitcnt lgkmcnt(0)                   ; BF8C007F
    s_buffer_load_dword s14, s[12:15], 0x4 ; C0220386 00000004
    s_waitcnt lgkmcnt(0)                   ; BF8C007F
    v_cvt_f32_u32_e32 v4, s14              ; 7E080C0E
    v_rcp_iflag_f32_e32 v4, v4             ; 7E084704
    v_mul_f32_e32 v4, 0x4f800000, v4       ; 0A0808FF 4F800000
    v_cvt_u32_f32_e32 v4, v4               ; 7E080F04
    v_mul_hi_u32 v5, v4, s14               ; D2860005 00001D04
    v_mul_lo_i32 v6, v4, s14               ; D2850006 00001D04
    v_cmp_eq_u32_e64 s[12:13], 0, v5       ; D0CA000C 00020A80
    v_sub_i32_e32 v5, vcc, 0, v6           ; 340A0C80
    v_cndmask_b32_e64 v5, v6, v5, s[12:13] ; D1000005 00320B06
    v_mul_hi_u32 v5, v5, v4                ; D2860005 00020905
    v_add_i32_e32 v6, vcc, v5, v4          ; 320C0905
    v_subrev_i32_e32 v4, vcc, v5, v4       ; 36080905
    v_cndmask_b32_e64 v4, v4, v6, s[12:13] ; D1000004 00320D04
    v_mul_hi_u32 v5, v4, v1                ; D2860005 00020304
    v_add_i32_e32 v4, vcc, s8, v0          ; 32080008
    v_mul_lo_i32 v6, v5, s14               ; D2850006 00001D05
    v_add_i32_e32 v7, vcc, 1, v5           ; 320E0A81
    v_cmp_ge_u32_e64 s[12:13], v1, v6      ; D0CE000C 00020D01
    v_sub_i32_e32 v6, vcc, v1, v6          ; 340C0D01
    v_cmp_le_u32_e32 vcc, s14, v6          ; 7D960C0E
    v_cndmask_b32_e64 v8, 0, -1, s[12:13]  ; D1000008 00318280
    v_cndmask_b32_e64 v6, 0, -1, vcc       ; D1000006 01A98280
    v_and_b32_e32 v6, v8, v6               ; 260C0D08
    v_cmp_eq_u32_e32 vcc, 0, v6            ; 7D940C80
    v_cndmask_b32_e32 v6, v7, v5, vcc      ; 000C0B07
    v_add_i32_e32 v5, vcc, -1, v5          ; 320A0AC1
    v_cmp_eq_u32_e32 vcc, 0, v8            ; 7D941080
    v_cndmask_b32_e32 v5, v6, v5, vcc      ; 000A0B06
    v_add_i32_e32 v5, vcc, s9, v5          ; 320A0A09

v2: set prefer_mono for fetched instance divisors

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-27 19:55:09 +02:00
Marek Olšák aef998fe4b radeonsi: check nr_cbufs in other places before flushing CB
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-27 18:46:12 +02:00
Marek Olšák f9a7e7fe14 radeonsi: use #pragma pack to pack si_shader_key
sizeof(struct si_shader_key):
  Before reverting the 2 commits: 120 bytes
  After reverting the 2 commits: 128 bytes
  With #pragma pack: 107 bytes

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-27 18:45:07 +02:00
Marek Olšák 77d2a98353 Revert "radeonsi: use uint32_t to declare si_shader_key.opt.kill_outputs"
This reverts commit 7b2240ac9c.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-27 18:45:07 +02:00
Marek Olšák dbe45e1180 Revert "radeonsi: remove 8 bytes from si_shader_key with uint32_t ff_tcs_inputs_to_copy"
This reverts commit 6b6fed3a3c.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-27 18:45:07 +02:00
Marek Olšák ccf963ed29 radeonsi: don't flush and wait for CB after depth-only rendering
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-26 23:35:19 +02:00
Nicolai Hähnle f17d78becc radeonsi: support indirect indexing in INTERP_* opcodes
The hardware doesn't support it, so we just interpolate all array elements
and then use indirect indexing on the resulting vector.

Clearly, this is not very efficient. There is an argument to be had for
adding if/else, or perhaps even pulling the data out of LDS directly.
Both don't really seem worth the effort, considering that it seems nobody
actually uses this feature.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-26 14:02:06 +02:00
Marek Olšák e25950808f radeonsi/gfx9: don't overallocate shader binaries
It's not needed. The hw doesn't fetch ahead over page boundaries.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-24 23:04:37 +02:00
Marek Olšák f6e98e99e3 radeonsi: unreference vertex buffers when destroying the context
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-06-23 19:53:54 +02:00
Marek Olšák ee16796d54 radeonsi: implement the workaround for Rocket League - postponed TGSI kill
Do KILL at the end of shaders so as not to break WQM.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100070

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-23 19:50:20 +02:00
Marek Olšák a98a04ec80 gallium/radeon: pass create_screen flags to r600_common_screen_init
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-23 19:50:20 +02:00
Marek Olšák c2f82fc1d3 Revert "radeonsi: don't emit partial flushes at the end of IBs (v2)"
This reverts commit c9040dc9e7.

People have reported it causes corruption on VI, and I see GPU hangs
on GFX9.
2017-06-23 19:13:55 +02:00
Marek Olšák db37c0be13 radeonsi/gfx9: don't ever flush the TC metadata cache
The closed Vulkan driver doesn't do it either.

Also remove some old comments that aren't useful.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 13:15:27 +02:00
Marek Olšák 920f20f039 radeonsi/gfx9: use TC L2 for fast color clear with CP DMA
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 13:15:27 +02:00
Marek Olšák c9040dc9e7 radeonsi: don't emit partial flushes at the end of IBs (v2)
The kernel sort of does the same thing with fences.

v2: do emit partial flushes on SI

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 13:15:27 +02:00
Nicolai Hähnle da2e52b382 radeonsi: use the correct LLVMTargetMachineRef in si_build_shader_variant
si_build_shader_variant can actually be called directly from one of
normal-priority compiler threads. In that case, the thread_index is
only valid for the normal tm array.

v2:
- use the correct sel/shader->compiler_ctx_state

Fixes: 86cc809726 ("radeonsi: use a compiler queue with a low priority for optimized shaders")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-22 09:45:23 +02:00
Marek Olšák 79bd1d4f8b radeonsi/gfx9: keep reusing the same buffer/address for the gfx9 flush fence
instead of using a monotonic suballocator

v2: initialize the memory at context creation

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák c66fc618cc radeonsi/gfx9: enable the constant engine
I think this kernel commit fixes it:
     drm/amdgpu:use FRAME_CNTL for new GFX ucode

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák d7141d8bc0 radeonsi/gfx9: indirect buffers and all CP packets use TC L2
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák 2638250fec radeonsi: flush CB after MSAA only when transitioning from CB to textures
The main flush before texturing is done after the FMASK decompress pass.

CB after MSAA rendering is not flushed in set_framebuffer_state and also
not in memory_barrier if the current color buffer is MSAA. We fully rely
on the FMASK decompress pass for the flushing.

Some CB decompress and resolve passes need an explicit flush before and
after.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák 51c219739c radeonsi: unify CB_RESOLVE blitter invocation code
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák 2263610827 radeonsi: flush DB caches only when transitioning from DB to texturing
Use the mechanism of si_decompress_textures, but instead of doing
the actual decompression, just flag the DB cache flush there.

This removes a lot of unnecessary DB cache flushes.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Marek Olšák fdca690e91 radeonsi: add separate HUD counters for CB and DB cache flushes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 01:51:02 +02:00
Samuel Pitoiset ea2492b62f radeonsi: set correct usage flag according to image access type
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-20 13:01:18 +02:00
Samuel Pitoiset afeaa2e98a radeonsi: update all resident texture descriptors when needed
To avoid useless DCC fetches when DCC is disabled, descriptors
have to be updated in order to reflect this change. This is
quite similar to how we update descriptors of bound textures.

As a side effect, this should also prevent VM faults when
bindless textures are invalidated, because the VA in the
descriptor has to be updated accordingly as well.

I don't see any performance improvements with DOW3.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-20 10:14:55 +02:00
Samuel Pitoiset f00e80e3f7 radeonsi: keep track of the sampler state for texture handles
Needed for updating all resident texture descriptors when
dirty_tex_counter changes.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-20 10:14:52 +02:00
Marek Olšák 3fc99f1299 radeonsi: fix dumping shader descriptors into ddebug logs
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-19 20:16:20 +02:00
Marek Olšák f9dc29a9a5 radeonsi: add a workaround for inexact SNORM8 blitting again
GFX9 is affected.

We only have tests for GL_x_SNORM where x is R8, RG8, RGB8, and RGBA8.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-19 20:15:36 +02:00
Marek Olšák 0f827b51c0 radeonsi/gfx9: fix TC-compatible stencil compression
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-19 20:15:36 +02:00
Marek Olšák 8a264dd829 radeonsi/gfx9: fix TXF_LZ with 1D textures
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-19 20:15:36 +02:00
Marek Olšák 353b60cab5 radeonsi/gfx9: disable sparse buffers
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-19 20:15:36 +02:00
Samuel Pitoiset 6ff6863c32 radeonsi: reduce overhead for resident textures which need color decompression
This is done by introducing a separate list.

si_decompress_textures() is now 5x faster.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-18 14:10:38 +02:00
Samuel Pitoiset 06ed251c32 radeonsi: reduce overhead for resident textures which need depth decompression
This is done by introducing a separate list.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-18 14:10:36 +02:00
Samuel Pitoiset 705a6a560e radeonsi: use util_dynarray_foreach for bindless resources
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-18 14:10:34 +02:00
Samuel Pitoiset 8d9e76ce1f gallium/radeon: add a new HUD query for the number of resident handles
Useful for debugging performance issues when ARB_bindless_texture
is enabled. This query doesn't make a distinction between texture
and image handles.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-18 14:08:08 +02:00
Emil Velikov 1f958c1337 radeonsi: include ac_binary.h for struct ac_shader_binary
The header embeds the struct so it needs the header inclusion instead of
the dummy forward declaration.

Cc: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: Marek Olšák <marek.olsak@amd.com>
Cc: Tom Stellard <tstellar@redhat.com>
Fixes: 32206c5e56 ("radeonsi: Add radeon_shader_binary member to struct
si_shader")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-06-17 11:38:02 +01:00
Samuel Pitoiset 65d1e4d1eb radeonsi: enable ARB_bindless_texture
This has only been tested on RX480.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset 285ec4463b radeonsi: add support for loading bindless images
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset 950b5ffa31 radeonsi: add support for loading bindless samplers
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset 0c2834c5b2 radeonsi: invalidate buffers which are made resident if needed
When a buffer becomes resident, check if it has been invalidated,
if so update the descriptor and the dirty flag.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset 811756dfd0 radeonsi: upload new descriptors when resident buffers are invalidated
When texture buffers are invalidated the addr in the resident
descriptor has to be updated but we can't create a new descriptor
because the resident handle has to be the same.

Instead, use the WRITE_DATA packet which allows to update memory
directly but graphics/compute have to be idle in case the GPU is
reading the descriptor.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset 48fe8a6210 radeonsi: only decompress resident textures/images when used
When the current bound shaders don't use any bindless textures
or images, it's useless to decompress the resident resources.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset 2c3a7d5840 radeonsi: track use of bindless samplers/images from tgsi_shader_info
This adds some new helper functions to know if the current draw
call (or dispatch compute) is using bindless samplers/images,
based on TGSI analysis.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset e1813a8635 radeonsi: decompress resident textures/images before graphics/compute
Similar to the existing decompression code path except that it
loops over the list of resident textures/images.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset d7e1a66bb5 radeonsi: decompress DCC for resident textures/images
Analogous to bound textures/images. We should also update the
resident descriptors and disable COMPRESSION_EN for avoiding
useless DCC fetches, but I postpone this optimization for a
separate series.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset a45e198e2d radeonsi: only add descriptors in presence of resident handles
This won't help much except for applications that use a ton
of resident handles. Though, this will reduce the winsys
overhead a little bit.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset 333c8f65cf radeonsi: add all resident buffers to the current CS
Resident buffers have to be added to every new command stream.
Though, this could be slightly improved when current shaders
don't use any bindless textures/images but usually applications
tend to use bindless for almost every draw call, and the winsys
thread might help when buffers are added early.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset 9cc328eef6 radeonsi: implement ARB_bindless_texture
This implements the Gallium interface. Decompression of resident
textures/images will follow in the next patches.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset 77bbdcdfcd radeonsi: add a slab allocator for bindless descriptors
For each texture/image handles, we need to allocate a new
buffer for the bindless descriptor. But when the number of
buffers added to the current CS becomes high, the overhead
in the winsys (and in the kernel) is important.

To reduce this bottleneck, the idea is to suballocate the
bindless descriptors using a slab similar to the one used
in the winsys.

Currently, a buffer can hold 1024 bindless descriptors but
this limit is arbitrary and could be changed in the future
for some reasons. Once a slab is allocated the "base" buffer
is added to a per-context list.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset 86d7b7f01a radeonsi: add si_set_shader_image_desc() helper
To share some common code between bound and bindless images.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset 410b4ec06d radeonsi: add si_set_sampler_view_desc() helper
To share some common code between bound and bindless textures.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset 2ce04d7c1a radeonsi: add si_init_descriptor_list() helper
This will be used in order to initialize resident descriptors
for bindless textures/images.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset 973822bcee gallium: add PIPE_CAP_BINDLESS_TEXTURE
Whether bindless texture operations are supported by the
underlying driver.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-14 10:04:36 +02:00
Marek Olšák 4951b0adbd radeonsi: pack si_context better
there isn't much to gain here

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák 6d43d352cc radeonsi: pack si_framebuffer better
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák ca815f1ead radeonsi: pack si_sampler_view better
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák 29bf2530d8 radeonsi: pack si_buffer_resources better
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák cf5ce61148 radeonsi: pack struct si_descriptors better
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák 217114dd73 radeonsi: pack struct si_vertex_elements better
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák e80a056ff9 radeonsi: replace si_vertex_elements::elements with separate fields
It makes si_vertex_elements a little smaller.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák c8b6f42e25 radeonsi: rename si_vertex_element -> si_vertex_elements
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák 7be6186e0c radeonsi: allocate si_state_rasterizer::pm4_poly_offset only when needed
Each element has over 700 bytes.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák a828f5d783 radeonsi: pack si_state_rasterizer fields
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák 6b6fed3a3c radeonsi: remove 8 bytes from si_shader_key with uint32_t ff_tcs_inputs_to_copy
The previous patch helps with this.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák 7b2240ac9c radeonsi: use uint32_t to declare si_shader_key.opt.kill_outputs
the next patch will benefit from this

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák 1621b33d73 radeonsi: remove 8 bytes from si_shader_key by flattening opt.hw_vs
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák 30882ba0dd radeonsi: don't emit DB_STENCIL_CONTROL if it has no effect
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák 6743dc01fd radeonsi: fix missing num_L2_invalidates increment
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Marek Olšák c503381864 radeonsi: get rid of more compressed_colortex_mask names
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-12 18:24:37 +02:00
Juan A. Suarez Romero a625d58ee1 radeonsi: call LLVMAddEarlyCSEMemSSAPass only for LLVM >= 4.0
LLVMAddEarlyCSEMemSSAPass() is defined in LLVM 4.0.

Fixes: 257b538 ("radeonsi: do EarlyCSEMemSSA LLVM pass)

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-06-08 23:32:32 +02:00
Marek Olšák 6940361796 gallium/radeon: don't allocate HTILE in a separate buffer
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-08 23:29:07 +02:00
Marek Olšák c6451b1209 radeonsi: rename depth decompress functions
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-08 23:29:07 +02:00
Marek Olšák d8a577d96e radeonsi: rename shader resource decompress masks to their true meaning
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-08 23:29:07 +02:00
Marek Olšák da26de5ff7 radeonsi: rename is_compressed_colortex -> color_needs_decompression
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-08 23:29:07 +02:00
Marek Olšák 391673af7a radeonsi: disable the patch ID workaround on SI when the patch ID isn't used (v2)
The workaround causes a massive performance decrease on 1-SE parts.
(Cape Verde, Hainan, Oland)

The performance regression is already part of 17.0 and 17.1.

v2: check tess_uses_prim_id

Cc: 17.0 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-08 23:29:07 +02:00
Marek Olšák 4b8d0c2b1d radeonsi: don't update dependent states if it has no effect (v2)
This and the previous clip_regs commit decrease IB sizes and the number of
si_update_shaders invocations as follows:

                 IB size   si_update_shaders calls
Borderlands 2      -10%            -27%
Deus Ex: MD         -5%            -11%
Talos Principle     -8%            -30%

v2: always dirty cb_render_state in set_framebuffer_state

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-08 23:29:07 +02:00
Marek Olšák bacaceb78a radeonsi: update clip_regs on shader state changes only when it's needed
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 20:17:20 +02:00
Marek Olšák 2b7fd9df9a radeonsi: precompute some fields for PA_CL_VS_OUT_CNTL in si_shader_selector
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 20:17:18 +02:00
Marek Olšák 140b3c5019 radeonsi: add a new helper si_get_vs
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 20:17:16 +02:00
Samuel Pitoiset 878bd981bf radeonsi: isolate real framebuffer changes from the decompression passes (v3)
When a stencil buffer is part of the framebuffer state, it is
decompressed but because it's bindless, all draw calls set
stencil_dirty_level_mask to 1.

v2: Marek - set the flags outside the loop
          - also clear and set framebuffer.do_update_surf_dirtiness there
          - do it in the DB->CB copy path too
v3: Marek - save and restore the do_update_surf_dirtiness flag

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 20:17:14 +02:00
Marek Olšák 257b538fd2 radeonsi: do EarlyCSEMemSSA LLVM pass
so that LLVM IR looks like CSE has been run on it. It's also recommended
by the instruction combining pass.

This also fixes:
- GL45-CTS.arrays_of_arrays_gl.InteractionFunctionCalls2 (crash)
- piglit/spec/arb_shader_ballot/execution/fs-readFirstInvocation-uint-loop (fail)

The code size decrease is positive, the register usage isn't. There is
a decrease in VGPR spilling for Tomb Raider, but increase in DiRT Showdown
and GRID Autosport.

EarlyCSEMemSSA has a -0.01% change in code size compared EarlyCSE.

SGPRS: 1935420 -> 1938076 (0.14 %)
VGPRS: 1645504 -> 1645988 (0.03 %)
Spilled SGPRs: 2493 -> 2651 (6.34 %)
Spilled VGPRs: 107 -> 115 (7.48 %)
Private memory VGPRs: 1332 -> 1332 (0.00 %)
Scratch size: 1512 -> 1516 (0.26 %) dwords per thread
Code Size: 61981592 -> 61890012 (-0.15 %) bytes
Max Waves: 371847 -> 371798 (-0.01 %)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 20:17:09 +02:00
Marek Olšák e9409c86e7 radeonsi: remove 8 bytes from si_shader_key
We can use a union in si_shader_key::mono.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 20:17:06 +02:00
Marek Olšák 2b8b9a56ef radeonsi: move PSIZE and CLIPDIST unique IO indices after GENERIC
Heaven LDS usage for LS+HS is below. The masks are "outputs_written"
for LS and HS. Note that 32K is the maximum size.

Before:
  heaven_x64: ls=1f1 tcs=1f1, lds=32K
  heaven_x64: ls=31 tcs=31, lds=24K
  heaven_x64: ls=71 tcs=71, lds=28K

After:
  heaven_x64: ls=3f tcs=3f, lds=24K
  heaven_x64: ls=7 tcs=7, lds=13K
  heaven_x64: ls=f tcs=f, lds=17K

All other apps have a similar decrease in LDS usage, because
the "outputs_written" masks are similar. Also, most apps don't write
POSITION in these shader stages, so there is room for improvement.
(tight per-component input/output packing might help even more)

It's unknown whether this improves performance.

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 20:14:15 +02:00
Marek Olšák 7d67cbefe0 radeonsi: clean up decompress blend state names
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 19:38:45 +02:00
Marek Olšák d2ee423b69 radeonsi: enable TC-compatible stencil compression on VI
Most things are in place. Ideally we won't see decompress blits for stencil
anymore.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 19:38:39 +02:00
Marek Olšák 3effce4fb0 radeonsi/gfx9: prevent a race when the previous shader's main part is missing
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 18:43:42 +02:00
Marek Olšák b5bc826ead radeonsi/gfx9: wait for main part compilation of 1st shaders of merged shaders
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 18:43:42 +02:00
Marek Olšák ffbaba6072 radeonsi/gfx9: fix LS scratch buffer support without TCS for GFX9
LS is merged into TCS. If there is no TCS, LS is merged into fixed-func
TCS. The problem is the fixed-func TCS was ignored by scratch update
functions, so LS didn't have the scratch buffer set up.

Note that Mesa 17.1 doesn't have merged shaders.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 18:43:42 +02:00
Marek Olšák 6e2c07749b radeonsi: move streamout state update out of si_update_shaders
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 18:43:42 +02:00
Marek Olšák 294be5279d radeonsi: remove dead code in declare_input_fs
Colors are interpolated in the PS prolog. This was never used.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 18:43:42 +02:00
Marek Olšák 8147c4a4a5 radeonsi: move handling of DBG_NO_OPT_VARIANT into si_shader_selector_key
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 18:43:42 +02:00
Marek Olšák 86cc809726 radeonsi: use a compiler queue with a low priority for optimized shaders
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 18:43:42 +02:00
Marek Olšák 89b6c93ae3 util/u_queue: add an option to set the minimum thread priority
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 18:43:42 +02:00
Marek Olšák 6f2947fa79 radeonsi: decrease the number of compiler threads to num CPUs - 1
Reserve one core for other things (like draw calls).

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 18:43:42 +02:00
Marek Olšák 38bd468a78 radeonsi: drop unfinished shader compilations when destroying shaders
If we enqueue too many jobs and destroy the GL context, it may take
several seconds before the jobs finish. Just drop them instead.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 18:43:42 +02:00
Marek Olšák a893c91697 gallium/u_blitter: use 2D_ARRAY for cubemap blits if possible
so that we can use TXF.

The cubemap blit pixel shader code size: 148 -> 92 bytes

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 18:10:50 +02:00
Marek Olšák 6c655cfeb4 radeonsi: fix a GPU hang with tessellation on 2-CU configs
Only harvested Stoney has 2 CUs. Tested on 2-CU Stoney and Fiji forced
to 2 CUs.

Cc: 17.0 17.1 <mesa-stable@lists.freedesktop.org>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-06-06 13:01:52 +02:00
Lyude 467af445a3 gallium: Add a cap to check if the driver supports ARB_post_depth_coverage
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-06-02 23:19:22 -04:00
Samuel Pitoiset 30a4e375f5 radeonsi: remove unused si_pm4_state::compute_pkt
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-31 09:20:57 +02:00
Samuel Pitoiset e4b05a50df radeonsi: remove chip_class define from si_pm4.h
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-31 09:20:55 +02:00
Samuel Pitoiset d90a6c2f23 radeonsi: merge si_pm4_free_state_simple() into si_pm4_free_state()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-31 09:20:53 +02:00
Marek Olšák 48b91103ce radeonsi: use ac_build_buffer_load for shader buffer loads
and document why we can't use SMEM yet.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-29 01:52:16 +02:00
Marek Olšák e019ea8f4b radeonsi: move building llvm.SI.load.const into ac_build_buffer_load
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-29 01:52:16 +02:00
Marek Olšák e1942c970f radeonsi: rename readonly_memory -> can_speculate
This is more accurate.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-29 01:52:16 +02:00
Marek Olšák 24306c0b27 radeonsi: fix a crash in si_destroy_context if we fail early
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-29 01:52:16 +02:00
Samuel Pitoiset ab8fb5a082 radeonsi: drop useless memcmp() check in si_set_blend_color()
cso_set_blend_color() already checks if the old state is different.
Only Nine uses pipe::set_blend_color() directly but I guess it
should use the cache too.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-05-27 18:00:45 +02:00
Leo Liu f94cfdc5f2 radeonsi: enable vcn decode
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-05-25 11:40:20 -04:00
Leo Liu c23ffafc50 radeon: rename has_uvd info to has_hw_decode
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-05-25 11:40:20 -04:00
Marek Olšák 2beb31bd7c radeonsi/gfx9: compile shaders with +xnack
so that LLVM doesn't allocate SGPRs where XNACK is.

Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-22 19:23:39 +02:00
Marek Olšák 807e1d2577 radeonsi/gfx9: use CE RAM optimally
On GFX9 with only 4K CE RAM, define the range of slots that will be
allocated in CE RAM. All other slots will be uploaded directly. This will
switch dynamically according to which slots are used by current shaders.

GFX9 CE usage should now be similar to VI instead of being often disabled.

Tested on VI by taking the GFX9 CE allocation codepath and setting
num_ce_slots = 2 everywhere to get frequent switches between both modes.
CE is still disabled on GFX9.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-18 22:15:02 +02:00
Marek Olšák 1cde473ec0 radeonsi: remove CE offset alignment restriction
This was only needed by LOAD_CONST_RAM, which is now only used to load
whole CE.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-18 22:15:02 +02:00
Marek Olšák a7f098fb76 radeonsi: only upload (dump to L2) those descriptors that are used by shaders
This decreases the size of CE RAM dumps to L2, or the size of descriptor
uploads without CE.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-18 22:15:02 +02:00
Marek Olšák 53c2ef36da radeonsi: record which descriptor slots are used by shaders
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-18 22:15:02 +02:00
Marek Olšák 38828094e9 radeonsi: update si_ce_needed_cs_space
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-18 22:15:02 +02:00
Marek Olšák edb59ef2dc radeonsi: do only 1 big CE dump at end of IBs and one reload in the preamble
A later commit will only upload descriptors used by shaders, so we won't do
full dumps anymore, so the only way to have a complete mirror of CE RAM
in memory is to do a separate dump after the last draw call.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-18 22:15:02 +02:00
Marek Olšák 06690e63f7 radeonsi: remove early return in si_upload_descriptors
All updates of descriptors_dirty also set dirty_mask, so the return is
unnecessary. The next commit will want this function to be executed
even if dirty_mask == 0.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-18 22:15:02 +02:00
Marek Olšák b8f8d9e46c radeonsi: clamp indirect index to the number of declared shader resources
We'll do partial uploads of descriptor arrays, so we need to clamp
against what shaders declare.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-18 22:15:02 +02:00
Marek Olšák f07c15ef80 radeonsi: merge sampler and image descriptor lists into one
Sampler slots: slot[8], .. slot[39] (ascending)
Image slots: slot[7], .. slot[0] (descending)

Each image occupies 1/2 of each slot, so there are 16 images in total,
therefore the layout is: slot[15], .. slot[0]. (in 1/2 slot increments)

Updating image slot 2n+i (i <= 1) also dirties and re-uploads slot 2n+!i.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-18 22:15:02 +02:00
Marek Olšák 5df24c3fa6 radeonsi: merge constant and shader buffers descriptor lists into one
Constant buffers: slot[16], .. slot[31] (ascending)
Shader buffers: slot[15], .. slot[0] (descending)

The idea is that if we have 4 constant buffers and 2 shader buffers, we only
have to upload 6 slots. That optimization is left for a later commit.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-18 22:15:02 +02:00
Samuel Pitoiset 1468e29e02 radeonsi: get the sampler view type from inst->Texture for TG4
This will also magically fix this special lowering for
bindless samplers.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-18 21:48:16 +02:00
Marek Olšák 50189379fa gallium: add PIPE_CAP_ALLOW_MAPPED_BUFFERS_DURING_EXECUTION
for skipping mapped-buffer checking in every GL draw call

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-17 20:28:44 +02:00