Commit Graph

2960 Commits

Author SHA1 Message Date
Nicolai Hähnle 125a915052 radeonsi: implement PIPE_FLUSH_{TOP,BOTTOM}_OF_PIPE
v2: use uncached system memory for the fence, and use the CPU to
    clear it so we never read garbage when checking the fence

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 14:00:55 +01:00
Nicolai Hähnle e4627ac8fb radeonsi: document some subtle details of fence_finish & fence_server_sync
v2: remove the change to si_fence_server_sync, we'll handle that more
    robustly

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 14:00:50 +01:00
Nicolai Hähnle 609a230375 gallium/u_threaded: implement asynchronous flushes
This requires out-of-band creation of fences, and will be signaled to
the pipe_context::flush implementation by a special TC_FLUSH_ASYNC flag.

v2:
- remove an incorrect assertion
- handle fence_server_sync for unsubmitted fences by
  relying on the improved cs_add_fence_dependency
- only implement asynchronous flushes on amdgpu

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 14:00:42 +01:00
Nicolai Hähnle 78a4750d91 radeonsi: move fence functions to si_fence.c
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 14:00:42 +01:00
Nicolai Hähnle f76a6cb337 radeonsi: always use async compiles when creating shader/compute states
With Gallium threaded contexts, creating shader/compute states is
effectively a screen operation, so we should not use context state.

In particular, this allows us to avoid using the context's LLVM
TargetMachine.

This isn't an issue yet because u_threaded_context filters out non-async
debug callbacks, and we disable threaded contexts for debug contexts.
However, we may want to change that in the future.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 11:53:20 +01:00
Nicolai Hähnle b650fc09c3 radeonsi: fix potential use-after-free of debug callbacks
Found by inspection.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 11:53:20 +01:00
Nicolai Hähnle dd7c273e87 radeonsi: move pipe debug callback to si_context
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 11:53:19 +01:00
Nicolai Hähnle 0f54ee6072 radeonsi: reduce the scope of sel->mutex in si_shader_select_with_key
We only need the lock to guard changes in the variant linked list. The
actual compilation can happen outside the lock, since we use the ready
fence as a guard.

v2: fix double-unlock

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 11:37:51 +01:00
Nicolai Hähnle 4f493c79ee radeonsi: use ready fences on all shaders, not just optimized ones
There's a race condition between si_shader_select_with_key and
si_bind_XX_shader:

  Thread 1                         Thread 2
  --------                         --------
  si_shader_select_with_key
    begin compiling the first
    variant
    (guarded by sel->mutex)
                                   si_bind_XX_shader
                                     select first_variant by default
                                     as state->current
                                   si_shader_select_with_key
                                     match state->current and early-out

Since thread 2 never takes sel->mutex, it may go on rendering without a
PM4 for that shader, for example.

The solution taken by this patch is to broaden the scope of
shader->optimized_ready to a fence shader->ready that applies to
all shaders. This does not hurt the fast path (if anything it makes
it faster, because we don't explicitly check is_optimized).

It will also allow reducing the scope of sel->mutex locks, but this is
deferred to a later commit for better bisectability.

Fixes dEQP-EGL.functional.sharing.gles2.multithread.simple.buffers.bufferdata_render

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 11:37:51 +01:00
Marek Olšák 33000e7c43 radeonsi: add si_screen::has_ls_vgpr_init_bug
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-07 17:58:40 +01:00
Marek Olšák cde664ab81 radeonsi: use ac_create_target_machine
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-07 17:58:38 +01:00
Marek Olšák 81f81fdb54 radeonsi: use ac_get_llvm_processor_name
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-07 17:58:36 +01:00
Marek Olšák c29f5fe41c radeonsi/gfx9: don't set gs_table_depth
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-07 17:58:33 +01:00
Marek Olšák e616743dab radeonsi/gfx9: limit the scissor bug workaround to Vega10 and Raven only
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-07 17:26:36 +01:00
Marek Olšák 3f58988b81 radeonsi: enable signed vertex buffer offsets 2017-11-06 19:09:12 +01:00
Marek Olšák 24d6318d24 gallium: add PIPE_CAP_SIGNED_VERTEX_BUFFER_OFFSET 2017-11-06 19:09:12 +01:00
Timothy Arceri 439a2febc4 ac/radeonsi: add support for tex instr without a derefence
These are produced by nir_lower_bitmap(), adding the missing derefence
would cause other issues that need to be hacked around such as
skipping sampler lowering and uniform location assignment, so this
change seems the correct way to go.

Fixes 194 piglit crashes on radeonsi using NIR.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-03 14:19:51 +11:00
Marek Olšák 529cdce799 radeonsi: remove 'Authors:' comments
It's inaccurate. Instead, see the copyright and use "git log" and
"git blame" to know the authorship.

Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-02 18:19:03 +01:00
Dave Airlie d3fdd66401 gallium: add cap for driver specified max combined shader resources.
Some hw (evergreen) has a limit on how many combined (images/buffers/mrts)
a fragment shader can access.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-11-01 10:07:03 +10:00
Timothy Arceri e80bbd6f52 radeonsi: fix culldist_writemask in nir path
The shared si_create_shader_selector() code already offsets the mask.

Fixes the following piglit tests:

arb_cull_distance/clip-cull-3.shader_test
arb_cull_distance/clip-cull-4.shader_test

Fixes: 29d7bdd179 (radeonsi: scan NIR shaders to obtain required info)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-01 09:41:11 +11:00
Samuel Pitoiset dd79aa4ad3 radeonsi: update hack for HTILE corruption in ARK: Survival Evolved
It appears that flushing the DB metadata is actually not sufficient
since the driver uses the new VS blit shaders. This looks quite
strange though, but it seems like we need to flush DB for fixing
the corruption.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102955
Fixes: 69ccb9dae7 (radeonsi: use new VS blit shaders (VS inputs in SGPRs)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-10-27 10:47:30 +02:00
Marek Olšák 3f8e3c2bd8 radeonsi: add a workaround for weird s_buffer_load_dword behavior on SI
See my LLVM patch which fixes the root cause.

Users have to apply this patch and then they have 2 choices:
- Downgrade to LLVM 5.0
- Update to LLVM git after my LLVM patch is pushed.

It won't be possible to use current and earlier development version
of LLVM 6.0.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: 17.3 <mesa-stable@lists.freedesktop.org>
2017-10-26 16:44:01 +02:00
Dave Airlie 82d47b9d38 ac/llvm: consolidate find lsb function.
This was the same between si and ac.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-10-26 15:59:31 +10:00
Dave Airlie f925f5b074 ac/nir: move lds declaration/load/store into shared code.
This was duplicated between both drivers, share here.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-10-26 15:59:11 +10:00
Marek Olšák 2a414c3961 radeonsi: postponed KILL isn't postponed anymore, but maintains WQM
This restores performance for the drirc workaround, i.e.
KILL_IF does:
   visible = src0 >= 0;
   kill_flag &= visible; // accumulate kills
   amdgcn_kill(wqm_vote(visible)); // kill fully dead quads only

And all helper pixels are killed at the end of the shader:
   amdgcn_kill(kill_flag);

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-24 14:56:34 +02:00
Marek Olšák da0083f123 radeonsi: use postponed KILL only when derivatives are used
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-24 14:56:34 +02:00
Marek Olšák 1ff9e27cbd ac: replace ac_build_kill with ac_build_kill_if_false
This will be a new LLVM intrinsic and will also work nicely with
llvm.amdgcn.wqm.vote.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-24 14:56:34 +02:00
Bas Nieuwenhuizen a548b727a1 ac/nir: Only clamp shadow reference on radeonsi.
Vulkan CTS does not expect the value to be clamped (at least for D32),
and it makes a differences even though depth is in [0,1], due
to strict inequalities.

I couldn't find anything in the Vulkan spec about this, but the test
seemed to be copied from GL tests and the GL spec only specifies
clamping for fixed point formats. Hence I expect radeonsi to run into
this at some point as well, but given that they still have a usecase
with the Z16->Z32 promotion, I'll leave that for someone else to clean
up.

This at least fixes radv dEQP-VK.texture.shadow.* on VI.

Fixes: 0f9e32519b 'ac/nir: clamp shadow texture comparison value on VI'
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-10-23 09:13:38 +02:00
Andres Rodriguez 557de3b9ae radeonsi: hardcode shader WAVE_LIMIT to the maximum value
This is part of a cooperative scheduling approach used by radv. All
drivers in the stack must opt-in to resource arbitration, otherwise GL
based apps will be able to ignore system priorities.

We always hardcode the field to its maximum value, instead of attempting
to calculate an approximate usage. In testing, there were no benefits to
using anything other than the maximum.

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-10-21 01:01:44 +02:00
Jason Ekstrand 59fb59ad54 nir: Get rid of nir_shader::stage
It's redundant with nir_shader::info::stage.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-10-20 12:49:17 -07:00
Marek Olšák 2f4705afde radeonsi: if there's just const buffer 0, set it in place of CONST/SSBO pointer
SI_SGPR_CONST_AND_SHADER_BUFFERS now contains the pointer to const buffer 0
if there is no other buffer there.

Benefits:
- there is no constbuf descriptor upload and shader load

It's assumed that all constant addresses are within bounds. Non-constant
addresses are clamped against the last declared CONST variable.
This only works if the state tracker ensures the bound constant buffer
matches what the shader needs.

Once we get 32-bit pointers, we can only do this for user constant buffers
where the driver is in charge of the upload so that it can guarantee a 32-bit
address.

The real performance benefit might not be measurable.

These apps get 100% theoretical benefit in all shaders (except where noted):
- antichamber
- barman arkham origins
- borderlands 2
- borderlands pre-sequel
- brutal legend
- civilization BE
- CS:GO
- deadcore
- dota 2 -- most shaders
- europa universalis
- grid autosport -- most shaders
- left 4 dead 2
- legend of grimrock
- life is strange
- payday 2
- portal
- rocket league
- serious sam 3 bfe
- talos principle
- team fortress 2
- thea
- unigine heaven
- unigine valley -- also sanctuary and tropics
- wasteland 2
- xcom: enemy unknown & enemy within
- tesseract
- unity (engine)

Changed stats only:
    SGPRS: 2059998 -> 2086238 (1.27 %)
    VGPRS: 1626888 -> 1626904 (0.00 %)
    Spilled SGPRs: 7902 -> 7865 (-0.47 %)
    Code Size: 60924520 -> 60982660 (0.10 %) bytes
    Max Waves: 374539 -> 374526 (-0.00 %)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-17 22:03:03 +02:00
Marek Olšák 854593b8eb ac: clean up ac_build_indexed_load function interfaces
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-17 22:03:03 +02:00
Marek Olšák cdb21dfffa radeonsi: handle 64-bit loads earlier in fetch_constant
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-17 22:03:03 +02:00
Marek Olšák ee0e1a47ce radeonsi: add si_descriptors::gpu_address and remove buffer_offset
This allows us to change the pointer arbitrarily.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-17 22:03:03 +02:00
Marek Olšák 6d2664880c radeonsi: unify code for extracting a buffer address from a descriptor
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-17 22:03:03 +02:00
Marek Olšák 8d2685d129 radeonsi: remove atom parameter from si_upload_descriptors
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-17 22:03:03 +02:00
Marek Olšák 4ddce1b1a4 radeonsi: pack si_descriptors better again
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-17 22:03:03 +02:00
Marek Olšák 859eeffb3d radeonsi: emit dirty consecutive pointers in one SET_SH_REG packet
IB size: -1.6%

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-17 22:03:03 +02:00
Marek Olšák 36626ffe46 radeonsi: split si_emit_shader_pointer
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-17 22:03:03 +02:00
Marek Olšák 69325fa88d radeonsi: generalize the SI_VS_SHADER_POINTER_MASK macro
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-17 22:03:03 +02:00
Marek Olšák 79c2e7388c radeonsi/gfx9: use SPI_SHADER_USER_DATA_COMMON
IB size: -0.4%

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-17 22:03:03 +02:00
Marek Olšák b762a08896 radeonsi/gfx9: move RW_BUFFERS from s[0:1] to s[8:9] for HS and GS
Let's use the same user data SGPRs in all stages.
(for SPI_SHADER_USER_DATA_COMMON_0)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-17 22:03:03 +02:00
Dylan Baker b154b44ae3 meson: build radeonsi gallium driver
This hooks up the bits necessary to build gallium dri drivers, with
radeonSI as the first example driver. This isn't tested yet.

v4: - drop radeonsi generated header from sources.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric at anholt.net>
2017-10-16 16:32:43 -07:00
Dylan Baker 66f97f6640 meson: build radeonsi
This builds the radeonsi (and radeon) window system bits and gallium
driver bits.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric at anholt.net>
2017-10-16 16:32:43 -07:00
Marek Olšák f536f45250 radeonsi: implement sync_file import/export
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-12 21:07:48 +02:00
Nicolai Hähnle bc2d874101 radeonsi: add support for PIPE_FORMAT_{X1,A1}R5G5B5_UNORM
Fixes dEQP-EGL.functional.image.modify.tex_rgb5_a1_tex_subimage_rgba8

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-10-12 08:42:55 +02:00
Dave Airlie 80bbdb1483 radeonsi: lower ffma in nir to mad.
This lowers ffma to a * b + c.

This seems like it should keep Marek happiest, so
we'd never get to the fma instruction emission code.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-10-11 07:33:32 +10:00
Eric Anholt ac0051a507 gallium: Create a new PIPE_CAP_TILE_RASTER_ORDER for vc4.
Because vc4 can control the order that tiles are rasterized in, we can use
it to implement overlapping blits using normal drawing and
GL_ARB_texture_barrier, as long as we can tell the kernel what order to
render the tiles in.

This commit introduces the core gallium support, vc4 changes will follow.

v2: Fix on the simulator.
v3: Add the cap (disabled) to other drivers, add rst docs for the cap.
v4: Rebase on PIPE_CAP_TGSI_ANY_REG_AS_ADDRESS
v5: Drop vc4 changes from this commit, for clarity.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v3)
2017-10-10 10:45:22 -07:00
Marek Olšák 76997e9133 radeonsi: shrink r600d_common.h and stop using it
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-09 16:27:05 +02:00
Marek Olšák 0ecf9b90ef radeonsi: import cayman_msaa.c from drivers/radeon
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-09 16:27:04 +02:00