Commit Graph

287 Commits

Author SHA1 Message Date
Marek Olšák 24847dd1b5 gallium/u_queue: isolate util_queue_fence implementation
it's cleaner this way.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-22 20:26:39 +01:00
Marek Olšák 63c462226e radeonsi: fix issues with monolithic shaders
R600_DEBUG=mono has had no effect since:

    commit 1fabb29717
    Author: Marek Olšák <marek.olsak@amd.com>
    Date:   Tue Feb 14 22:08:32 2017 +0100

    radeonsi: have separate LS and ES main shader parts in the shader selector

Also, this assertion was failing:
    si_state_shaders.c:1307: si_shader_select_with_key: Assertion
    `!shader->is_optimized' failed.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-21 21:27:23 +01:00
Marek Olšák 84e72f2962 radeonsi: skip TESSINNER/OUTER offchip stores if TES doesn't read them
We were unconditionally storing these outputs, sometimes even one component
at a time, but apps never read them in TES.

Move the TESSINNER/OUTER buffer stores into the TCS epilog where we can
easily disable them on demand.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-21 21:27:23 +01:00
Nicolai Hähnle 066a117be7 radeonsi: fix UINT/SINT clamping for 10-bit formats on <= CIK
The same PS epilog workaround as for 8-bit integer formats is required,
since the CB doesn't do clamping.

Fixes GL45-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels*.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-02-21 10:45:13 +01:00
Marek Olšák 45240ce598 radeonsi: use R600_RESOURCE_FLAG_UNMAPPABLE where it's desirable
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-18 01:22:08 +01:00
Marek Olšák 1fabb29717 radeonsi: have separate LS and ES main shader parts in the shader selector
This might reduce the on-demand compilation if the initial VS/LS/ES
determination is wrong.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-18 01:22:08 +01:00
Marek Olšák a02117ba6e radeonsi: don't compile pure monolithic shaders asynchronously
there is no point, we have to wait anyway.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-18 01:22:08 +01:00
Marek Olšák 41a2157a68 radeonsi: make fix_fetch an array of uint8_t
so that we can add 3-component fallbacks.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-18 01:22:08 +01:00
Marek Olšák 408f9a1584 radeonsi: atomize the scratch buffer state
The update frequency is very low.

Difference: Only account for the size when allocating a new one and when
            starting a new IB, and check for NULL. (v3)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 17:29:36 +01:00
Marek Olšák 5f99c49008 radeonsi: precompute IA_MULTI_VGT_PARAM values into a table
The perf difference is very small: 0.99% -> 0.40% for the time spent
in si_get_ia_multi_vgt_param when si_draw_vbo is 20%. Pretty much nothing.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák c78177fc64 radeonsi: move VGT_VERTEX_REUSE_BLOCK_CNTL into shader states for Polaris
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák 802fcdc0d2 radeonsi: atomize L2 prefetches
to move the big conditional statement out of draw_vbo

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák c99ba3eb47 radeonsi: unbind disabled shader stages to prevent useless L2 prefetches
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák 35cd7551a4 radeonsi: use the correct target machine when building shader variants
If the shader selector is created with a different context than
the shader variant, we should use the calling context's target machine
for the shader variant.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99419

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-18 19:51:31 +01:00
Marek Olšák 3ae3be6dd4 radeonsi: move shader pipe context state into a separate structure
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-18 19:51:31 +01:00
Nicolai Hähnle 5e94e5bb9b radeonsi: fix R600_DEBUG=nooptvariant
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Vedran Miletić <vedran@miletic.net>
2017-01-16 20:16:18 +01:00
Marek Olšák 44e9b67229 radeonsi: make fix_fetch 64-bit
v2: add u_bit_consecutive64

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-16 18:07:08 +01:00
Marek Olšák 6f356d15be radeonsi: cleanly communicate whether si_shader_dump should check R600_DEBUG
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-09 12:01:30 +01:00
Marek Olšák 4b93ba542c radeonsi: assume that a TES without POSITION precedes GS
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-06 21:05:48 +01:00
Marek Olšák 3753dc896d radeonsi: update clip_regs if clip_disable changes to fix a hang
This seems to fix the GPU hangs caused by:

commit ed3190b3f3
Author: Marek Olšák <marek.olsak@amd.com>
Date:   Sun Nov 13 18:41:43 2016 +0100

    radeonsi: don't export ClipVertex and ClipDistance[] if clipping is disabled

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99219

Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-01-05 14:01:18 +01:00
Nicolai Hähnle ec0a0a60cc radeonsi: shrink the GSVS ring to account for the reduced item sizes
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:05:17 +01:00
Nicolai Hähnle 6fdef7d265 radeonsi: shrink each vertex stream to the actually required size
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:05:13 +01:00
Nicolai Hähnle 2f2e941e2d radeonsi: use a single descriptor for the GSVS ring
We can hardcode all of the fields for swizzling in the geometry shader.

The advantage is that we use fewer descriptor slots and we no longer have to
update any of the (ring) descriptors when the geometry shader changes.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:05:05 +01:00
Nicolai Hähnle 7b5b3d63c5 radeonsi: update all GSVS ring descriptors for new buffer allocations
Fixes GL45-CTS.gtf40.GL3Tests.transform_feedback3.transform_feedback3_geometry_instanced.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:04:06 +01:00
Marek Olšák e5302ad936 radeonsi: add a debug flag that disables optimized shader variants
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-23 18:49:10 +01:00
Marek Olšák 86514d84e0 util: import CRC32 implementation from gallium
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-11-22 18:05:51 +01:00
Marek Olšák bf75ef3f92 radeonsi: remove all varyings for depth-only rendering or rasterization off
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák ef6c84b301 radeonsi: eliminate VS outputs that aren't used by PS at runtime
A past commit added the ability to compile "optimized" shader variants
asynchronously (not stalling the app).

This commit builds upon that and adds what is basically a runtime shader
linker. If a VS output isn't used by the currently-bound PS, a new VS
compilation is started without that output. The new shader variant
is used when it's ready.

All apps using separate shader objects I've seen had unused VS outputs.

Eliminating unused/useless VS outputs also eliminates the corresponding
vertex attribute loads.

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák 7e76f9a7a8 radeonsi: record information about all written and read varyings
It's just tgsi_shader_info with DEFAULT_VAL varyings removed.

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák c7f3e5c647 radeonsi: make si_shader_io_get_unique_index stricter
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák ed3190b3f3 radeonsi: don't export ClipVertex and ClipDistance[] if clipping is disabled
This is the first user of optimized monolithic shader variants.

Cull distances can't be disabled by states.

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák d984a324bf radeonsi: add infrastr. for compiling optimized shader variants asynchronously
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák d2a56985d7 radeonsi: don't set vs.epilog.export_prim_id if TES is bound
there is no VS epilog in this case

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák fee71fec25 radeonsi: simplify checking for monolithic compilation
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák 6d5c2a8b5c radeonsi: split the shader key into 3 logical parts
key->part.*: prolog and epilog flags only
key->as_{ls,es}: special flags
key->mono.*: flags for monolithic compilation only

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák e59389d738 radeonsi: assume that a VS without POSITION is LS
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Nicolai Hähnle 2c875158e2 radeonsi: fix vertex fetches for 2_10_10_10 formats
The hardware always treats the alpha channel as unsigned, so add a shader
workaround. This is rare enough that we'll just build a monolithic vertex
shader.

The SINT case cannot actually happen in OpenGL, but I've included it for
completeness since it's just a mix of the other cases.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-04 21:30:18 +01:00
Nicolai Hähnle 908f92ad1f radeonsi: generate GS prolog to (partially) fix triangle strip adjacency rotation
Fixes GL45-CTS.geometry_shader.adjacency.adjacency_indiced_triangle_strip and
others.

This leaves the case of triangle strips with adjacency and primitive restarts
open. It seems that the only thing that cares about that is a piglit test.
Fixing this efficiently would be really involved, and I don't want to use the
hammer of degrading to software handling of indices because there may well
be software that uses this draw mode (without caring about the precise
rotation of triangles).

v2:
- skip the GS prolog entirely if workaround is not needed
- only check for TES (TES is always non-null when tessellation is used)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:11:24 +01:00
Nicolai Hähnle 3b2516721b radeonsi: make the GS copy shader owned by the GS selector
The copy shader only depends on the selector. This change avoids creating
separate code paths for monolithic vs. non-monolithic geometry shaders.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:50 +01:00
Nicolai Hähnle 9c6f7d66dc radeonsi: si_shader_vs only depends on the GS selector
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:48 +01:00
Nicolai Hähnle 693435d846 radeonsi: si_vgt_gs_mode only depends on the selector
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:45 +01:00
Marek Olšák d268b7f95e radeonsi: add a driver query for shader cache hits
This is an 8-month old patch.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-01 22:33:13 +01:00
Marek Olšák ad45dce4a2 radeonsi: remove si_resource_create_custom
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-26 13:02:58 +02:00
Marek Olšák 29144d0f34 gallium/radeon: stop using PIPE_BIND_CUSTOM
it has no effect whatsoever

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-26 13:02:58 +02:00
Marek Olšák 3ec9975555 radeonsi: eliminate trivial constant VS outputs
These constant value VS PARAM exports:
- 0,0,0,0
- 0,0,0,1
- 1,1,1,0
- 1,1,1,1
can be loaded into PS inputs using the DEFAULT_VAL field, and the VS exports
can be removed from the IR to save export & parameter memory.

After LLVM optimizations, analyze the IR to see which exports are equal to
the ones listed above (or undef) and remove them if they are.

Targeted use cases:
- All DX9 eON ports always clear 10 VS outputs to 0.0 even if most of them
  are unused by PS (such as Witcher 2 below).
- VS output arrays with unused elements that the GLSL compiler can't
  eliminate (such as Batman below).

The shader-db deltas are quite interesting:
(not from upstream si-report.py, it won't be upstreamed)

PERCENTAGE DELTAS    Shaders PARAM exports (affected only)
batman_arkham_origins    589  -67.17 %
bioshock-infinite       1769   -0.47 %
dirt-showdown            548   -2.68 %
dota2                   1747   -3.36 %
f1-2015                  776   -4.94 %
left_4_dead_2           1762   -0.07 %
metro_2033_redux        2670   -0.43 %
portal                   474   -0.22 %
talos_principle          324   -3.63 %
warsow                   176   -2.20 %
witcher2                1040  -73.78 %
----------------------------------------
All affected             991  -65.37 %  ... 9681 -> 3353
----------------------------------------
Total                  26725  -10.82 %  ... 58490 -> 52162

v2: treat Undef as both 0 and 1

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> (v1)
2016-10-19 22:21:46 +02:00
Marek Olšák a2ea653a49 radeonsi: remove cb0_is_integer handling
st/mesa does this for us.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-19 19:26:30 +02:00
Marek Olšák 7dddf0b7ab radeonsi: adjust and clean up Z_ORDER and EXEC_ON_x settings
The table was copied from the Vulkan driver. The comment lines are as long
as the table for cosmetic reasons.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-13 19:00:51 +02:00
Marek Olšák e12c1cab5d radeonsi: disable ReZ
This is a serious performance fix. Discovered by luck.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94354

Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-13 19:00:51 +02:00
Marek Olšák e4bbab9022 radeonsi: fix R600_DEBUG=precompile for shader-db
radeonsi no longer supports pixel shaders without interpolation optimizations,
which led to assertion failures in si_shader_ps when running shader-db.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-12 18:29:40 +02:00
Marek Olšák 300a8221e9 radeonsi: add assertions to validate interpolation flags
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-05 21:03:23 +02:00
Marek Olšák 8c6ea5a6ff radeonsi: remove unnecessary #includes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:07 +02:00
Marek Olšák 53d2c8f00f radeonsi: don't re-create shader PM4 states after scratch buffer update
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:05 +02:00
Marek Olšák 275c073c6a radeonsi: export SampleMask from pixel shaders at full rate
Heaven and Valley write gl_SampleMask and not Z.
Use 16_ABGR instead of 32_ABGR if Z isn't written.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-13 20:38:25 +02:00
Marek Olšák 6c8b76263d radeonsi: also do VS_PARTIAL_FLUSH before updating VGT ring pointers
ported from Vulkan

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-05 18:01:15 +02:00
Marek Olšák c3f716fe67 gallium/radeon: merge USER_SHADER and INTERNAL_SHADER priority flags
there's no reason to separate these

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-08-26 15:50:10 +02:00
Marek Olšák e722b90bc9 radeonsi: eliminate PS OUT[1] if dual src blending is off and CB1 is not bound
All VP DX9 ports benefit from this.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-17 12:24:35 +02:00
Marek Olšák c15a9dec29 radeonsi: skip unnecessary si_update_shaders calls
Small decrease in draw call overhead.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-03 17:46:46 +02:00
Marek Olšák 1e5f00f9d5 radeonsi: pre-generate shader logs for ddebug
This cuts down the overhead of si_dump_shader when ddebug is capturing
shader logs, which is done for every draw call unconditionally (that's
quite a lot of work for a draw call).

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Nicolai Hähnle 3d69357da9 radeonsi: ensure sample locations are set for line and polygon smoothing
Since commit d938b8c, the sample locations are no longer set unconditionally,
so we need to set the atom to dirty on all chips, not just Polaris.

Cc: 12.0 <mesa-stable@lists.freedesktop.org>
2016-07-23 15:36:39 +02:00
Rob Clark 44bbfedbd9 gallium/u_queue: add optional cleanup callback
Adds a second optional cleanup callback, called after the fence is
signaled.  This is needed if, for example, the queue has the last
reference to the object that embeds the util_queue_fence.  In this
case we cannot drop the ref in the main callback, since that would
result in the fence being destroyed before it is signaled.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-16 10:00:04 -04:00
Nicolai Hähnle 04d93ea619 radeonsi: disable multi-threading when shader dumps are enabled
Otherwise, shader dumps can become interleaved and unusable.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-08 10:59:36 +02:00
Nicolai Hähnle 7ffc832ab8 radeonsi: use multi-threaded compilation in debug contexts
We only have to stay single-threaded when debug output must be synchronous.
This yields better parallelism in shader-db runs for me.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-08 10:59:32 +02:00
Nicolai Hähnle d938b8c0bf radeonsi: explicitly choose center locations for 1xAA on Polaris
Unlike SC, the small primitive filter does not automatically use center
locations in 1xAA mode, so this is needed to avoid artifacts caused by
the small primitive filter discarding triangles that it shouldn't.

As a side effect of how the effective number of samples is now calculated,
this patch also avoids submitting the sample locations for line/poly smoothing
when they're not really needed.

Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-08 10:52:50 +02:00
Marek Olšák 5c92c21369 radeonsi: do compilation from si_create_shader_selector asynchronously
Main shader parts and geometry shaders are compiled asynchronously
by util_queue. si_create_shader_selector doesn't wait and returns.
si_draw_vbo(si_shader_select) waits for completion.

This has the best effect when shaders are compiled at app-loading time.
It doesn't help much for shaders compiled on demand, even though
VS+PS compilation should take as much as time as the bigger one of the two.

If an app creates more shaders, at most 4 threads will be used to compile
them.

Debug output disables this for shader stats to be printed in the correct
order.

(We could go even further and build variants asynchronously too, then emit
draw calls without waiting and emit incomplete shader states, then force IB
chaining to give the compiler more time, then sync the compilation at the IB
flush and patch the IB with correct shader states. This is great for
compilation before draw calls, but there are some difficulties such as
scratch and tess states requiring the compiler output, and an on-disk shader
cache will likely be a much better and simpler solution.)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:13 +02:00
Marek Olšák 84824935cf radeonsi: don't lock shader cache mutex during compilation
to allow multiple shaders to be compiled simultaneously.

ALso, shader-db can again use all 4 cores.

v2: Remove the pipe_mutex_unlock call in the error path.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
2016-07-05 00:47:13 +02:00
Marek Olšák 850cd953b1 radeonsi: separate the compilation chunk of si_create_shader_selector
The function interface is ready to be used by util_queue.
Also, si_shader_select_with_key can no longer accept si_context.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:13 +02:00
Marek Olšák 027ad71b57 radeonsi: print LLVM IRs to ddebug logs
Getting LLVM IRs of hanging shaders have never been easier.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:13 +02:00
Marek Olšák 4d1f32376d radeonsi: don't interpolate colors if flatshading is enabled
use v_interp_mov for those

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:12 +02:00
Marek Olšák 4accb02d7a radeonsi: enable the barycentric optimization in all cases
Handle the bc_optimize SGPR bit if both CENTER and CENTROID are enabled.
This should increase the PS launch rate for big primitives with MSAA.
Based on discussion with SPI guys.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:12 +02:00
Marek Olšák 476e9cee1d radeonsi: compute only one set of interpolation (i,j) when MSAA is disabled
This should increase the PS launch rate for shaders using at least 2 pairs
of perspective (i,j) and same for linear.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:12 +02:00
Marek Olšák a675c6a000 radeonsi: split ps.prolog.force_persample_interp into persp and linear bits
This reduces the number of v_mov's in the prolog.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:12 +02:00
Marek Olšák eff81cbc81 radeonsi: enable distributed tess on multi-SE parts only
ported from Vulkan

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-29 16:34:22 +02:00
Marek Olšák dd56d04568 radeonsi: set optimal VGT_HS_OFFCHIP_PARAM
ported from Vulkan

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-29 16:34:22 +02:00
Marek Olšák d5383a7d31 gallium/radeon: use r600_resource_reference
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Vedran Miletić <vedran@miletic.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-25 23:13:42 +02:00
Nicolai Hähnle 1167905c41 radeonsi: use trapezoid distribution for tess on Fiji and Polaris
This yields a small performance improvement in Unigine Heaven.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-20 18:29:55 +02:00
Bas Nieuwenhuizen e9d3246a7a radeonsi: Don't offset OFFCHIP_BUFFERING on pre-VI cards.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96239
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-30 09:59:50 +02:00
Marek Olšák 43550f25ed radeonsi: always reserve output space for tess factors
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Dave Airlie <airlied@redhat.com>
2016-05-27 21:40:43 +02:00
Bas Nieuwenhuizen 43d7305a40 radeonsi: Allow TES distribution between shader engines.
The R_028B50_VGT_TESS_DISTRIBUTION value is copied from
amdgpu-pro. Smaller values in the ACCUM fields seem to
decrease the performance advantage from this patch, higher
values don't seem to matter.

v2: Add distribution mode field enums.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen fee3160af9 radeonsi: Enable dynamic HS.
This allows running the TES on different CU's than the
TCS which results in performance improvements.

v2: Only write the control word from one invocation.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen 6217716e8f radeonsi: Store inputs to memory when not using a TCS.
We need to copy the VS outputs to memory. I decided to do this
using a shader key, as the value depends on other shaders.

I also switch the fixed function TCS over to monolithic, as
otherwisze many of the user SGPR's need to be passed to the
epilog, which increases register pressure, or complexity to
avoid that. The main body of the fixed function TCS is not
that interesting to precompile anyway, since we do it on
demand and it is very small.

v2: Use u_bit_scan64.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen 5c34562d7c radeonsi: Add offchip tessellation parameters.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen d27ff7d683 radeonsi: Add buffer for offchip storage between TCS and TES.
The buffer is quite large, but should only be allocated if the
application uses tessellation. Most non-games don't.

v2: - Use the correct register for SI.
    - Add define for block size.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-26 22:07:04 +02:00
Axel Davy fc3533c088 radeonsi: Change default behaviour for undefined COLOR0
d3d 9 needs COLOR0 to be 1.0 on all channels when
undefined. 0.0 for the others is fine.
GL behaviour is undefined.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-18 23:37:14 +02:00
Nicolai Hähnle d8f3e8e626 radeonsi: always allocate export memory for pixel shaders
Experiments with framebuffer-no-attachments type draw calls have shown that
NULL exports stall terribly unless we ensure that export memory is allocated
by the SPI.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-09 11:52:46 -05:00
Nicolai Hähnle b9e6e8e7d4 radeonsi: fix undefined behavior (memcpy arguments must be non-NULL)
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-07 16:46:59 -05:00
Marek Olšák c9e5a7df61 gallium: remove helpers converting to/from TGSI_PROCESSOR_*
Acked-by: Jose Fonseca <jfonseca@vmware.com>
2016-04-22 01:30:39 +02:00
Marek Olšák af249a7da9 gallium: use PIPE_SHADER_* everywhere, remove TGSI_PROCESSOR_*
Acked-by: Jose Fonseca <jfonseca@vmware.com>
2016-04-22 01:30:39 +02:00
Marek Olšák 8cfc4cf76d radeonsi: remove the shader parameter from si_set_ring_buffer
not used anymore

this is a follow-up to the RW buffer cleanup.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-04-22 01:14:14 +02:00
Marek Olšák 3138a28ff2 radeonsi: move default tess level constant buffer to RW buffers
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-22 01:14:14 +02:00
Marek Olšák 1378487fb4 radeonsi: rename and rearrange RW buffer slots
- use an enum
- use a unique slot number regardless of the shader stage
  (the per-stage slots will go away for RW buffers)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-22 01:14:13 +02:00
Bas Nieuwenhuizen 38f4cee3ff radeonsi: Add config parameter to si_shader_apply_scratch_relocs.
shader->config is not updated for compute kernels.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2016-04-21 19:36:19 +02:00
Marek Olšák ffe44d0283 radeonsi: fold num_user_sgprs where it is possible
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-04-14 17:00:14 +02:00
Marek Olšák 51c4034f9b radeonsi: fix SGPRS calculation once more
This fixes GS piglit failures after adding SI_PARAM_SHADER_BUFFERS,
which bumped NUM_USER_SGPRS and uncovered this bug on SI.

If this was fixed in LLVM, these workarounds wouldn't be needed.

LLVM would have to look at the calling convention to know how many SGPR
inputs are declared, and add VCC and the scratch wave offset (which is
enabled even if we spill SGPRs but not VGPRs, oh well).

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-04-14 17:00:14 +02:00
Marek Olšák 2ca5566ed7 radeonsi: move scissor and viewport states into gallium/radeon
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Grigori Goronzy <greg@chown.ath.cx>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-12 17:13:24 +02:00
Nicolai Hähnle 6f942ac5ee radeonsi: disable early Z if the fragment shader writes to memory
Empirically, both the EXEC_ON_* flags and LATE_Z are necessary.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-21 15:34:25 -05:00
Marek Olšák a73a657def radeonsi: process TGSI property NEXT_SHADER
This allows compiling the main shader part as ES or LS.

If we get the correct hint, non-separable GLSL shaders no longer have to be
compiled as VS first, followed by LS or ES compiled on demand.

The result is that fewer shaders are compiled by piglit, but it doesn't
improve piglit running time.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-19 23:20:01 +01:00
Nicolai Hähnle 4de25fa7b0 radeonsi: set DEPTH_BEFORE_SHADER based on FS_EARLY_DEPTH_STENCIL
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-14 17:24:59 -05:00
Marek Olšák d0f3b524cd radeonsi: use re-Z
This can increase perf for shaders that kill pixels (kill, alpha-test,
alpha-to-coverage).

v2: add comments

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-03-01 00:18:19 +01:00
Marek Olšák ff360a52e6 radeonsi: implement binary shaders & shader cache in memory (v2)
v2: handle _mesa_hash_table_insert failure
    other cosmetic changes

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:58 +01:00
Marek Olšák 1fe73d55e3 radeonsi: move some struct si_shader members to new struct si_shader_info
This will be part of shader binaries.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:58 +01:00