This will allow to hash additional data into the cache keys or even
change the hashing algorithm easily, should we decide to do so.
v2: don't try to compute key (and crash) if cache is disabled
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
threaded gallium can't use pipe_context's LLVM target machine, because
create_shader_selector can be called from a non-driver thread.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
pipe_mutex_unlock() was made unnecessary with fd33a6bcd7.
Replaced using:
find ./src -type f -exec sed -i -- \
's:pipe_mutex_unlock(\([^)]*\)):mtx_unlock(\&\1):g' {} \;
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
replace pipe_mutex_lock() was made unnecessary with fd33a6bcd7.
Replaced using:
find ./src -type f -exec sed -i -- \
's:pipe_mutex_lock(\([^)]*\)):mtx_lock(\&\1):g' {} \;
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
pipe_mutex_destroy() was made unnecessary with fd33a6bcd7.
Replace was done with:
find ./src -type f -exec sed -i -- \
's:pipe_mutex_destroy(\([^)]*\)):mtx_destroy(\&\1):g' {} \;
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
pipe_mutex_init() was made unnecessary with fd33a6bcd7.
Replace was done using:
find ./src -type f -exec sed -i -- \
's:pipe_mutex_init(\([^)]*\)):(void) mtx_init(\&\1, mtx_plain):g' {} \;
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
V2:
- when loading from disk cache also binary insert into memory cache.
- check that the binary loaded from disk is the correct size. If not
delete the cache item and skip loading from cache.
V3:
- remove unrequired variable
Reviewed-by: Grigori Goronzy <greg@chown.ath.cx>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
R600_DEBUG=mono has had no effect since:
commit 1fabb29717
Author: Marek Olšák <marek.olsak@amd.com>
Date: Tue Feb 14 22:08:32 2017 +0100
radeonsi: have separate LS and ES main shader parts in the shader selector
Also, this assertion was failing:
si_state_shaders.c:1307: si_shader_select_with_key: Assertion
`!shader->is_optimized' failed.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
We were unconditionally storing these outputs, sometimes even one component
at a time, but apps never read them in TES.
Move the TESSINNER/OUTER buffer stores into the TCS epilog where we can
easily disable them on demand.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The same PS epilog workaround as for 8-bit integer formats is required,
since the CB doesn't do clamping.
Fixes GL45-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels*.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The update frequency is very low.
Difference: Only account for the size when allocating a new one and when
starting a new IB, and check for NULL. (v3)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The perf difference is very small: 0.99% -> 0.40% for the time spent
in si_get_ia_multi_vgt_param when si_draw_vbo is 20%. Pretty much nothing.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
If the shader selector is created with a different context than
the shader variant, we should use the calling context's target machine
for the shader variant.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99419
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This seems to fix the GPU hangs caused by:
commit ed3190b3f3
Author: Marek Olšák <marek.olsak@amd.com>
Date: Sun Nov 13 18:41:43 2016 +0100
radeonsi: don't export ClipVertex and ClipDistance[] if clipping is disabled
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99219
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
We can hardcode all of the fields for swizzling in the geometry shader.
The advantage is that we use fewer descriptor slots and we no longer have to
update any of the (ring) descriptors when the geometry shader changes.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
A past commit added the ability to compile "optimized" shader variants
asynchronously (not stalling the app).
This commit builds upon that and adds what is basically a runtime shader
linker. If a VS output isn't used by the currently-bound PS, a new VS
compilation is started without that output. The new shader variant
is used when it's ready.
All apps using separate shader objects I've seen had unused VS outputs.
Eliminating unused/useless VS outputs also eliminates the corresponding
vertex attribute loads.
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This is the first user of optimized monolithic shader variants.
Cull distances can't be disabled by states.
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
key->part.*: prolog and epilog flags only
key->as_{ls,es}: special flags
key->mono.*: flags for monolithic compilation only
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The hardware always treats the alpha channel as unsigned, so add a shader
workaround. This is rare enough that we'll just build a monolithic vertex
shader.
The SINT case cannot actually happen in OpenGL, but I've included it for
completeness since it's just a mix of the other cases.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fixes GL45-CTS.geometry_shader.adjacency.adjacency_indiced_triangle_strip and
others.
This leaves the case of triangle strips with adjacency and primitive restarts
open. It seems that the only thing that cares about that is a piglit test.
Fixing this efficiently would be really involved, and I don't want to use the
hammer of degrading to software handling of indices because there may well
be software that uses this draw mode (without caring about the precise
rotation of triangles).
v2:
- skip the GS prolog entirely if workaround is not needed
- only check for TES (TES is always non-null when tessellation is used)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The copy shader only depends on the selector. This change avoids creating
separate code paths for monolithic vs. non-monolithic geometry shaders.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
These constant value VS PARAM exports:
- 0,0,0,0
- 0,0,0,1
- 1,1,1,0
- 1,1,1,1
can be loaded into PS inputs using the DEFAULT_VAL field, and the VS exports
can be removed from the IR to save export & parameter memory.
After LLVM optimizations, analyze the IR to see which exports are equal to
the ones listed above (or undef) and remove them if they are.
Targeted use cases:
- All DX9 eON ports always clear 10 VS outputs to 0.0 even if most of them
are unused by PS (such as Witcher 2 below).
- VS output arrays with unused elements that the GLSL compiler can't
eliminate (such as Batman below).
The shader-db deltas are quite interesting:
(not from upstream si-report.py, it won't be upstreamed)
PERCENTAGE DELTAS Shaders PARAM exports (affected only)
batman_arkham_origins 589 -67.17 %
bioshock-infinite 1769 -0.47 %
dirt-showdown 548 -2.68 %
dota2 1747 -3.36 %
f1-2015 776 -4.94 %
left_4_dead_2 1762 -0.07 %
metro_2033_redux 2670 -0.43 %
portal 474 -0.22 %
talos_principle 324 -3.63 %
warsow 176 -2.20 %
witcher2 1040 -73.78 %
----------------------------------------
All affected 991 -65.37 % ... 9681 -> 3353
----------------------------------------
Total 26725 -10.82 % ... 58490 -> 52162
v2: treat Undef as both 0 and 1
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> (v1)
The table was copied from the Vulkan driver. The comment lines are as long
as the table for cosmetic reasons.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This is a serious performance fix. Discovered by luck.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94354
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
radeonsi no longer supports pixel shaders without interpolation optimizations,
which led to assertion failures in si_shader_ps when running shader-db.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Heaven and Valley write gl_SampleMask and not Z.
Use 16_ABGR instead of 32_ABGR if Z isn't written.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This cuts down the overhead of si_dump_shader when ddebug is capturing
shader logs, which is done for every draw call unconditionally (that's
quite a lot of work for a draw call).
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Since commit d938b8c, the sample locations are no longer set unconditionally,
so we need to set the atom to dirty on all chips, not just Polaris.
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Adds a second optional cleanup callback, called after the fence is
signaled. This is needed if, for example, the queue has the last
reference to the object that embeds the util_queue_fence. In this
case we cannot drop the ref in the main callback, since that would
result in the fence being destroyed before it is signaled.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Otherwise, shader dumps can become interleaved and unusable.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We only have to stay single-threaded when debug output must be synchronous.
This yields better parallelism in shader-db runs for me.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Unlike SC, the small primitive filter does not automatically use center
locations in 1xAA mode, so this is needed to avoid artifacts caused by
the small primitive filter discarding triangles that it shouldn't.
As a side effect of how the effective number of samples is now calculated,
this patch also avoids submitting the sample locations for line/poly smoothing
when they're not really needed.
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Main shader parts and geometry shaders are compiled asynchronously
by util_queue. si_create_shader_selector doesn't wait and returns.
si_draw_vbo(si_shader_select) waits for completion.
This has the best effect when shaders are compiled at app-loading time.
It doesn't help much for shaders compiled on demand, even though
VS+PS compilation should take as much as time as the bigger one of the two.
If an app creates more shaders, at most 4 threads will be used to compile
them.
Debug output disables this for shader stats to be printed in the correct
order.
(We could go even further and build variants asynchronously too, then emit
draw calls without waiting and emit incomplete shader states, then force IB
chaining to give the compiler more time, then sync the compilation at the IB
flush and patch the IB with correct shader states. This is great for
compilation before draw calls, but there are some difficulties such as
scratch and tess states requiring the compiler output, and an on-disk shader
cache will likely be a much better and simpler solution.)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
to allow multiple shaders to be compiled simultaneously.
ALso, shader-db can again use all 4 cores.
v2: Remove the pipe_mutex_unlock call in the error path.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
The function interface is ready to be used by util_queue.
Also, si_shader_select_with_key can no longer accept si_context.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Getting LLVM IRs of hanging shaders have never been easier.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Handle the bc_optimize SGPR bit if both CENTER and CENTROID are enabled.
This should increase the PS launch rate for big primitives with MSAA.
Based on discussion with SPI guys.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This should increase the PS launch rate for shaders using at least 2 pairs
of perspective (i,j) and same for linear.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This yields a small performance improvement in Unigine Heaven.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The R_028B50_VGT_TESS_DISTRIBUTION value is copied from
amdgpu-pro. Smaller values in the ACCUM fields seem to
decrease the performance advantage from this patch, higher
values don't seem to matter.
v2: Add distribution mode field enums.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This allows running the TES on different CU's than the
TCS which results in performance improvements.
v2: Only write the control word from one invocation.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We need to copy the VS outputs to memory. I decided to do this
using a shader key, as the value depends on other shaders.
I also switch the fixed function TCS over to monolithic, as
otherwisze many of the user SGPR's need to be passed to the
epilog, which increases register pressure, or complexity to
avoid that. The main body of the fixed function TCS is not
that interesting to precompile anyway, since we do it on
demand and it is very small.
v2: Use u_bit_scan64.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The buffer is quite large, but should only be allocated if the
application uses tessellation. Most non-games don't.
v2: - Use the correct register for SI.
- Add define for block size.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
d3d 9 needs COLOR0 to be 1.0 on all channels when
undefined. 0.0 for the others is fine.
GL behaviour is undefined.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Experiments with framebuffer-no-attachments type draw calls have shown that
NULL exports stall terribly unless we ensure that export memory is allocated
by the SPI.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
- use an enum
- use a unique slot number regardless of the shader stage
(the per-stage slots will go away for RW buffers)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
shader->config is not updated for compute kernels.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
This fixes GS piglit failures after adding SI_PARAM_SHADER_BUFFERS,
which bumped NUM_USER_SGPRS and uncovered this bug on SI.
If this was fixed in LLVM, these workarounds wouldn't be needed.
LLVM would have to look at the calling convention to know how many SGPR
inputs are declared, and add VCC and the scratch wave offset (which is
enabled even if we spill SGPRs but not VGPRs, oh well).
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This allows compiling the main shader part as ES or LS.
If we get the correct hint, non-separable GLSL shaders no longer have to be
compiled as VS first, followed by LS or ES compiled on demand.
The result is that fewer shaders are compiled by piglit, but it doesn't
improve piglit running time.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This can increase perf for shaders that kill pixels (kill, alpha-test,
alpha-to-coverage).
v2: add comments
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Still disabled.
Only prologs & epilogs are compiled in draw calls, but each variant of those
is compiled only once per process.
VS is always compiled as hw VS.
TES is always compiled as hw VS.
LS and ES stages are always compiled on demand.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This fixes FP16 conversion instructions for VI, which has 16-bit floats,
but not SI & CI, which can't disable denorms for those instructions.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
It was partly a state and partly emulated by shader code, but since we want
to do this in a fragment shader prolog, we need to put it into the shader
key, which will be used to generate the prolog.
This also removes the spi_ps_input states and moves the registers
to the PS state.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
BCOLOR inputs were immediately after COLOR inputs. Thus, all following inputs
were offset by 1 if color_two_side was enabled, and not offset if it was not
enabled, which is a variation that's problematic if we want to have 1 variant
per shader and the variant doesn't care about color_two_side (that should be
handled by other bytecode attached at the beginning).
Instead, move BCOLOR inputs after all other inputs, so BCOLOR0 is at location
"num_inputs" if it's present. BCOLOR1 is next.
This also allows removing si_shader::nparam and
si_shader::ps_input_param_offset, which are useless now.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This fixes it.
States which also need to be taken into account:
- SPI color formats - each down-conversion format supports only a limited set
of SPI formats
- whether MSAA resolving and logic op are enabled
These need special handling:
- blending
- disabled channels
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The addition of spi_shader_col_format killed all color outputs
in precompiled shaders.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> (v1)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
v2: also set the alpha func (trivial)
We now have an explicit parameter that contains the same information, and
this will allow us to get rid of is_gs_copy_shader in the si_shader struct.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Specifically, when the API switches from using a GS to not using a GS and then
back to using the same GS again, we do not have to re-send all the GS state,
but we do have to send VGT_GS_MODE. So make VGT_GS_MODE consistently be a part
of the VS state.
This fixes a rendering bug in Dolphin, but surely other applications are
affected as well.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93648
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
When a fragment shader is used that has no outputs but does conditional
discard (KILL_IF), all fragments are killed without this patch.
By comparing various register settings, my conclusion is that the exec mask
is either not properly forwarded to the DB by NULL exports or ends up being
unused, at least when there is _only_ a NULL export (the ISA documentation
claims that NULL exports can be used to override a previously exported exec
mask).
Of the various approaches I have tried to work around the problem, this one
seems to be the least invasive one.
v2: take discard by alpha test into account as well
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93761
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
because not using SPI_SHADER_32_ABGR doubles fill rate.
We should also get optimal performance if alpha isn't needed or blending
isn't enabled.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This does change the behavior slightly:
If a shader writes COLOR[i] and that color buffer isn't bound,
the shader will export MRT_NULL instead and discard the IR tree that
calculates the output. The only exception is alpha-to-coverage, which
requires an alpha export.
v2: - update a comment about 16BPC
- account for MRTZ when when fixing alpha-test/kill
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
I'm not sure about the consequences of this bug, but it's definitely
dangerous.
This applies to SI, CIK, VI.
Cc: 11.0 11.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
There will be 1 config per variant, which will be a union of configs
from {prolog, main, epilog}. For now, just add the structure.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This doesn't fix a known bug, but better safe than sorry.
Also, simplify the expression in si_shader.c.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
because the API pixel shader binary will not emulate alpha test one day,
so the KILL_ENABLE bit must be determined elsewhere.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This will allow us to send shader debug info.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This changes the count slightly (because of si_generate_gs_copy_shader), but
this is only relevant for the driver-specific num-compilations query. It sets
the stage for the next commit.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Use NULL tests of the form `if (ptr)' or `if (!ptr)'.
They do not depend on the definition of the symbol NULL.
Further, they provide the opportunity for the accidental
assignment, are clear and succinct.
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
This reduces the shader key for ES.
Use a fixed attrib location based on (semantic name, index).
The ESGS item size is determined by the physical index of the highest ES
output, so it's almost always larger than before, but I think that
shouldn't matter as long as the ESGS ring buffer is large enough.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
I discovered that increasing the ESGS ring size fixes GS hangs on Tonga,
so let's do it properly.
There is now a separate init_config_gs_rings state that is not immutable,
because GS rings are resized when needed.
This also saves some memory. Most apps won't need more than 1MB
per ring per shader engine.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The "current" shader pointer is moved from the CSO to the context, so that
the CSO is mostly immutable.
The only drawback is that the "current" pointer isn't saved when unbinding
a shader and it must be looked up when the shader is bound again.
This is also a prerequisite for multithreaded shader compilation.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
- convert 16 states to 1 atom
- only emit 1 scissor if VIEWPORT_INDEX isn't written
- use only one packet when emitting consecutive scissors
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Cc: 10.6 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Otherwise we get:
warning: 'num_user_sgprs' may be used uninitialized in this function
...
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
It must be obtained from the VS.
The GS scenario A must be enabled for PrimID to be generated for the VS.
+ 4 piglits
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This is analogous to r300_mark_atom_dirty() used by r300, and will
be used by later patches. For common radeon code, appropriate helper
is called through a function pointer.
No functional changes.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
This is the final piece for ARB_gpu_shader5,
The code is based on the r600 code from Glenn Kennard,
and myself.
While developing this, I'm not 100% sure of all the calculations
made in the GS registers, this is why the max_stream is worked
out there and used to limit the changes in registers. Otherwise
my initial attempts either regressed GS texelFetch tests
or primitive-id-restart. The current code has no regressions
in piglit.
This commit doesn't enable ARB_gpu_shader5, since that just
bumps the glsl level to 4.00, so I'll just do a separate patch
for 4.10.
v1.1: fix bug introduced in rebase.
v2: Address Marek's review comments,
remove my llvm stream code for simpler C,
move gsvs_ring and gs_next_vertex to arrays.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds to the common radeon streamout code, support
for multiple streams.
It updates radeonsi/r600 to set the enabled mask up.
v2: update for changes in previous patch.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is mainly needed for tessellation where a VS can be bound as VS, ES,
or LS, and TES (tess. evaluationshader) can be bound as VS or ES or neither.
Therefore we need the ability to move pointers to descriptors between
shaders arbitrarily.
The idea is that the context has a mapping from PIPE_SHADER_x to
SPI_SHADER_USER_DATA_x. After a shader is enabled or disabled,
si_shader_change_notify should be called to update this mapping accordingly.
There is a dirty flag for each shader pointer, but only one emit function
for all pointers in the whole context, whose code and logic is separated
from descriptors.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Generated by running:
git grep -l INLINE src/gallium/ | xargs sed -i 's/\bINLINE\b/inline/g'
git grep -l INLINE src/mesa/state_tracker/ | xargs sed -i 's/\bINLINE\b/inline/g'
git checkout src/gallium/state_trackers/clover/Doxyfile
and manual edits to
src/gallium/include/pipe/p_compiler.h
src/gallium/README.portability
to remove mentions of the inline define.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Marek Olšák <marek.olsak@amd.com>
This eliminates the error prone logic in si_shader_vs recalculating this
value.
It also fixes TGSI_SEMANTIC_CLIPDIST outputs incorrectly not being
counted for VS exports. They need to be counted because they are passed
to the pixel shader as parameters as well.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91193
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This isn't pretty and I'd suggest it the pm4 interface builder
could be tweaked to do this more efficently, but I'd need
guidance on how that would look.
This seems to pass the few piglit tests I threw at it.
v2: handle passing layer/viewport index to fragment shader.
fix crash in blit changes,
add support to io_get_unique_index for layer/viewport index
update docs.
v3: avoid looking up viewport index and layer in es (Marek).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Without it, texcoords are mapped to GENERIC[0..7], PointCoord is mapped to
GENERIC[8], and user-defined varyings start from GENERIC[9]. Since texcoords
can only be used between VS and PS, and PointCoord is PS-only, it's silly to
always start from GENERIC[9] in all other shaders (such as LS, HS, ES, GS).
This adds support for TEXCOORD and PCOORD semantics. As a result, st/mesa
will use GENERIC[0] as a base for user-defined varyings, which should make
linking ES and GS as well as tessellation shaders at runtime easier.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Fixes a crash in genymotion with several threads compiling shaders
concurrently.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89746
Cc: 10.5 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
v2:
- Only emit write SPI_TMPRING_SIZE once per packet.
- Use context global scratch buffer.
v3:
- Patch shaders using WRITE_DATA packet instead of map/unmap.
- Emit ICACHE_FLUSH, CS_PARTIAL_FLUSH, PS_PARTIAL_FLUSH, and
VS_PARTIAL_FLUSH when patching shaders.
v4:
- Code cleanups.
- Remove unnecessary multiplies.
v5:
- Patch shaders in system memory and re-upload to vram.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This reverts commit 0543630d0b.
It caused flickering artifacts in Steam games such as Team Fortress 2 or
Left 4 Dead 2.
We could probably only enable this optimization by also making sure the
shader code only uses either SI_PARAM_LINEAR_CENTROID or
SI_PARAM_LINEAR_CENTER, not both. This would probably require a shader
variant.
Sorry I didn't remember this when reviewing the reverted change.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This fixes incorrect rendering in Unreal Engine demos.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83510
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>