To de-clutter st_context.h.
Clean up remaining function prototypes in st_context.h.
The st_vp_uses_current_values() helper is only used in st_context.c
so move it there.
The st_get_active_states() function is only used in st_context.c so
remove its prototype in st_context.h
Reviewed-by: Neha Bhende <bhenden@vmware.com>
As stated in Vulkan spec:
"Resetting a descriptor pool recycles all of the resources from all
of the descriptor sets allocated from the descriptor pool back to
the descriptor pool, and the descriptor sets are implicitly freed."
This fixes dEQP-VK.api.descriptor_pool.*
Fixes: 14f6275c92 "anv/descriptor_set: add reference counting for..."
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Clayton Craft <clayton.a.craft@intel.com>
Rather than getting this from the alu instruction this allows us
some flexibility. In the following pass we instead pass the
inverse op.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This helps make find_trip_count() a little easier to follow but
will also be used by a following patch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This will be used to help find the trip count of loops that look
like the following:
while (a < x && i < 8) {
...
i++;
}
Where the NIR will end up looking something like this:
vec1 32 ssa_1 = load_const (0x00000004 /* 0.000000 */)
loop {
...
vec1 1 ssa_12 = ilt ssa_225, ssa_11
vec1 1 ssa_17 = ilt ssa_226, ssa_1
vec1 1 ssa_18 = iand ssa_12, ssa_17
vec1 1 ssa_19 = inot ssa_18
if ssa_19 {
...
break
} else {
...
}
}
So in order to find the trip count we need to find the inverse of
ilt.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Here we create a helper is_supported_terminator_condition()
and use that rather than embedding all the trip count code
inside a switch.
The new helper will also be used in a following patch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This adds support to loop analysis for loops where the induction
variable is compared to the result of min(variable, constant).
For example:
for (int i = 0; i < imin(x, 4); i++)
...
We add a new bool to the loop terminator struct in order to
differentiate terminators with this exit condition.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This adds partial loop unrolling support and makes use of a
guessed trip count based on array access.
The code is written so that we could use partial unrolling
more generally, but for now it's only use when we have guessed
the trip count.
We use partial unrolling for this guessed trip count because its
possible any out of bounds array access doesn't otherwise affect
the shader e.g the stores/loads to/from the array are unused. So
we insert a copy of the loop in the innermost continue branch of
the unrolled loop. Later on its possible for nir_opt_dead_cf()
to then remove the loop in some cases.
A Renderdoc capture from the Rise of the Tomb Raider benchmark,
reports the following change in an affected compute shader:
GPU duration: 350 -> 325 microseconds
shader-db results radeonsi VEGA (NIR backend):
SGPRS: 1008 -> 816 (-19.05 %)
VGPRS: 684 -> 432 (-36.84 %)
Spilled SGPRs: 539 -> 0 (-100.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 39708 -> 45812 (15.37 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 105 -> 144 (37.14 %)
Wait states: 0 -> 0 (0.00 %)
shader-db results i965 SKL:
total instructions in shared programs: 13098265 -> 13103359 (0.04%)
instructions in affected programs: 5126 -> 10220 (99.38%)
helped: 0
HURT: 21
total cycles in shared programs: 332039949 -> 331985622 (-0.02%)
cycles in affected programs: 289252 -> 234925 (-18.78%)
helped: 12
HURT: 9
vkpipeline-db results VEGA:
Totals from affected shaders:
SGPRS: 184 -> 184 (0.00 %)
VGPRS: 448 -> 448 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 26076 -> 24428 (-6.32 %) bytes
LDS: 6 -> 6 (0.00 %) blocks
Max Waves: 5 -> 5 (0.00 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
In order to stop continuously partially unrolling the same loop
we add the bool partially_unrolled to nir_loop, we add it here
rather than in nir_loop_info because nir_loop_info is only set
via loop analysis and is intended to be cleared before each
analysis. Also nir_loop_info is never cloned.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This detects an induction variable used as an array index to guess
the trip count of the loop. This enables us to do a partial
unroll of the loop, which can eventually result in the loop being
eliminated.
v2: check if the induction var is used to index more than a single
array and if so get the size of the smallest array.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
'dxc' hlsl-to-spirv compiler appears to emit 2 (Unknown) in the depth field,
when the image is not sampled and the value is not needed.
Previously, shaders failed with:
SPIR-V parsing FAILED:
In file ../src/compiler/spirv/spirv_to_nir.c:1412
!is_shadow
632 bytes into the SPIR-V binary
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We may bind new Z/S buffers (which come via the framebuffer CSO,
triggering IRIS_DIRTY_DEPTH_BUFFER), but with writes disabled.
The next draw may enable Z or S writes (which come via the ZSA CSO,
triggering IRIS_DIRTY_WM_DEPTH_STENCIL), which requires us to update
our pin to have the write flag.
So, update pinning if either dirty flag changes. To clarify, pass
cso_zsa to the pinning function rather than pulling the random values
out of ice->state, which unfortunately have to exist for the resolve
code since iris_depth_stencil_alpha_state only exists in iris_state.c.
This avoids the code duplication that caused me to put things in the
wrong place in the previous commit. One used to have extra flushes,
but we moved those out so now these are identical and can be easily
shared.
Commit d6dd57d43c (iris: Add missing depth cache flushes) added the
depth/stencil flushes to the wrong place. I meant to add them to the
iris_upload_dirty_render_state code that emits the packets, but I
accidentally added them to the nearly identical looking code in
iris_restore_render_saved_bos. This meant we missed the actual flushing
at draw time, but instead did pointless flushing on the first draw in a
batch where things are already flushed anyway.
This commit moves them to iris_resolve.c, next to the depth prepares,
similar to what we do for color buffers. i965 does them elsewhere, but
I'm not sure why - this seems like the most consistent place.
Push this format to the pipe driver unchanged.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
1. If we switch the TCS for one with a different number of output
vertices, then the TES's gl_PatchVerticesIn value will change.
We need to re-upload in this case. For now, re-emit constants
whenever the TCS/TES are swapped out.
2. If there is no TCS, then we can't grab gl_PatchVerticesIn from
the TCS info. Since it's a passthrough, we can just use the
primitive's patch count (like the TCS gl_PatchVerticesIn does).
Fixes KHR-GL45.tessellation_shader.single.max_patch_vertices and
KHR-GL45.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_PatchVerticesIn.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Now that we've added a system value uploading mechanism, we may as well
reuse the same system for default tessellation levels. This simplifies
the state upload code a bit.
Also fixes:
KHR-GL45.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_tessLevel
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This patch adds PIPE_CAP_TGSI_FS_FACE_IS_INTEGER_SYSVAL which
despite its name is not a TGSI-specific capability, just lets
the state tracker know that it should generate a system value
for FACE.
This is needed if we want to run tgsi_to_nir on iris.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
That is, drop KHR from all tokens that were promoted to Vulkan 1.1.
The consistency makes ctags more useful (it now jumps directly to the
real definitions in vulkan_core.h instead of the typedefs); and it makes
the code slightly less verbose.
Save SPIR-V in tu_shader_module. Tranlation to NIR happens in
tu_shader_create, and compilation to binary code happens in
tu_shader_compile. Both will be called during pipeline creation.
Let tu_cs_begin_sub_stream imply tu_cs_reserve_space, and
tu_cs_end_sub_stream imply tu_cs_sanity_check. Callers are no
longer required to call them (but can still do if they choose to).