This is all currently immutable, but will be used to manage the
residency of the underlying D3D objects in a future commit.
Reviewed-by: Bill Kristiansen <billkris@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14959>
This will be important for residency in a future change, but also
is necessary for synchronize() to work correctly for TBOs.
Reviewed-by: Bill Kristiansen <billkris@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14959>
There's already a single command queue for the screen, meaning that
all commands are being serialized implicitly into that queue. There's
no need to have separate fences for parallel contexts when those
fences would all share the same underlying timeline.
This adds an explicit lock to expand the scope of the implicit screen
command queue ordering to include fence signals.
Each context still gets its own submit sequence, which is used for 1
purpose right now: A uniqueness check in the state manager to see
if states are coming from separate command lists, to apply promotion
and decay logic.
Reviewed-by: Bill Kristiansen <billkris@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14959>
The no-wait condition was wrong before. If the query was used in the
current batch (query->fence_value == context->fence_value), we'd
continue on with the operation instead of returning false. Then the
buffer map would see that the bo is referenced in the current batch,
and would flush and wait, even though we were asked not to wait.
This fixes the condition by simply using the (correct) buffer map
logic.
Reviewed-by: Bill Kristiansen <billkris@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14959>
Also, resource_is_busy needs to opportunistically retire batches, so apps can
spin on non-blocking resource maps and eventually succeed.
Reviewed-by: Bill Kristiansen <billkris@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14959>
Properly handling NaN adversely affects several hundred shaders in
shader-db (lots of Skia and a few others from various synthetic
benchmarks) and fossil-db (mostly Talos and some Doom 2016). Only apply
the NaN handling work-around when the shader demands it.
v2: Add comment explaining the 1.0*y_over_x. Suggested by Caio.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Fixes: 2098ae16c8 ("nir/builder: Move nir_atan and nir_atan2 from SPIR-V translator")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13999>
frexp_sig of ±0, ±Inf, or NaN should just return the input unmodified.
frexp_exp of ±Inf or NaN is undefined, and frexp_exp of ±0 should return
the input unmodified. This seems to already work.
No shader-db or fossil-db changes on any Intel platform.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Fixes: 23d30f4099 ("spirv,nir: lower frexp_exp/frexp_sig inside a new NIR pass")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13999>
NOTE: This commit needs "nir: All set-on-comparison opcodes can take all
float types" or regressions will occur in other Vulkan SPIR-V tests.
No shader-db changes on any Intel platform.
NOTE: This commit depends on "nir: All set-on-comparison opcodes can
take all float types".
v2: Fix handling 16-bit (and presumably 64-bit) values.
About 280 shaders in Talos are hurt by a few instructions, and a couple
shaders in Doom 2016 are hurt by a few instructions.
Tiger Lake
Instructions in all programs: 159893290 -> 159895026 (+0.0%)
SENDs in all programs: 6936431 -> 6936431 (+0.0%)
Loops in all programs: 38385 -> 38385 (+0.0%)
Cycles in all programs: 7019260087 -> 7019254134 (-0.0%)
Spills in all programs: 101389 -> 101389 (+0.0%)
Fills in all programs: 131532 -> 131532 (+0.0%)
Ice Lake
Instructions in all programs: 143624235 -> 143625691 (+0.0%)
SENDs in all programs: 6980289 -> 6980289 (+0.0%)
Loops in all programs: 38383 -> 38383 (+0.0%)
Cycles in all programs: 8440083238 -> 8440090702 (+0.0%)
Spills in all programs: 102246 -> 102246 (+0.0%)
Fills in all programs: 131908 -> 131908 (+0.0%)
Skylake
Instructions in all programs: 134185495 -> 134186618 (+0.0%)
SENDs in all programs: 6938790 -> 6938790 (+0.0%)
Loops in all programs: 38356 -> 38356 (+0.0%)
Cycles in all programs: 8222366923 -> 8222365826 (-0.0%)
Spills in all programs: 98821 -> 98821 (+0.0%)
Fills in all programs: 125218 -> 125218 (+0.0%)
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Fixes: 1feeee9cf4 ("nir/spirv: Add initial support for GLSL 4.50 builtins")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13999>
This (sort of) matches the behavior of nir_opt_algebraic. This ensures
that subnormal values are properly flushed to zero.
With the aid of "nir/search: Float sources of texture instructions are
float users" and "nir/search: Transitively apply is_only_used_as_float",
there would have been no shader-db regressions on Intel platforms.
However, those caused a significant increase in compile time. Since the
instruction regressions were so small, I just dropped those commits
rather than improve them.
All Haswell and newer platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 20125042 -> 20125094 (<.01%)
instructions in affected programs: 7184 -> 7236 (0.72%)
helped: 0
HURT: 32
HURT stats (abs) min: 1 max: 4 x̄: 1.62 x̃: 2
HURT stats (rel) min: 0.11% max: 1.49% x̄: 0.85% x̃: 0.78%
95% mean confidence interval for instructions value: 1.39 1.86
95% mean confidence interval for instructions %-change: 0.74% 0.96%
Instructions are HURT.
total cycles in shared programs: 862745586 -> 862746551 (<.01%)
cycles in affected programs: 109872 -> 110837 (0.88%)
helped: 12
HURT: 23
helped stats (abs) min: 2 max: 774 x̄: 90.83 x̃: 19
helped stats (rel) min: 0.07% max: 25.23% x̄: 3.06% x̃: 0.40%
HURT stats (abs) min: 2 max: 1106 x̄: 89.35 x̃: 12
HURT stats (rel) min: 0.08% max: 45.40% x̄: 3.01% x̃: 0.47%
95% mean confidence interval for cycles value: -60.09 115.23
95% mean confidence interval for cycles %-change: -2.21% 4.07%
Inconclusive result (value mean confidence interval includes 0).
All of the shaders hurt are in either UE4 shooter-game or shooter_demo.
Tiger Lake
Instructions in all programs: 159893213 -> 159893290 (+0.0%)
SENDs in all programs: 6936431 -> 6936431 (+0.0%)
Loops in all programs: 38385 -> 38385 (+0.0%)
Cycles in all programs: 7019259514 -> 7019260087 (+0.0%)
Spills in all programs: 101389 -> 101389 (+0.0%)
Fills in all programs: 131532 -> 131532 (+0.0%)
Ice Lake
Instructions in all programs: 143624164 -> 143624235 (+0.0%)
SENDs in all programs: 6980289 -> 6980289 (+0.0%)
Loops in all programs: 38383 -> 38383 (+0.0%)
Cycles in all programs: 8440082767 -> 8440083238 (+0.0%)
Spills in all programs: 102246 -> 102246 (+0.0%)
Fills in all programs: 131908 -> 131908 (+0.0%)
Skylake
Instructions in all programs: 134185424 -> 134185495 (+0.0%)
SENDs in all programs: 6938790 -> 6938790 (+0.0%)
Loops in all programs: 38356 -> 38356 (+0.0%)
Cycles in all programs: 8222366529 -> 8222366923 (+0.0%)
Spills in all programs: 98821 -> 98821 (+0.0%)
Fills in all programs: 125218 -> 125218 (+0.0%)
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Fixes: f5dd6dfe01 ("anv: enable VK_KHR_shader_float_controls and SPV_KHR_float_controls")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13999>
Extend 4195a9450b so that the next poor fool doesn't come along and
say, "sge does the right thing for 16-bit sources, but slt gives a NIR
validation failure. What the deuce?"
NOTE: This commit is necessary to prevent regressions in GLSLstd450Step
tests of 16-bit sources at "spriv: Produce correct result for
GLSLstd450Step with NaN".
Fixes: 4195a9450b ("nir: sge operation is defined for floating-point types")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13999>
When slots is 64 only the first bit was being set, instead of setting
all 64 bits of the variable, so for that case the function
get_variable_io_mask() always returned 0.
This behaviour caused variables that are being used both on producer and
consumer to be considered unused and thus being removed on
nir_remove_unused_io_vars().
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14955>
There is a comment in nir_fold_16bit_sampler_conversions saying that these
are the same, but the code only checks for i2i16.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14893>
Expose most of the counters exposed by blob. By faking the value of
counters returned from kgsl I found the exact underlying counters and
constant coefficients being used.
Note, coefficients for counters that depend on time are NOT verified.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14323>
A big reason we had these fields was to help create a set of surface
states for a resource. That's largely being handled through other means
now.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14806>
Although a resource may support CCS with its original format, a texture
view of that resource may have a format that doesn't support
compression. Don't create CCS surface states for such texture views.
This change affects iris' behavior when running piglit's
arb_texture_view-rendering-formats_gles3 test on SKL.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14806>
Fast clear with the resource format instead. This is safe to do because
can_fast_clear_color ensures that the clear color generates the same
pixel with either the view format or the resource format.
On SKL, this prevents us from using an invalid surface state. This platform
doesn't support CCS_E with sRGB formats, but prior to this patch we allowed
fast-clearing with this combination. Piglit's fcc-write-after-clear test
can trigger this.
Fixes: 230952c210 ("iris: Don't support sRGB + Y_TILED_CCS on gen9")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14806>
the idx param for LLVMBuildInsertElement is zero-indexed based on the
value of 'vector_length' (always 4), and the vector length is (obviously)
sized to 'vector_length', so this should be the member of the vec that is being
inserted, not the invocation index
cc: mesa-stable
fixes (zink, but only on my one machine):
KHR-GL46.tessellation_shader.single.max_patch_vertices
KHR-GL46.tessellation_shader.tessellation_shader_tc_barriers.barrier_guarded_read_write_calls
dEQP-GLES31.functional.tessellation.shader_input_output.barrier
dEQP-GLES31.functional.tessellation.shader_input_output.patch_vertices_5_in_10_out
dEQP-GLES31.functional.tessellation_geometry_interaction.feedback.tessellation_output_isolines_geometry_output_points
dEQP-GLES31.functional.tessellation_geometry_interaction.feedback.tessellation_output_isolines_point_mode_geometry_output_triangles
dEQP-GLES31.functional.tessellation_geometry_interaction.feedback.tessellation_output_quads_geometry_output_points
dEQP-GLES31.functional.tessellation_geometry_interaction.feedback.tessellation_output_quads_point_mode_geometry_output_lines
dEQP-GLES31.functional.tessellation_geometry_interaction.feedback.tessellation_output_triangles_geometry_output_points
dEQP-GLES31.functional.tessellation_geometry_interaction.feedback.tessellation_output_triangles_point_mode_geometry_output_lines
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14949>
Otherwise cmdbuffers from different queues can override the trace id
from each other, making for a very confusing hang report.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14868>
All of the opcodes in nir_opt_algebraic_late are the unsized (1-bit)
versions. If the lowering to int32 happens first, many of the
optimizations and lowerings won't happen.
Of particular importance is the lowering of fisfinite. If a shader
happens to contain fisfinite of an fp16 value, it will assert later
during compliation.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fixes: 78b4e417d4 ("gallivm: handle fisfinite/fisnormal")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14942>
It was introduced for nir-to-tgsi, and I found that it was the wrong
approach. There's a reason nobody else does RA this way.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14404>
Allocating NIR registers ends up being required for drivers like r600 and
nv30, which don't do their own allocation (except in some cases on r600
where sb is used).
Rather than add a NIR register liveness impl
(https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14158), switch
from NIR-based liveness to just doing the same channel-based liveness
logic that the NIR registers needed at the TGSI level. The actual
liveness code here basically comes straight out of
brw_vec4_live_variables.cpp.
Since we do the liveness in TGSI now, it also means we don't need to be
careful about not reading SSA values from later TGSI instructions (which
may be useful for doing some greedy instruction selection in generating
TGSI instructions).
i915g:
total instructions in shared programs: 400719 -> 380730 (-4.99%)
instructions in affected programs: 284760 -> 264771 (-7.02%)
total tex_indirect in shared programs: 12289 -> 12290 (<.01%)
tex_indirect in affected programs: 4 -> 5 (25.00%)
total temps in shared programs: 32172 -> 22086 (-31.35%)
temps in affected programs: 30647 -> 20561 (-32.91%)
LOST: 0
GAINED: 148
r300:
total instructions in shared programs: 1472463 -> 1459286 (-0.89%)
instructions in affected programs: 507009 -> 493832 (-2.60%)
total temps in shared programs: 212143 -> 201678 (-4.93%)
temps in affected programs: 78007 -> 67542 (-13.42%)
softpipe:
total temps in shared programs: 517071 -> 294387 (-43.07%)
temps in affected programs: 509324 -> 286640 (-43.72%)
Acked-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14404>
To do register allocation well, we want to have a point before
ureg_insn_emit() to look at the liveness of the values and allocate them
to TGSI temporaries. In order to do that, we have to switch from
ureg_OPCODE() emitting TGSI tokens directly to a new ntt_OPCODE() that
stores the ureg args in a block structure.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14404>
The function operated on a tgsi_full_instruction, but for code generation
in NIR-to-TGSI I want to reuse this logic using pieces of tgsi_ureg
structs.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14404>