Correct the max thread number for DG2+ platforms according
to below bspec.
Ref: Bspec: 47202
Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17506>
src/mesa/main includes are for Mesa's OpenGL implementation, and the
compiler is used in Vulkan drivers and other tools. We really only
needed one #define, which is that we offer 32 samplers. It probably
makes more sense to have our own defined limit for that rather than
importing a project-wide value which theoretically could be adjusted,
so swap MAX_SAMPLERS for a new BRW_MAX_SAMPLERS and call it a day.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17309>
This pattern pops up a bunch and the semantics of nir_channels() aren't
very convenient much of the time. Let's add a nir_trim_vector() which
matches nir_pad_vector().
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16309>
Otherwise the driver setting interacts with it.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 939ddccb7a ("anv: Add support for depth bounds testing.")
Fixes: 1df871f8ff ("iris: Add support for depth bounds testing.")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15763>
On Gen11+, we have a feature that requires us to shift binding table
offsets by 3. This adds a helper which gives the driver a hook to do
this if it so chooses.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14507>
This introduces a new blorp_copy() path using the new XY_BLOCK_COPY_BLT
blitter command introduced on Tigerlake. Unlike the blitter commands of
old, this one is actually fast and worth using. Although it doesn't use
shaders like the rest of BLORP, we still can use some surface-munging
code from there, and BLORP also provides a nice place to put this which
is shared among the drivers.
To use the new path, set BLORP_BATCH_USE_BLITTER (much like Jordan's
recent BLORP_BATCH_USE_COMPUTE bit) and target the batch at the copy
engine.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14687>
This will be used as a performance hint for XY_BLOCK_COPY_BLT to
indicate whether the source/destination surfaces are (likely) in
device-local memory or system memory. We don't need to be precise
here - it's okay to set the fields to LOCAL even if a buffer has
been evicted out to system memory.
We should set this from Vulkan too, but I haven't yet. There isn't
a convenient anv_bo field like there is in iris...
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14687>
Will be useful to figure out when blorp operations end.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Acked-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13996>
Anv allows non-8x4-aligned depth buffer clears, but it has multisampled
HiZ disabled for BDW. iris allows multisampled HiZ on BDW, but disallows
non-8x4-aligned depth buffer clears.
Drop the unused optimization for non-8x4-aligned clears of multisampled
surfaces on BDW and use this opportunity to use some PRM text in the
code comment.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14091>
The state cache invalidation shouldn't be necessary on recent
platforms. On ICL it *seems* to be required to get the hardware to
pick up an updated indirect clear color, so this change is only
applied to TGL platforms and later for the moment.
On some DG2 configs this seems to improve SynMark2/OglDrvRes by 16.0%
±0.1%, n=8.
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13569>
Actually, no, there's no need to do anything, just update some
comments for the record. An earlier revision of this change that
implemented the workaround text to the letter required no less than 8
new PIPE_CONTROLs throughout the tree. However Felix Degrood noticed
that the cost of some of the PIPE_CONTROLs was showing up in workloads
like Shadow of the Tomb Raider. The Windows driver wasn't emitting
many of those pipe controls, contrary to the W/A instructions, so we
engaged in a back and forth with the hardware team, who concluded that
the original suggested workaround was unnecessarily strict, and the
Windows driver's behavior acceptable. It turns out that Wa_1408224581
we had already implemented for TGL is roughly equivalent to the
Windows behavior, so no need to do anything new after all.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14278>
The alignment and scale down values have changed on this platform.
To support drivers that won't use a CCS surface on this platform, this
patch computes the CCS fast clear rectangle using the main surface.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13555>
According to Bspec 2424, "Render Target Resolve":
The Resolve Rectangle size is same as Clear Rectangle size from SKL+.
Use get_fast_clear_rect in blorp_ccs_resolve for SKL+.
Note that the Bspec differs from Vol7 of the Sky Lake PRM, which only
specifies aligning by the scaledown factors.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13555>
We don't support typed image writes for multisampling, so we can't
handle multisampled destinations. We also usually handle MSAA by
running the fragment shader per-sample, which we aren't accounting
for in our compute shaders, so we can't handle MSAA sources either.
We could do both of these things if we really wanted to, but we don't.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13524>
When we're doing a stencil blit via a fragment shader, we can avoid
W-tiling shenanigans by using the stencil write hardware on Skylake
and later.
Of course, the compute engine doesn't have stencil fragment writes,
so it can't do that. Just fall back to the detiling shenanigans.
Caught by Piglit's arb_copy_image-formats when forcing iris to use
BLOCS for resource_copy_region on Icelake.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13524>
When dispatching compute shaders to do a blit, our destination rectangle
may not line up perfectly with the workgroup size. For example, we may
round the left x0 coordinate down to a multiple of the workgroup width,
and the right x1 coordinate up to the next multiple of the workgroup
width. Similarly for y0/y1 and workgroup height. This means that we
may dispatch additional invocations which should not actually do any
blitting. We need to set key->uses_kill to bounds check and drop those.
Caught by Piglit's arb_copy_image-simple when forcing iris to perform
resource_copy_region via BLOCS and running with INTEL_DEBUG=norbc on
Icelake.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13524>
We don't current enable post sync operations, but it is probably
better to set them to "internal" MOCS than to remove the non-zero
checking for this genxml field.
Reworks:
* Fix COMPUTE_WALKER in cmd_buffer_trace_rays (s-b Jason)
Fixes: 7b78b2fcac ("intel/genxml: Assert that all MOCS fields are non-zero on Gfx7+")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13624>
We'd like to add safeguards against accidental use of MOCS 0 (uncached),
which can have large performance implications. One case where we use
MOCS of 0 is disabled constant buffers, where MOCS shouldn't matter, as
there's no actual buffer to be cached.
That said, it should be harmless to set MOCS for these null constant
buffers; we can just assume a generic MOCS for internal buffers.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13480>
isl now uses info->mocs regardless of whether there's any actual
depth/stencil/HiZ buffers involved, so pass it a legitimate one,
rather than zero. We just assume a generic internal MOCS when we
have entirely NULL surfaces.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13480>
We'd like to add safeguards against accidental use of MOCS 0 (uncached),
which can have large performance implications. One case where we use
MOCS of 0 is SURFTYPE_NULL surfaces, where MOCS really shouldn't matter,
as there's no actual surface to be cached.
That said, it should be harmless to set MOCS for these null surfaces;
we can just assume a generic MOCS for internal buffers.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13480>
warning: ‘bind_offset’ may be used uninitialized in this function
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13190>
Reworks:
* Use BLORP_BATCH_USE_COMPUTE flag rather than compute param to
blorp_copy (s-b Jason)
* Squash "intel/blorp: Set shader_pipeline for compute"
* Squash "intel/blorp: Add blorp_copy_supports_compute function"
* Squash "intel: Support compute for image/buffer copy if INTEL_DEBUG=blocs
is set"
* Squash "intel/blorp: Support compute for some blit operations"
* Use nir_image_store (s-b Jason)
* Use nir_push_if (s-b Jason)
* Require gfx12 for ccs in blorp_copy_supports_compute (s-b Jason)
* Add nir_pop_if (s-b Ken)
* Fix aux_usage check on gfx12 blorp_copy_supports_compute (s-b Ken)
* Use blorp_set_cs_dims (s-b Jason)
* Use dim=2d with array=true for nir_image_store (s-b Jason, Francisco)
* Restructure gen checks in blorp_copy_supports_compute (s-b Ken)
* Use nir_load_global_invocation_id (s-b Jason)
* Fix inefficient calculation of store_pos (s-b Jason)
* Use bounds_if being NULL/non-NULL for nir_pop_if (s-b Jason)
* discard => bounds (s-b Ken)
* Re-add ISL_AUX_USAGE_CCS_E in *_supports_compute (s-b Sagar)
* Skip duplicated in_bounds calculation (s-b Jason)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11564>
Based the blorp_params, blorp_get_cs_local_y returns a recommended
local_y size for a compute shader.
blorp_set_cs_dims sets the compute program dims based on a given
local_y size.
Reworks:
* Add blorp_set_cs_dims (s-b Jason)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11564>
Reworks:
* Don't pack params, just memcpy param struct (s-b Jason)
* Old subject: "intel/blorp: Emit compute program if
params.cs_prog_data is set"
* Various cleanups of push-const size/alignment (s-b Jason)
* Fix subslice count by moving to devinfo (s-b Ken)
* Simplify cw.InterfaceDescriptor code (s-b Ken)
* Drop some comments from i965 (s-b Ken)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11564>
Render will use blorp_setup_binding_table and blorp_emit_btp, but
compute will only use blorp_setup_binding_table.
Rework:
* Use blorp_setup_binding_table, blorp_emit_btp (s-b Jason)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11564>
The compute path will use blorp_emit_sampler_state, whereas the render
path will use blorp_emit_sampler_state_ps.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11564>