gfx11 passes scratch base address using
SPI_GFX_SCRATCH_BASE_LO and _HI registers. Make the
code changes to support the same.
v5: remove type cast from 64bit to 32bit (Marek Olšák)
v4: combine scratch_memory and scratch_state atom (Marek Olšák)
v3: skip shader relocs for gfx11
v2: make atom for scratch_memory (Indrajit)
Signed-off-by: Yogesh mohan marimuthu <yogesh.mohanmarimuthu@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16328>
use multiple contexts and submit in a round robin scheme to make
use of all the available jpeg engines simultaneously. During mjpeg
decode context need not be same across frames as they are discrete
jpeg images.
V2: number of ctx to be equal to number of engines and fix indent (Leo)
V3: decide ctx count in create_decoder, don't add a video param (Boyuan)
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16355>
The only interesting ones here were LOWER_IF_THRESHOLD (which previously
had connected to some lowering in GLSL that was broken in the face of side
effects), and FMA (which turned GLSL IR's fma() into TGSI_OPCODE_FMA
instead of MAD).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8044>
For some days, the CI was bypassing LAVA and bare-metal jobs due to an
issue in the init-stage2.sh script. After the fix some tests
crashed/failed. This commit updates the expectations for them.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16325>
And set more optimal compute block sizes.
The compute copy is required to preserve NaNs, so this fixes a lot of
AMD_TEST=copyimage cases.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16215>
This function modifies the NIR shader, so when using NIR_DEBUG=print and
nir_viewer all the variants are displayed using the original name.
With this change, we get the original shader (eg: GLSL3), and then each
variant gets its own name (GLSL3-xxxxxxxx) - and is displayed in its
own tab in nir_viewer.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16247>
981bd8cbe2 moved outputs removing handling to NIR, but instead of
applying it only to the last stage before the FS this now applies
it to both the GS and the VS.
This commit fixes this by clearing the kill_outputs field for
the VS when using a ES-GS shader.
Fixes: 981bd8cbe2 ("radeonsi: apply key.ge.opt.kill_{outputs,pointsize,clipdistance} in NIR")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16249>
This controls the whole lowering of "make tex ops with implicit
derivatives on non-implicit-derivative stages be tex ops with an explicit
lod of 0 instead", but it's really hard to describe that in a git commit
summary.
All existing callers get it added except:
- nir_to_tgsi which didn't want it.
- nouveau, which didn't want it (fixes regressions in shadowcube and
shadow2darray with NIR, since the shading languages don't expose txl of
those sampler types and thus it's not supported in HW)
- optional lowering passes in mesa/st (lower_rect, YUV lowering, etc)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16156>
Convert all SNORM formats to SINT.
This fixes SNORM blits for radeonsi.
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16132>
JPEG does not require create and destroy codec messages.
It is not firmware based, so these messages are redundant.
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16160>
The addition of the "compute" parameter is for a future change.
Reviewed-by: Mihai Preda <mhpreda@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15966>
This depends on the offset computation fix from:
"ac/llvm: remove inst_offset parameter from ac_build_buffer_store_dword"
v2: The instruction type is changed to MUBUF, which requires us to clear
DATA_FORMAT with ADD_TID_ENABLE.
Reviewed-by: Mihai Preda <mhpreda@gmail.com> (v1)
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> (v1)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15966>
There was a bug that inst_offset was added to soffset in one codepath and
to voffset in all other codepaths. The correct behavior is to add it
to voffset.
Reviewed-by: Mihai Preda <mhpreda@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15966>
This will help me see all places where we use "info", which will
be moved from si_shader_selector to shader variants.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14414>
This stops pipe_stream_output_info from create_*s_state context functions
because NIR contains everything and can do more advanced shader linking
this way.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14414>
This may be needed by ACO, but it doesn't do anything for LLVM yet other
than making the initial LLVM IR smaller.
It will be needed by a future commit, which rewrites ac_optimize_vs_outputs
in NIR, which relies on NIR matching the shader key.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14414>
Other stages don't have indirect indexing, so it's always const.
Doing it here should also remove dead load_const instructions.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14414>
We don't expect the GS copy shader to ever use the scratch buffer,
so we just don't compile the shader, but the problem is we set ok to
true anyway.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15965>
The code was compiling monolithic PS if a shader output didn't
have a color buffer, but dual src blending never has a color buffer
for mrt1.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15965>
Tested on:
- navi10: L0 cache counter doesn't work (always 0)
- sienna_cichlid: L0 doesn't work (always 0) and L1 isn't visible in RGP
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15646>
util_cpu_detect is an anti-pattern: it relies on callers high up in the call
chain initializing a local implementation detail. As a real example, I added:
...a Mali compiler unit test
...that called bi_imm_f16() to construct an FP16 immediate
...that calls _mesa_float_to_half internally
...that calls util_get_cpu_caps internally, but only on x86_64!
...that relies on util_cpu_detect having been called before.
As a consequence, this unit test:
...crashes on x86_64 with USE_X86_64_ASM set
...passes on every other architecture
...works on my local arm64 workstation and on my test board
...failed CI which runs on x86_64
...needed to have a random util_cpu_detect() call sprinkled in.
This is a bad design decision. It pollutes the tree with magic, it causes
mysterious CI failures especially for non-x86_64 developers, and it is not
justified by a micro-optimization.
Instead, let's call util_cpu_detect directly from util_get_cpu_caps, avoiding
the footgun where it fails to be called. This cleans up Mesa's design,
simplifies the tree, and avoids a class of a (possibly platform-specific)
failures. To mitigate the added overhead, wrap it all in a (fast) atomic
load check and declare the whole thing as ATTRIBUTE_CONST so the
compiler will CSE calls to util_cpu_detect.
Co-authored-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15580>
it's used by 3 different drivers, so it shouldn't be in radeonsi
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Mihai Preda <mhpreda@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15907>
Use a unified si_create_copy_image_cs() to generate both the 1D and 2D
copy shaders; the shaders themselves remain separate.
Also:
- add a helper deref_ssa() for nir deref
- add a helper set_work_size() for setting workgroup and grid sizes
- pass si_context (instead of pipe_context) to the copy_image generator
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15268>
The piglit suite was run on vega20, 10 times, on a 'debugoptimized' mesa build
at commit 582e7f15 . The inconsistent failures were added to the flakes file.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15487>
Replaced the existing internal TGSI compute shader, which clears
a read-modify-write buffer, with its NIR equivalent. The disassembly
shader generated by the new NIR variant is identical to the previous
implementation. These changes remove the additional conversion step
from TGSI to NIR for the shader at runtime. Tested on a Navi 23 card.
Reviewed-by: Mihai Preda <mhpreda@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15356>
This cap is no longer TGSI specific, so let's rename it to reflect
reality.
Because the name got a bit vague when removing the TGSI-bits, let's add
some more details to the name.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15316>
This isn't specific to TGSI, so let's update the name to reflect
reality.
Because the name of the opcode was TGSI specific, let's pick a new one,
based on the naming of the PIPE_CAP_TEXTURE_QUERY_LOD cap.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15316>
This cap is no longer TGSI-specific, so let's update the name to reflect
reality.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15316>
These aren't spiecic to TGSI any more, so let's rename them to reflect
reality.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15316>
These aren't specific to TGSI, so let's rename them to reflect the
reality.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15316>
Similar to the previous commits, these aren't TGSI specific, so let's
drop TGSI from their name.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15316>
This cap is no longer specific to TGSI, so let's rename it and update
the documentation to reflect that.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15316>
This cap no longer has anything to do with TGSI, as the lowering happens
on GLSL IR, and applies just as much to NIR drivers. So let's rename
this cap and update the docs to reflect the current situation.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15316>
Since neither PKT3_LOAD_SH_REG_INDEX nor PKT3_COPY_DATA work with compute
queues on GFX7-10, we have to load the dispatch size from memory in the
shader.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15064>
Otherwise this causes trouble with unitialized memory, eg with:
struct si_transfer {
struct threaded_transfer b;
struct si_resource *staging;
};
'staging' will not be initialized and this causes #6109.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6109
Cc: mesa-stable
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15298>
radeonsi-stoney-gl:amd64 job fails due to random crashes of some
'dEQP-GLES3.functional.buffer.map.write.explicit_flush.*' tests.
Fix the pipeline by adding them to 'radeonsi-stoney-flakes.txt'.
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15238>
Instead of allocating cpu_storage in threaded_resource_init, defer the
allocation to first use (in tc_buffer_map).
This avoids needless memory allocation if tc_buffer_disable_cpu_storage is
called before tc_buffer_map.
map_buffer_alignment is stored and serves as a "can cpu_storage be used" flag.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15074>
To get the same argument in multi and single gpu situation.
Fixes: 21b0153833 ("radeonsi/tests: print PCI-id of GPU device under test")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Mihai Preda <mhpreda@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15240>
Use ROUND_TO_EVEN instead of TRUNCATE; this matches what pal and radv do.
This fixes the spec@ext_framebuffer_multisample@turn-on-off tests.
Cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Mihai Preda <mhpreda@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15240>
From Section 4.4.1 (Input Layout Qualifiers) of the GLSL 4.50 spec:
"For some blocks declared as arrays, the location can only be applied
at the block level: When a block is declared as an array where
additional locations are needed for each member for each block array
element, it is a compile-time error to specify locations on the block
members. That is, when locations would be under specified by applying
them on block members, they are not allowed on block members. For
arrayed interfaces (those generally having an extra level of
arrayness due to interface expansion), the outer array is stripped
before applying this rule"
From Section 1.2.1 (Changes from Revision 6 of GLSL Version) of the GLSL 4.50 spec:
"Private Bug 15678: Don’t allow location = on block members where
the block needs an array of locations"
From Section 4.4.1 (Input Layout Qualifiers) of the GLSL ES 3.20 spec
"If an input is declared as an array of blocks, excluding per-vertex-arrays
as required for tessellation, it is an error to declare a member of
the block with a location qualifier"
From Section 1.1.3 (Changes from GLSL ES 3.2 revision 3) of the GLSL ES 3.20 spec:
"Arrayed blocks cannot have layout location qualifiers on members"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11522>
The problem is that dirty_states must be 0 for any state that is NULL
in "queued". This code was flagging dirty_states for such states because
it was only looking at "emitted". It should have been looking at "queued".
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15209>
moved from radeonsi without the vectorization, which won't be needed for
now. We will lower IO in st/mesa instead of radeonsi to get the transform
feedback info into store instructions.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14388>
This commit adds support for Vulkan backend on a630_skqp job.
= Needed changes
- Needed to install libvulkan-dev package on system
- Refactored the way the available skqp reports are printed
tested in development builds with skia tools
Piglit expectations had to be updated in various drivers due to !14750 not
having bumped the tags when it tried to uprev.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14686>
Add a global-level variable that allows disabling all jobs that would
have gone to the Collabora lab, to be used in case of outages.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15150>
glWaitSemaphoreEXT triggers si_flush_resource callback
on pipe buffer resources, which may cause segmentation fault.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15098>
It was mistakenly added to indicate it's for a User-Mode Driver,
but all defined registers in Mesa are.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15098>
This causes the flushed_depth_texture is allocated without
multi sample. So the blit will cause VM fault.
cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14990>
Currently this just has wait, but in order to get the right answer
for vulkan partial, lavapipe/llvmpipe need to pass a partial flag
through here in the future.
This just changes the API so that's possible.
v2: use an enum (zmike)
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15009>
The root cause is unknown but PAL always update IA_MULTI_VGT_PARAM
whenever primitive type change.
cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Singed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14944>
8791e831b1 marked imported prime buffers as uncached (useful when prime
buffer is allocated by the display GPU), but they should also be created
as uncached (useful when allocated by the render GPU).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14615>
protected buffer allocations need to be made if the context is secure
Signed-off-by: Shirish S <shirish.s@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14848>
Without this change LLVM 12 hits this error:
"""
LLVM ERROR: Error while trying to spill SGPR0_SGPR1 from class SReg_64:
Cannot scavenge register without an emergency spill slot!
"""
when running glcts KHR-GL46.arrays_of_arrays_gl.AtomicUsage test.
Fixes: 9ff086052a ("radeonsi: unroll loops of up to 128 iterations")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14848>
While it's not the primary interface to interpreting trace job failures,
it was set in all the traces jobs it looks like and it's low cost anyway.
Reviewed-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14604>
To avoid dragging gl.h into places it has no business being,
defined tessellation primitive mode to an enum.
This has a lot of fallout all over the place.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14605>
because si_get_nir_shader runs NIR passes and some of them can introduce
new loads.
Fixes: 3fb77ef2e0 - radeonsi: do opt_large_constants & lower_indirect_derefs after uniform inlining
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14528>
Instead of using actual sample count as parameter, we only use a bool
to indicate if the target is multi sample. This is because we don't
know the sample count when glGetInternalformativ() case.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14362>
Adreno GPUs has native instruction for unsigned and mixed dot_4x8 but
not signed dot product.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13986>
This will allow further optimizations for shader variants that change
GS outputs (affecting the copy shader), and this is mainly about sharing
optimizations with NGG instead of having a totally separate codepath for
legacy GS.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14266>
The samplemask VGPR that we had to pass to the epilog increased VGPR usage
by 1 for all shaders. Do it in the main function by using the mono key
structure, which causes on-demand compilation and stall, but we'll save
the VGPR.
57794 shaders in 35145 tests
Totals:
SGPRS: 2715856 -> 2716272 (0.02 %)
VGPRS: 1776168 -> 1718432 (-3.25 %)
Spilled SGPRs: 3704 -> 3630 (-2.00 %)
Spilled VGPRs: 1727 -> 1733 (0.35 %)
Private memory VGPRs: 256 -> 256 (0.00 %)
Scratch size: 2008 -> 2016 (0.40 %) dwords per thread
Code Size: 61429584 -> 61393288 (-0.06 %) bytes
Max Waves: 838645 -> 840484 (0.22 %)
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14266>
It was used when LLVM didn't support vec3 and we had to pass vec4
with num_channels=3. We no longer need to do that.
This also removes the vec3 splitting or conversion to vec4 in callers.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14266>