so that most shaders can get lower VGPR usage thanks to lazy input loading.
I think this is a more accurate constraint that prevents the black transitions
in Witcher 2.
Affected shaders (7758):
Max Waves: 57437 -> 58231 (1.38 %)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
It's also possible to monitor them via performance counters but
the hardware can only use two counters simultaneously. It seems
easier to re-use the existing code which reads from MMIO instead
of writing a multi-pass approach.
v2: - add new lines after ':'
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This will allow to expose more queries in order to know which
blocks are busy/idle.
v2: - add new lines after ':'
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Some query results struct contents are declared as cache line aligned.
Use aligned malloc, and align the whole struct, to be safe.
Fixes crash when compiling with clang.
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
CalculateProcessorTopology tries to figure out system topology by
parsing /proc/cpuinfo to determine the number of threads, cores, and
NUMA nodes. There are some architectures where the "physical id" begins
with 1 rather than 0, which was creating and empty "0" node and causing a
crash in CreateThreadPool.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97102
Reviewed-By: George Kyriazis <george.kyriazis@intel.com>
CC: <mesa-stable@lists.freedesktop.org>
This allows eglCreateImageKHR to access P010 surfaces created by vaapi
Signed-off-by: Rainer Hochecker <fernetmenta@online.de>
Acked-by: Ben Widawky <ben@bwidawsk.net>
This fixes "st/va: delay calling begin_frame until we have all parameters".
v2: call begin frame after decoder (re)creation as well.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Tested-by: Andy Furniss <adf.lists@gmail.com>
We will use it for DDIV.
Tested-by: Glenn Kennard <glenn.kennard@gmail.com>
Tested-by: James Harvey <lothmordor@gmail.com>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
It seems clear that trying to multiply two pairs of doubles would result
in the temporary register getting overwritten by the second pair. So
make the code more explicit.
Tested-by: Glenn Kennard <glenn.kennard@gmail.com>
Tested-by: James Harvey <lothmordor@gmail.com>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
There are some line wrapping violations here but those lines will get
deleted in the following patch.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Now that the i965 backend doesn't depend on this field we can
make it more generic and short circuit a bunch of code paths.
The new field will be used in a following patch for another
clean-up.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This makes much more sense and should be more performant in some
critical paths such as SSO validation which is called at draw time.
Previously the CurrentProgram array could have contained multiple
pointers to the same struct which was confusing and we would often
need to fish out the information we were really after from the
gl_program anyway.
Also it was error prone to depend on the _LinkedShader array for
programs in current use because a failed linking attempt will lose
the infomation about the current program in use which is still
valid.
V2: fix validate_io() to compare linked_stages rather than the
consumer and producer to decide if we are looking at inward
facing shader interfaces which don't need validation.
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
To avoid build regressions the following 2 patches were squashed in to
this commit:
mesa/meta: rewrite _mesa_shader_program_use() and _mesa_program_use()
These are rewritten to do what the function name suggests, that is
_mesa_shader_program_use() sets the use of all stage and
_mesa_program_use() sets the use of a single stage.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
mesa: update active relinked program
This likely fixes a subroutine bug were
_mesa_shader_program_init_subroutine_defaults() would never have been
called for the relinked program as we previously just set
_NEW_PROGRAM as dirty and never called the _mesa_use* functions when
linking.
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
What a3xx docs call IJPERSPCENTERREGID.. the xy coord passed into
bary.f. We were incorrectly setting both this and gl_FragCoord.xy to
the same register resulting in all sorts of hilarity.
Fixes stk, vdrift, 0ad, probably a bunch others.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Note spritelist (POINTLIST_PSIZE) seems not to be a thing anymore on
a5xx.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
The previous code always compared integers as 64-bit. Due to variations
in sign-extension in the code generated by nir_opt_algebraic.py, this
meant that nir_search doesn't always do what you want. Instead, 32-bit
values should be matched as 32-bit and 64-bit values should be matched
as 64-bit. While we're here we unify the unsigned and signed paths.
Now that we're using the right bit size, they should be the same since
the only difference we had before was sign extension.
This gets the UE4 bitfield_extract optimization working again. It had
stopped working due to the constant 0xff00ff00 getting sign-extended
when it shouldn't have.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
In order to handle CCS_E, we stomp the image format to a UINT format and
then do some bitcasting logic in the shader. This works fine since SKL
render compression only considers the channel layout of the format and
not the format itself. In order for this to work on images that have
been fast-cleared, we need to also convert the clear color so that, when
interpreted as UINT, it provides the same bit value as it would have in
the original format. This fixes a bunch of OpenGL ES CTS tests for
copy_image when we start using CCS more aggressively.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
basetsd.h on Windows defines INT64 and UINT64 typedefs which conflict
with these. Append "_TOK" to avoid conflicts.
Should fix the Windows build.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Gen8 adds Q/UQ types. We attempted to change the types back to DF in the
generator (commit c95380c40), but an assertion added in the FP64 series
(commit e481dcc3) triggers before that code has a chance to execute.
In fact, using Q/UQ in the IR and then changing to DF in the generator
would not work in the presence of source modifiers, etc.
Fixes: d6fcede6 ("i965: Return Q and UQ types for int64 and uint64")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is basically the same as happens for doubles.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Integer comparison functions (e.g., nir_op_ilt) are handled in the next
commit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
v2 (idr): Make the "from" type in a cast unsized. This reduces the
number of required cast operations at the expensive slightly more
complex code. However, this will be a dramatic improvement when other
sized integer types are added. Suggested by Connor.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
v2: Fixup assertion in brw_reg_type_to_hw_type to allow
BRW_REGISTER_TYPE_{UQ,Q} on Gen8+.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>