This file is created in 2a0d45ae6c but
addition to android makefiles was omitted. It breaks the build with
missing references which are defined in this file.
List the file in ir3_SOURCES to make the build succeed.
Signed-off-by: Marijn Suijten <marijns95@gmail.com>
This fixes some crashes with dEQP-VK.descriptor_indexing.* when
read_first_invocation has its source from a descriptor.
Most of these tests still fail because of an LLVM bug (they work
with ACO).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
v2: make variable names snake_case
v2: minor cleanups in emit_udiv()
v2: fix Panfrost build failure
v3: use an enum instead of a boolean flag in nir_lower_idiv()'s signature
v4: remove nir_op_urcp
v5: drop nv50 path
v5: rebase
v6: add back nv50 path
v6: add comment for nir_lower_idiv_path enum
v7: rename _nv50/_llvm to _fast/_precise
v8: fix etnaviv build failure
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Remove emit_alpha_to_coverage workaround from backend compiler and start
using ported workaround from NIR.
v2: Copy comment from brw_fs_visitor (Caio Marcelo de Oliveira Filho)
Fixes piglit test on HSW:
- arb_sample_shading-builtin-gl-sample-mask-mrt-alpha-to-coverage-combinations
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Importing this pass from fs_visitor::emit_alpha_to_coverage_workaround()
in intel/compiler.
v2 (Caio Marcelo de Oliveira Filho):
- Track store output and sample mask instruction
- Nest math insturction for more readability
- Bail out early if no gl_SampleMask
v3: (Caio Marcelo de Oliveira Filho):
- Do math instructions after instruction block
- Restructure code
- Move pass under src/intel/compiler
v4: (Caio Marcelo de Oliveira Filho):
- Organize dither mask calculation
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
0.49.0 can compile most of mesa with ICC or ICL, but not SWR without
additional workarounds in our meson.build files. Bumping patch version
is easier and shouldn't be a big burden anyway, especially to cover a
niche compiler. The check originally only covered ICC, but now covers
ICL as well.
Fixes: 3740ffb59c
("meson: add switches for SWR with MSVC")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1937
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Due to a bug in GFX10 hardware, s_nop instructions must be added
if a branch is at 0x3f. We already do this, but forgot to also update
the constant addresses that come after this instruction.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Currently if you have an SMEM store followed by an SMEM load that
loads the same location as was written, it won't work because the
store isn't finished before the load is executed. This is NOT
mitigated by an s_nop instruction on GFX10.
Since we currently don't have proper alias analysis, this commit adds
a workaround which will insert an s_waitcnt lgkmcnt(0) before each
SSBO load if they follow a store. We should further refine this in
the future when we can make sure to only add the wait when we load the
same thing as has been stored.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
The current implementation does not synchronize on BO readiness when
DISCARD_WHOLE_RES flag is set, which can lead to misbehaviours when the
resource being updated is being used by one of the pending or already
flushed batches.
Adding unconditional BO synchronization would do the trick, but we can
sometimes optimize this path by re-allocating a new BO instead of
waiting for the existing one to be ready.
Reported-by: Daniel Stone <daniels@collabora.com>
Reported-by: Heinrich Fink <heinrich.fink@daqri.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
According to the OES_geometry_shader spec, section Dependencies:
"OpenGL ES 3.1 and OpenGL ES Shading Language 3.10
are required."
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
We currently doesn't maintain it correctly and the buffer gets leaked if
surface is destroyed before calling swapping buffers.
From Android frameworks/native/libs/nativewindow/include/system/window.h:
The window holds a reference to the buffer between dequeueBuffer and
either queueBuffer or cancelBuffer, so clients only need their own
reference if they might use the buffer after queueing or canceling it.
v2: Remove our own reference.
Fixes: 0212db3504 ("egl/android: Cancel any outstanding ANativeBuffer in surface destructor")
Reviewed-by: Chia-I Wu <olvaffe@gmail.com> (v1)
Reviewed-By: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Lepton Wu <lepton@chromium.org>
If a pipeline has both graphics and compute, descriptors are same.
While we are at it, use queue->device for simplicity.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
To make sure a trace file is generated in case the driver crashes
during the hang report generation (which happens sometimes).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This information has never been useful. All descriptors are
already dumped with colors etc, and it's more useful.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Disable 16-bit features because fp16 isn't exposed on these chips.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
A bunch of blend tests fixed on T760. A single blend test regressed on
both T760/T860 but I am unable to reproduce locally so am just
documenting the regression and moving on.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We would like to eliminate not just entire dead instructions, but also
dead components, which increases scheduler flexibility (since some
vector instructions can become scalar after eliminating dead
components). This also will allow better RA in the future.
Results are meh.
total instructions in shared programs: 3453 -> 3451 (-0.06%)
instructions in affected programs: 60 -> 58 (-3.33%)
helped: 2
HURT: 0
total bundles in shared programs: 1826 -> 1824 (-0.11%)
bundles in affected programs: 33 -> 31 (-6.06%)
helped: 2
HURT: 0
total quadwords in shared programs: 3144 -> 3144 (0.00%)
quadwords in affected programs: 0 -> 0
helped: 0
HURT: 0
total registers in shared programs: 321 -> 321 (0.00%)
registers in affected programs: 45 -> 45 (0.00%)
helped: 11
HURT: 11
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 16.67% max: 50.00% x̄: 39.70% x̃: 50.00%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
95% mean confidence interval for registers value: -0.45 0.45
95% mean confidence interval for registers %-change: -1.87% 62.18%
Inconclusive result (value mean confidence interval includes 0).
total threads in shared programs: 445 -> 447 (0.45%)
threads in affected programs: 2 -> 4 (100.00%)
helped: 1
HURT: 0
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This allows for vec16 dependencies in the scheduler, not that we have
any yet (thankfully).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Now that we have notion of byte masks, liveness tracking can be updated
to reflect this extra granularity without loss of correctness.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Read component masks don't have a particular type associated, since the
type of the ALU operation may not match the type of the operands in
question. So let's generate byte masks instead, and update the rest of
the compiler to use byte masks when analyzing reads.
Preparation for mixed types.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
There are essentially two formats of masks in play beginning with this
commit: masks per-channel and masks per-byte. The former make sense
within a given fixed-size instruction; the latter are
typesize-independent. It turns out you need the latter to meaningfully
manipulate instructions containing multiple sizes (which is quite
possible with ALU operations).
Similarly, we have mir_srcsize. We calculate the size of the source by
analyzing the size of the instruction itself and stepping down if there
is a half-modifier.
Finally, we have mir_round_bytemask_down, for when we want to take a
byte mask and "round it down" to a given component size, so that we can
use it as a component mask.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This will allow us to encode properties about the load/store ops like we
do for ALU ops. We include now properties about whether we have a store,
and if there are special cases on the load/store op. We also tag each
instruction by its natural size... this is probably not totally right,
but it's a start.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The trick is realizing even with a destination override, the masks are encoded in the same mode as the
instruction itself, rather than stepping down. The override means that
the smaller type is used, but the mask is parsed as if it were the
higher type. Overriding down is down by printed by blinding doing this. Overriding up can be thought of as printing in the upper size, but shifting the alphabet to use the upper half, i.e. shifting xyzw to become abcd.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Add some comments explaining what's going on in a more natural flow in
order to solve the actual bug.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Fixes: 2d914ebe81 ("pan/midgard: Fix memory corruption in register spilling")
This allows a write to proceed to an uninitialized part of a buffer
even when the GPU is using the previously-initialized portions.
Such a situation can be triggered with the following API usage example:
glBufferSubData(..., offset, size, data1);
glDrawArrays(...);
// append new vertex data
glBufferSubData(..., offset+size, size, data2);
glDrawArrays(...);
Same is done for freedreno, nouveau and radeon.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Store the changed usage in the newly created transfer object.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Fixes: 1194afdfe3 ("etnaviv: rework the stream flush to always go through the context flush")
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Fixes: e5cc66dfad ("etnaviv: Rework locking")
Fixes: 1456aa61cc ("etnaviv: Rework resource status tracking")
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
This is the layout used in the GL API, and maps directly to PIPE
formats with no endianness trickery. As with the LA change, this
fixes big-endian fetching from texbos. Also cleans up some endian
shenanigans in shader images.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Now that Mesa is also using an array format for LA, nothing was using
these. (And, clearly, no HW driver had exposed them).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The array format is what the GL API wants (fixing texbos on
big-endian), and matches directly to gallium's corresponding array
format. The only driver exposing A8L8 was radeon/r200 in big-endian,
where the HW's underlying format was trying to read as array and we
needed to flip things around to make our packed format come out right
(note that while the radeon format tables had both AL and LA,
ChooseTextureFormat would only pick one of them based on endianness).
v2: Don't make r200/radeon use endian swaps.
v3: Rebase on dropping the r200 _be/_le format table removal patch
v4: reword commit message to explain why we can drop both formats
from radeon.
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
The array format is what the GL API wants (and we made a mistake in
the format returned for texbos on big-endian!), and it's exactly what
the gallium-side PIPE_FORMAT_L16A16 is. The only downside is that
dri_util tries to fall back to sampling RG16 using LA16, which doesn't
have a match for big-endian any more. No HW drivers supported A16L16
anyway.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The first arg to OUT_BATCH_RELOC is ignored, we actually wanted these
in the third arg. They're always 0 so far, so it didn't matter.
v2: Reword commit message that I don't end up using the tile bits, but
keep the commit as a cleanup anyway.
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
No matter what, we deref the texFormat from the table, except for a
mistake in cpp=4 where we pulled a 0 out of the table either way.
v2: Rebase on dropping r200 table deduplication patch.
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)