v2: use tc.stream_uploader in si buffer_transfer_map if not called from
the driver thread
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
This is a no-op for drivers supporting persistent mappings.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
This makes us properly handle gl_ClipDistance and gl_CullDistance.
Fixes: 19064b8c "nir: Add a pass for gathering transform feedback info"
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
We needed to better handle cases where a chunk of a variable starts at
some non-zero location_frac and rolls over into the next slot but may
not be more than 4 dwords. For example, if gl_CullDistance is an array
of 3 things and has location_frac = 2, it will span across two vec4s but
is not, itself, bigger than a vec4. If you ignore the clip/cull special
case, it's not allowed to happen for anything else because the only
things that can span more than one slot is dvec3 and dvec4 and they're
both bigger than a vec4. The current code uses this attrib_slot thing
where we count attribute slots and iterate over them. However, that
doesn't work in the case above because gl_CullDistance will have an
attrib_slot count of 1 even though it does span two slots. We could fix
this by adjusting attrib_slot but we already have comp_mask and it's
easier to just handle it that way.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Instead of going to all the work of to combine them into one array, just
make two arrays and use location_frac to colocate them within CLIP0.
Then the back-end can sort things out and stack them on top of each
other. Thanks to ef99f4c8, we also don't need to set compact anymore.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Use the 'UNK31' bit (which should probably be called 'BUFFER') for
samplerBuffer case, which increases the size of supported buffer
texture beyond 2^15 elements.
Also need to fix the 2nd coord injected to handle the tex instructions
that take integer coords.
Fixes dEQP-GLES31.functional.texture.texture_buffer.render.as_fragment_texture.buffer_size_131071
and similar
Signed-off-by: Rob Clark <robdclark@gmail.com>
... instead of isam. It seems like when using isam, plus atomics, we
can have the problem of old data being in the texture cache. Plus this
way we don't have to load a component at a time.
Note that blob still seems to use isam in some cases. I suppose it might
be preferable in the case of loading a single component, when atomics
are not in the picture (or that the ssbo does not need to otherwise be
coherent).
Signed-off-by: Rob Clark <robdclark@gmail.com>
Resync disasm and instr header from envytools, and add ldib encoding.
This replaces an opcode from a3xx which was never seen in practice,
since that seemed easier than dealing with the same opc # meaning a
different thing on a6xx. (Not really sure if 'sti' was actually a
real thing, I think it was only seen in fuzzing.)
Signed-off-by: Rob Clark <robdclark@gmail.com>
Fixes
dEQP-GLES3.functional.shaders.indexing.varying_array.vec3_dynamic_write_dynamic_loop_read
regression.
Fixes: c1a27ba9ba freedreno/ir3: HIGH reg w/a for a6xx
Signed-off-by: Rob Clark <robdclark@gmail.com>
The variant will be NULL if RA failed. Which isn't ideal, but at least
lets not segfault and bring down the rest of the dEQP run with us.
Signed-off-by: Rob Clark <robdclark@gmail.com>
The wrmask is handled in regmask_get()/regmask_set(), but it wasn't
being propagated from SSA src to dst. So for example, an SSBO read
value that is passed in as src2.y component to atomic op, wasn't
getting the (sy) flag set. Causing lots of fail.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Even in a very basic shader this reduces the time spent in
nir_copy_prop() by ~17%.
No shader-db changes for radeonsi NIR or i965.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Needed for https://gitlab.freedesktop.org/mesa/mesa/merge_requests/248 .
That MR keeps the clip and cull arrays split.
So we have to handle
- compact arrays with location_frac != 0
- VARYING_SLOT_CLIP_DIST1
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
While I haven't tested them all, given that they're all using the same
allocation paths and modifiers in the kernel they should be fine to use in
the same way.
v2: Rebase on other kmsro changes.
v3: Skip repeated '[with_gallium_kmsro,' in the meson build.
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Seems like we forget to update the index buffer (ib) status and
IndexedDrawCutIndexEnable or CutIndexEnable flag is left unchanged it
leads to ignoring of glEnable/glDisable functions for GL_PRIMITIVE_RESTART
in some cases. The index buffer (ib) status should be re-emmited after the
reset option change to avoid some unexpected behavior.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109451
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Signed-off-by: Andrii Simiklit <asimiklit.work@gmail.com>
Commit 08bfd710a2. (nir/dead_cf: Stop
relying on liveness analysis) introduced a new check that iterated
through a SSA def's uses, to see if it's used. But it only checked
normal uses, and not uses which are part of an 'if' condition. This
led to it thinking more nodes were dead than possible.
Fixes Piglit's variable-indexing/tcs-output-array-float-index-wr test
(and related tests) with the out-of-tree Iris driver.
Fixes: 08bfd710a2 nir/dead_cf: Stop relying on liveness analysis
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This gets stencil and depth resolves working properly.
Fixes:
dEQP-GLES3.functional.fbo.msaa.2_samples.depth32f_stencil8
dEQP-GLES3.functional.fbo.msaa.2_samples.depth24_stencil8
dEQP-GLES3.functional.fbo.msaa.4_samples.depth32f_stencil8
dEQP-GLES3.functional.fbo.msaa.4_samples.depth24_stencil8
dEQP-GLES3.functional.fbo.invalidate.whole.unbind_blit_msaa_color
dEQP-GLES3.functional.fbo.invalidate.sub.unbind_blit_msaa_color
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Blitter does support it after all. Previous attempt to use R8_UINT
failed because we overwrote the a6xx format in emit_blit_texture(),
but some of the later setup still looked at the gallium format.
If we overwrite it in the pipe_blit_info before we even call into
emit_blit_texture() it works properly.
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
v2: blob manages error state internally, just return
true if errors did not occur (Jason)
CID: 1442546
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fullscreening and unfullscreening a totem window while playing a video
sometimes results in the video subsurface not changing size along. This
is also reproducible with epiphany.
If a surface gets resized while we have an active back buffer for it, the
resized dimensions won't get neither immediately applied on the resize
callback, nor correctly synchronized on update_buffers(), as the
(now stale) surface size and currently attached buffer size still do match.
There's actually 2 things to synchronize here, first the surface query
size might not be updated yet to the wl_egl_window's (i.e. resize_callback
happened while there is a back buffer), and second the wayland buffers
would need dropping if new surface size differs with the currently attached
buffer. These are done in separate steps now.
https://bugzilla.redhat.com/show_bug.cgi?id=1650929https://bugs.freedesktop.org/show_bug.cgi?id=109594
Fixes: a9fb331ea7 ("wayland/egl: update surface size on window resize")
Signed-off-by: Carlos Garnacho <carlosg@gnome.org>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Tested-by: Bastien Nocera <hadess@hadess.net>
Tested-by: Denys Kostin <denys.kostin@globallogic.com>
The cacheline size was a requirement for using the BLT engine, which
we don't use anymore except for a few things on old HW, so we drop it.
Fixes CTS's CL#3500 test:
dEQP-VK.api.image_clearing.core.clear_color_image.2d.linear.single_layer.r8g8b8_unorm
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This uses prog_to_nir to translate ARB assembly programs to NIR.
Co-authored by Tim Arceri, Dave Airlie, and Ken Graunke:
- [Tim Arceri]: original patch
- [Dave Airlie]: fix crashes with parameter names
- [Ken Graunke]:
- Rebase on SCALAR_ISA cap, lower wpos_ytransform too.
- Rebase on streamout fixes.
- Lower system values for fragcoord support.
- Don't try to use prog_to_nir for ATI_fragment_shader programs.
- Create TGSI for fixed-function or ARB vertex shaders even if the
driver prefers NIR, so we can create draw module shaders for
feedback/select emulation, which rely on TGSI.
Tested on:
- iris (Intel Skylake/Kabylake): Piglit & GL CTS - Ken Graunke
- radeonsi (AMD Vega 64): Piglit - Ken Graunke
- vc4/v3d - Piglit - Eric Anholt
- freedreno - dEQP - Kristian Høgsberg
Fixes lit_degenerate_case on vc4 and v3d, and vp-address-01,
vp-arl-constant-array-huge-offset-neg, and vp-arl-neg-array on v3d.
No Piglit regressions on radeonsi; no dEQP regressions on freedreno.
Acked-by: Eric Anholt <eric@anholt.net>
Tested-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Even if the driver wants to use NIR shaders, we may need to have TGSI
tokens for creating draw module vertex shaders for the feedback/select
render modes.
So...if the st_vertex_program has any TGSI...copy it to the variant.
Acked-by: Eric Anholt <eric@anholt.net>
Tested-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
ARB_vertex_program and ARB_fragment_program define 0^0 = 1 (while GLSL
leaves it undefined). Performing fpow lowering in NIR would break this
behavior, preventing us from using prog_to_nir.
According to llvm/lib/Target/AMDGPU/SIInstructions.td, POW_common
expands to <V_LOG_F32_e32, V_EXP_F32_e32, V_MUL_LEGACY_F32_e32>,
which presumably does a zero-wins multiply.
Lowering in NIR results in a non-legacy multiply, where:
pow(0, 0) = 2^(log2(0) * 0)
= 2^(-INF * 0)
= 2^(-NaN)
= -NaN
which isn't the desired result.
This reverts:
- commit d6b7539206
(ac/nir: remove emission of nir_op_fpow)
- commit 22430224fe
(radeonsi/nir: enable lowering of fpow)
and prevents a regression in gl-1.0-spot-light with AMD_DEBUG=nir
after enabling prog_to_nir in st/mesa later in this series.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The nine state tracker can produce NIR uniform variables
whose location is explicitly set. radeonsi did not take that
into account when calculating const_file_max, resulting in
rendering glitches. This patch fixes that.
Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This is the sddm login screen.
Fixes: a9c36dbf9c ("drirc: Initial blacklist for adaptive sync")
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>