The cacheline size was a requirement for using the BLT engine, which
we don't use anymore except for a few things on old HW, so we drop it.
Fixes CTS's CL#3500 test:
dEQP-VK.api.image_clearing.core.clear_color_image.2d.linear.single_layer.r8g8b8_unorm
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This uses prog_to_nir to translate ARB assembly programs to NIR.
Co-authored by Tim Arceri, Dave Airlie, and Ken Graunke:
- [Tim Arceri]: original patch
- [Dave Airlie]: fix crashes with parameter names
- [Ken Graunke]:
  - Rebase on SCALAR_ISA cap, lower wpos_ytransform too.
  - Rebase on streamout fixes.
  - Lower system values for fragcoord support.
  - Don't try to use prog_to_nir for ATI_fragment_shader programs.
  - Create TGSI for fixed-function or ARB vertex shaders even if the
    driver prefers NIR, so we can create draw module shaders for
    feedback/select emulation, which rely on TGSI.
Tested on:
- iris (Intel Skylake/Kabylake): Piglit & GL CTS - Ken Graunke
- radeonsi (AMD Vega 64): Piglit - Ken Graunke
- vc4/v3d - Piglit - Eric Anholt
- freedreno - dEQP - Kristian Høgsberg
Fixes lit_degenerate_case on vc4 and v3d, and vp-address-01,
vp-arl-constant-array-huge-offset-neg, and vp-arl-neg-array on v3d.
No Piglit regressions on radeonsi; no dEQP regressions on freedreno.
Acked-by: Eric Anholt <eric@anholt.net>
Tested-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Even if the driver wants to use NIR shaders, we may need to have TGSI
tokens for creating draw module vertex shaders for the feedback/select
render modes.
So if the st_vertex_program has any TGSI tokens, copy them to the variant.
Acked-by: Eric Anholt <eric@anholt.net>
Tested-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
ARB_vertex_program and ARB_fragment_program define 0^0 = 1 (while GLSL
leaves it undefined). Performing fpow lowering in NIR would break this
behavior, preventing us from using prog_to_nir.
According to llvm/lib/Target/AMDGPU/SIInstructions.td, POW_common
expands to <V_LOG_F32_e32, V_EXP_F32_e32, V_MUL_LEGACY_F32_e32>,
which presumably does a zero-wins multiply.
Lowering in NIR results in a non-legacy multiply, where:
pow(0, 0) = 2^(log2(0) * 0)
= 2^(-INF * 0)
= 2^(-NaN)
= -NaN
which isn't the desired result.
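The difference between the two multiplies can be sketched in a few lines of
Python. This is only a model of the arithmetic above, not the LLVM or NIR
code: pow_lowered, mul_ieee and mul_legacy are hypothetical names, and the
"zero-wins" rule in mul_legacy is the behavior presumed above, not a
verified description of V_MUL_LEGACY_F32.

```python
import math


def log2_ieee(x: float) -> float:
    """log2 with the IEEE-style result for zero (-inf) instead of an exception.

    Only x >= 0 is handled in this sketch.
    """
    if x == 0.0:
        return -math.inf
    return math.log2(x)


def mul_ieee(a: float, b: float) -> float:
    """Ordinary IEEE multiply: -inf * 0 yields NaN."""
    return a * b


def mul_legacy(a: float, b: float) -> float:
    """Assumed "zero-wins" multiply: if either operand is zero, return 0."""
    if a == 0.0 or b == 0.0:
        return 0.0
    return a * b


def pow_lowered(x: float, y: float, mul) -> float:
    """pow(x, y) expanded as 2^(log2(x) * y), using the given multiply."""
    return 2.0 ** mul(log2_ieee(x), y)


if __name__ == "__main__":
    print("IEEE multiply:  ", pow_lowered(0.0, 0.0, mul_ieee))
    print("legacy multiply:", pow_lowered(0.0, 0.0, mul_legacy))
```

With the ordinary multiply, pow(0, 0) comes out as NaN; with the zero-wins
multiply, the exponent becomes 0 and the result is 1, which is what the ARB
program extensions require.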
This reverts:
- commit d6b7539206
(ac/nir: remove emission of nir_op_fpow)
- commit 22430224fe
(radeonsi/nir: enable lowering of fpow)
and prevents a regression in gl-1.0-spot-light with AMD_DEBUG=nir
after enabling prog_to_nir in st/mesa later in this series.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The nine state tracker can produce NIR uniform variables
whose location is explicitly set. radeonsi did not take that
into account when calculating const_file_max, resulting in
rendering glitches. This patch fixes that.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This is the sddm login screen.
Fixes: a9c36dbf9c ("drirc: Initial blacklist for adaptive sync")
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
In the new Intel Iris driver, I am using Tim's new packed uniform
storage system. It works great, with one caveat: our scalar compiler
backend assumes that uniform offsets will be aligned to the underlying
data type. For example, doubles must be 64-bit aligned, floats 32-bit,
half-floats 16-bit, and so on. It does not need any other padding.
Currently, _mesa_add_parameter aligns everything to 32-bit offsets,
creating doubles that have an unaligned offset. This patch alters
that code to align doubles to 64-bit offsets.
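As an illustration of the alignment rule, here is a minimal Python sketch
that assigns offsets in units of 32-bit slots. It is not the
_mesa_add_parameter code; pack_offsets and align are hypothetical helpers,
and only 32-bit and 64-bit values are modeled.

```python
def align(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def pack_offsets(uniforms):
    """Assign each uniform an offset, measured in 32-bit slots.

    uniforms is a list of (name, bit_size) pairs with bit_size 32 or 64.
    Each value is aligned to its own size, so a 64-bit value always lands
    on an even slot.
    """
    offsets = {}
    next_slot = 0
    for name, bit_size in uniforms:
        slots = bit_size // 32
        next_slot = align(next_slot, slots)
        offsets[name] = next_slot
        next_slot += slots
    return offsets


if __name__ == "__main__":
    # A float followed by a double: slot 1 is skipped as padding.
    print(pack_offsets([("f", 32), ("d", 64)]))
    # A double followed by a float: no padding is needed.
    print(pack_offsets([("d", 64), ("f", 32)]))
```

Padding only appears when a double follows an odd number of 32-bit values.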
This may be slightly less optimal for drivers which can support full
packing, and allow reads from unaligned offsets at no penalty. We could
make this extra alignment optional. However, it only comes into play
when intermixing double and single precision uniforms. Doubles are
already not too common, and intermixed values (floats then doubles)
are probably even less common. At most, we burn a single 32-bit slot
to the alignment, which is not that expensive. So, it doesn't seem
worthwhile to add the extra complexity.
Eventually, we'll likely want to update this code to allow half-float
values to be packed tighter than 32-bit offsets. At that point, we'll
probably want to revisit what drivers ultimately want, and add options.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
I'd like to use this in the prog_parameter.c code, so I need to move it
into C, make it non-static, and so on. This probably isn't the ideal
place for it, but I couldn't think of a better one.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
The previous commit limited the CI to master and MRs, but to avoid having to
manually trigger CI runs, let's add a third, automatic way: by pushing to
a branch named `ci/*` (or `ci-*` or just `ci`) (which you can delete
afterwards, the pipeline results will remain).
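For reference, the branch-name rule can be written as a single regular
expression. This is a Python sketch of my reading of the three patterns
above, not the expression used in the CI configuration;
triggers_ci_automatically is a hypothetical helper.

```python
import re

# Matches exactly "ci", or "ci" followed by "/" or "-" and anything else.
CI_BRANCH = re.compile(r"ci(?:[/-].*)?")


def triggers_ci_automatically(branch: str) -> bool:
    """Return True if a push to this branch name should start a pipeline."""
    return CI_BRANCH.fullmatch(branch) is not None


if __name__ == "__main__":
    for name in ("ci", "ci/my-test", "ci-my-test", "cicd", "master"):
        print(name, triggers_ci_automatically(name))
```

Note that a name such as `cicd` does not match, since the separator after
`ci` is required.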
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Runs on random other branches (stable RCs, personal forks) can still be
triggered manually via the web interface, or an app using the API.
This should massively help with the current voracious state of our CI.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
v2: don't use ac_get_zero(), ac_get_one() and ac_int_of_size()
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
v2: don't use ac_get_zerof() and ac_get_onef()
v3: rename "intr" to "name"
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
So that the signature is correct and consistent, the inputs to an export
intrinsic should always be 32-bit floats.
This and the previous commit fix a large number of crashes in the
dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.input_output_int_*
tests.
Fixes: b722b29f10 ('radv: add support for 16bit input/output')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
16-bit outputs are stored as 16-bit floats in the outputs array, so they
have to be bitcast.
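To show why a bitcast, and not a value conversion, is needed here, the
following Python sketch reinterprets 16-bit patterns with the struct
module. It only models the idea; the driver does this with LLVM bitcasts,
and both helper names are hypothetical.

```python
import struct


def bitcast_i16_to_f16(bits: int) -> float:
    """Reinterpret a 16-bit integer pattern as a half float."""
    return struct.unpack("<e", struct.pack("<H", bits & 0xFFFF))[0]


def bitcast_f16_to_i16(value: float) -> int:
    """Reinterpret a half float as its 16-bit integer pattern."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


if __name__ == "__main__":
    as_float = bitcast_i16_to_f16(1)   # the integer 1 viewed as a half float
    print(as_float)                     # a tiny subnormal, not 1.0
    print(bitcast_f16_to_i16(as_float)) # the bitcast recovers the integer
    print(int(as_float))                # a value conversion would lose it
```

The integer 1 stored in a half-float slot reads back as a tiny subnormal;
only reinterpreting the bits recovers the original integer.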
Fixes: b722b29f10 ('radv: add support for 16bit input/output')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
For V3D 3.x, we emitted the ldvpms all at the top so that we didn't need
to do VPM setup when the load_inputs are out of order. For V3D 4.x, we
can reduce register pressure by delaying our loads until they're actually
needed. This also avoids a bunch of silly MOVs in the pre-opt VIR dump.
total instructions in shared programs: 6421415 -> 6419933 (-0.02%)
total uniforms in shared programs: 2393139 -> 2393140 (<.01%)
total threads in shared programs: 153864 -> 153906 (0.03%)
The execute.file check used to be good enough, until I stopped setting up
the execute mask for uniform ifs.
No known tests fixed, noticed while doing a refactor.
Fixes: 0805060573 ("v3d: Handle dynamically uniform IF statements with uniform control flow.")
Now that we don't have the vir_PF() magic, it's obvious that we were doing
the wrong thing for f2b32 by allowing -0.0 to produce true instead of
false.
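A small Python sketch of how -0.0 can end up true: a float comparison
against zero treats -0.0 as zero, while a test on the raw bits sees the
sign bit and reports non-zero. Which of these the old code effectively did
is my assumption, based on the float/int opcode difference described for
vir_PF(); the function names are hypothetical.

```python
import struct


def float_bits(x: float) -> int:
    """Return the 32-bit IEEE 754 pattern of x as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def f2b_float_compare(x: float) -> bool:
    """Float semantics: -0.0 compares equal to 0.0, so the result is False."""
    return x != 0.0


def f2b_bit_test(x: float) -> bool:
    """Integer semantics on the raw bits: -0.0 has the sign bit set."""
    return float_bits(x) != 0


if __name__ == "__main__":
    print(hex(float_bits(-0.0)))
    print(f2b_float_compare(-0.0), f2b_bit_test(-0.0))
```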
You were allowed to pass in any old temp so that you could hopefully fold
the PF up into the def of the temp. If we couldn't find one, it
implicitly generated a MOV(nop, reg). However, that PF could have
different behavior depending on whether the def being folded into was a
float or int opcode, which the caller doesn't necessarily control.
Due to the fragility of the function, just switch all callers over to
vir_set_pf(). This also encourages the callers to use a _dest call for
the inst they're putting the PF on, eliminating a bunch of temps in the
pre-optimization VIR.
shader-db says the change is in the noise:
total instructions in shared programs: 6226247 -> 6227184 (0.02%)
instructions in affected programs: 851068 -> 852005 (0.11%)
Both were doing the same thing to try to get a condition to predicate on.
Noticed when I wanted to do this for discard_if as well.
No change in shader-db.
The NIR lowering works fine, though it causes some slight noise due to
what looks like choices about propagating constants up multiply chains
changing.
total instructions in shared programs: 6229671 -> 6229820 (<.01%)
total uniforms in shared programs: 2312171 -> 2312324 (<.01%)
Fixes some stalls in 3DMMES's main vertex shader.
total instructions in shared programs: 6280751 -> 6211270 (-1.11%)
instructions in affected programs: 2935050 -> 2865569 (-2.37%)
Apparently we need the disable-EZ flag set, not just "does Z writes".
Fixes
dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_depth_fbo
on 7278, even though it passed in simulation.
Signed-off-by: Eric Anholt <eric@anholt.net>
Fixes: 051a41d3d5 ("v3d: Add support for the early_fragment_tests flag.")
Fixes intermittent failures in
dEQP-GLES31.functional.draw_indirect.compute_interop.separate.drawelements_compute_cmd_and_data_and_indices
and others (particularly when run as part of a CTS run).
Otherwise, we might have pages accessible that shouldn't be and miss out
on errors. This is unlikely for most tests since v3d_hw_get_mem() is big
enough that it'll be a freshly zeroed mmap, but if screens are destroyed
and recreated then we'd be reusing the old v3d_hw_get_mem() contents.