For each mipmaps, the driver will store the clear values (8-bytes)
and the TC-compat zrange value (4-bytes).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The workaround was entirely in common code, and it's needed in radeonsi
too so just always do it when necessary. Fixes
KHR-GL45.shader_image_load_store.advanced-allStages-oneImage on gfx9
with LLVM 8.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
GL and Vulkan allow you to bind a single layer of a 3D texture to a 2D
image, and we weren't implementing a workaround for that on gfx9 that
TGSI was. Copy it over.
Fixes KHR-GL45.shader_image_load_store.non-layered_binding with radeonsi
NIR.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
16-bit and 32-bit values match hardware values but 8-bit doesn't.
This fixes dEQP-VK.pipeline.input_assembly.* with 8-bit index.
Fixes: 372c3dcfdb ("radv: implement VK_EXT_index_type_uint8")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl
The virgl formats are fixed in time snapshots of the gallium ones,
we just need to provide a translation table between them when
we enter the hardware.
This fixes a regression since Eric renumbered the gallium table.
Fixes: c45c33a5a2 (gallium: Remove manual defining of PIPE_FORMAT enum values.)
Bugzilla: https://bugs.freedesktop.org/111454
v1 by Dave Airlie <airlied@redhat.com>
v2: virgl: Add a number of formats to the table that are used, e.g. for vertex
attributes
v3: cover some more missing formats from a piglit run
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
pp has vector units and some operations can be optimized when bundled
together.
Benchmarking this with piglit shaders shows that the instruction count
can be greatly reduced on many examples with vectorize.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
nir vec4 fcsel assumes that each component of the condition will be used
to select the same component from the options, but pp can't implement
that since it only has 1 component for the condition.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
The previous spill stack was fixed and too small, and caused instability
in programs requiring spilling for roughly more than one value.
This patch adds a dynamic calculation of the buffer size based on stack
utilization and switches it to a separate allocation at flush time that
will fit the shader that requires the largest buffer.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
It's not used by anything anymore now that so much lowering has been
moved into NIR. Sadly, we still need on in brw_compile_gs() for
geometry shaders on Sandy Bridge. Short of a lot of pointless work,
that one's probably not going away.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Create a unified table to handle pipe format to texture
and render target format lookup.
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Put the uncached GTT type at a higher index than the visible VRAM type,
rather than having GTT first.
When we don't have dedicated VRAM, we don't have a non-visible VRAM
type, and the property flags for GTT and visible VRAM are identical.
According to the spec, for types with identical flags, we should give
the one with better performance a lower index.
Previously, apps which follow the spec guidance for choosing a memory
type would have picked the GTT type in preference to visible VRAM (all
Feral games will do this), and end up with lower performance.
On a Ryzen 5 2500U laptop (Raven Ridge), this improves average FPS in
the Rise of the Tomb Raider benchmark by up to ~30%. Tested a couple of
other (Feral) games and saw similar improvement on those as well.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
(Bas: CCing this to 19.2-rc due to high impact and limited complexity)
It can be useful for debugging purposes
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
This commit adds support for nir_jump_instr, if and loop
nir_cf_nodes.
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Add better liveness analysis that was modelled after one in vc4.
It uses live ranges and is aware of multiple blocks which is prerequisite
for adding CF support
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Mali4x0 supports only gl_FragColor. gl_FragDepth is not supported.
Check that we don't get anything but gl_FragColor in shader outputs.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
We don't have a special OP to store color in PP, all we need to do is to
store gl_FragColor into reg0, thus it's just a mov and therefore ALU node.
Yet we still need to indicate that it's store_color op so regalloc ignores
its destination.
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Create ppir block for each corresponding NIR block and populate
its successors. It will be used later in liveness analysis and
in CF support
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
We can get following from NIR:
(1) r1 = r2
(2) r2 = ssa1
Note that r2 is read before it's assigned, so there's no node for
it in comp->var_nodes. We need to create a dummy node in this case
which sole purpose is to hold ppir_dest with reg in it.
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
There can be several root nodes, i.e.:
(1) r0 = r1
(2) r2 = r3
(3) branch if (ssa1)
We need to make (3) depend on (1) and (2), old code added
dependency only for (2), and (1) was kept as root node since there
is no branch/discard or store color between two movs.
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Destination for texture load can be a reg, so we need to
set write mask in this case
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
ppir_lower_load() and ppir_lower_load_texture() assume that node
is in the same block as its successors, fix it by cloning each
ld_uni and ld_tex to every block.
It also reduces register pressure since values never cross block
boundaries and thus never appear in live_in or live_out of any block,
so do it for varyings as well.
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Const nodes are now cloned for each user, i.e. const is guaranteed to have
exactly one successor, so we can use ppir_do_one_node_to_instr() and
drop insert_to_each_succ_instr()
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
On commit f6e7de41d7, we started emitting 3DSTATE_LINE_STIPPLE as part
of the non-dynamic state. That gets re-emitted every time we bind a new
VkPipeline. But that instruction is non-pipelined, and it caused a perf
regression of about 9-10% on Dota2.
This commit makes anv_dynamic_state_copy() return a mask with only the
state that has changed when copying it. 3DSTATE_LINE_STIPPLE won't be
emitted anymore unless it has changed, fixing the problem above.
v2: Improve commit message and add documentation about skipped checks
(Jason)
Fixes: f6e7de41d7 ("anv: Implement VK_EXT_line_rasterization")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We can statically determine from the disassembly if helper invocations
will be needed, so we can validate the corresponding bit in the
cmdstream and thus avoid printing the bit itself in the decode.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We check for texture ops which calculate derivatives (either explicitly
via dFd* or implicitly) and mark the shader as requiring helper
invocations.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
timespec_get() is not available on macos, we need to pull in the
include/c11/threads_posix.h helper.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103674
Fixes: e2d761de03 ("util: drop final reference to p_compiler.h")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
The variables level and start_layer are not initialized, then
initialized if we have a BUFFER_BIT_DEPTH set. We assert on them
later using the same check. This should be enough but GCC 9.1.1 is
not convinced, so let's initialize the variables.
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Code that used it was removed in 4ebe6b2e72 ("tgsi: Drop the SSE2
constants setup that's been dead code since 2011.")
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Initialize `next_batch_addr` and `second_level`. If the batch is well
formed, those values will be overriden, if not, they are as good as
uninitialized garbage.
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The helper check_node_type() is only used when DEBUG is set (in the
function below), but ASSERTED macro uses NDEBUG. So just guard the
helper with #ifdef. If we see more such cases we might consider a
ASSERTED-like macro for the DEBUG case.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Leftover from 021fa28163 ("xintel/nir: Add a helper for getting
BRW_AOP from an intrinsic").
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Compiler can't see that d is initialized.
../src/intel/compiler/brw_vec4_nir.cpp: In function ‘int brw::try_immediate_source(const nir_alu_instr*, brw::src_reg*, bool, const gen_device_info*)’:
../src/intel/compiler/brw_vec4_nir.cpp:984:12: warning: ‘d’ may be used uninitialized in this function [-Wmaybe-uninitialized]
984 | d = MAX2(-d, d);
Assert that we expect at least one component -- hence d going to be
set. That by itself is not enough, so also zero initialize the
variable.
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Make sure we read the updated data from the gpu in cases where WAIT_BIT
is not set.
Cc: 19.1 19.2 <mesa-stable@lists.freedesktop.org
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
...by copying the implementation of anv_get_absolute_timeout().
Appears to fix a CTS test with 32-bit builds:
GTF-GL46.gtf32.GL3Tests.sync.sync_functionality_clientwaitsync_flush
Fixes: f459c56be6 ("iris: Add fence support using drm_syncobj")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Rafael Antognolli tracked down a performance gap between i965 and iris
in Synmark2's OglCSDof microbenchmark, noting that iris was performing
substantially more memory reads and writes, with substantially fewer
L3 hits. He suggested that something might be wrong with MOCS, or L3
configs, at which point I came up with a theory...
It would appear that the STATE_BASE_ADDRESS command updates the MOCS
settings for various base addresses even if you don't specify the
"Modify Enable" bit for that address. Until now, we had been setting
only the MOCS for bases we intended to change, leaving the others
"blank" which is MOCS table entry 0, which is uncached.
Most data access has a more specific MOCS (e.g. in SURFACE_STATE),
but scratch access uses the Stateless Data Port Access MOCS from
STATE_BASE_ADDRESS. So this meant all scratch access was uncached.
Improves performance in Synmark2's OglCSDof by 2x, bringing iris
on par with the existing i965 driver.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fix this build error on macOS.
../src/glx/apple/glx_empty.c:158:4: error: void function 'glXQueryGLXPbufferSGIX' should not return a value [-Wreturn-type]
return 0;
^ ~
Fixes: 3dd299c3d5 ("glx: Sync <GL/glxext.h> with Khronos")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Similarly to before, this didn't properly handle varying structs with
doubles in them.
This doesn't fix any tests, but was noticed while looking at the code.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The old version wasn't as accurate as it could be, and didn't handle
double variables inside structs correctly. Walk the path to compute the
actual components affected.
In combination with the previous commit fixes
KHR-GL45.enhanced_layouts.varying_structure_locations.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is already done in get_deref_offset() in the common code. We were
adding it twice accidentally.
Fixes KHR-GL45.enhanced_layouts.varying_array_locations.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>