This reduces the number of SET_SH_REG packets which are emitted
for applications that use more than one descriptor set per stage.
We should be able to emit more SET_SH_REG packets consecutively
(like push constants and vertex buffers for the vertex stage),
but this will be improved later.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This will allow to emit consecutive shader pointers for
reducing the number of emitted SET_SH_REG packets, which
is recommended.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Previously, findFirstUse() only considered reads "uses". This fixes that
by making it check both an instruction's sources and definitions. It
also shortens both findFistUse() and findFirstDef() along the way.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Literally the same as the AMD ext.
Passes *indirect_draw_count* CTS tests.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This improves dota2 performance for me by 11% when I force the
GPU DPM level to low (otherwise dota2 is CPU limited for 4k on my
threadripper), which should be a large part of the radv-amdvlk gap.
(For me with that was radv 60.3 -> 66.6, while AMDVLK does about 68
fps)
It looks like dota2 rendered the GUI with a bunch of draws with
a SetScissors before almost each draw, causing a lot of pipeline
stalls.
I'm not really happy with the duplication of code, but overriding
radeon_set_context_reg would also be messy since we have the
pre-recorded pipelines and a bunch of si_cmd_buffer code, as well
as some memory->context reg loads for which things would be more
complicated.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This reverts commit 5c33e8c772. It broke
fixed function vertex programs on vc4 and v3d, and apparently caused
trouble for radeonsi's NIR paths as well.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
https://bugs.freedesktop.org/show_bug.cgi?id=106673
A later patch will make use of this in other places. Also, remove
dependency on undefined behavior of left-shifting a signed value.
v2: - move function into a separate header (Chris)
v3: (by Ken) Add new header to the various build systems.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Make sure only those components are written to that are specified in the
write mask.
Fixes:
dEQP-GLES2.functional.shaders.operator.common_functions.sign.lowp_float_vertex
dEQP-GLES2.functional.shaders.operator.common_functions.sign.lowp_float_fragment
dEQP-GLES2.functional.shaders.operator.common_functions.sign.mediump_float_vertex
dEQP-GLES2.functional.shaders.operator.common_functions.sign.mediump_float_fragment
dEQP-GLES2.functional.shaders.operator.common_functions.sign.highp_float_vertex
dEQP-GLES2.functional.shaders.operator.common_functions.sign.highp_float_fragment
dEQP-GLES2.functional.shaders.operator.common_functions.sign.lowp_vec3_vertex
dEQP-GLES2.functional.shaders.operator.common_functions.sign.lowp_vec3_fragment
dEQP-GLES2.functional.shaders.operator.common_functions.sign.mediump_vec3_vertex
dEQP-GLES2.functional.shaders.operator.common_functions.sign.mediump_vec3_fragment
dEQP-GLES2.functional.shaders.operator.common_functions.sign.highp_vec3_vertex
dEQP-GLES2.functional.shaders.operator.common_functions.sign.highp_vec3_fragment
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
In cases like
IDIV TEMP[0].xy TEMP[0].xx TEMP[1].yy
the result will be written to the same register that is also a source register.
Since the components are evaluated one by one, this may result in overwriting
the source value for a later operation. Work around this by adding another
temporary to store the result if the destination temporary index is equal to
one of the source temporary indices.
Fixes:
dEQP-GLES2.functional.shaders.operator.binary_operator.div.*
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This reverts commit 79fe00efb4.
This reverts commit f5e8b13f78.
This reverts commit d21c086d81.
They broke the Android build and I'd rather not leave it broken
for the long holiday weekend.
Rename the (un)map_gtt functions to (un)map_map (map by
returning a map) and add new functions (un)map_tiled_memcpy that
return a shadow buffer populated with the intel_tiled_memcpy
functions.
Tiling/detiling with the cpu will be the only way to handle Yf/Ys
tiling, when support is added for those formats.
v2: Compute extents properly in the x|y-rounded-down case (Chris Wilson)
v3: Add units to parameter names of tile_extents (Nanley Chery)
Use _mesa_align_malloc for the shadow copy (Nanley)
Continue using gtt maps on gen4 (Nanley)
v4: Use streaming_load_memcpy when detiling
v5: (edited by Ken) Move map_tiled_memcpy above map_movntdqa, so it
takes precedence. Add intel_miptree_access_raw, needed after
rebasing on commit b499b85b0f.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We stream from a tiled and aligned source into an unaligned user buffer,
so we need to use _mm_storeu_si128.
Fixes: d21c086d81 (i965/tiled_memcpy: inline movntdqa loads in tiled_to_linear)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For certain EGLImage cases, we represent a single slice or LOD of an
image with a byte offset to a tile and X/Y intratile offsets to the
given slice. Most of i965 is fine with this but it breaks blorp. This
is a terrible way to represent slices of a surface in EGL and we should
stop some day but that's a very scary and thorny path. This gets blorp
to start working with those surfaces and fixes some dEQP EGL test bugs.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106629
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
GS is tested, tessellation is untested.
Have outputs_written_before_ps for HW VS and outputs_written for other
stages. The reason is that COLOR and BCOLOR alias for HW VS, which
drives elimination of VS outputs based on PS inputs.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This fixes shader images where we always bind stObj->pt and not individual
gl_texture_images.
Roughly based on i965 commit 845ad2667a
which does a similar thing but for a different reason.
This fixes GL CTS assertion failures introduced by Ilia.
Cc: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The reference for MOVNTDQA says:
For WC memory type, the nontemporal hint may be implemented by
loading a temporary internal buffer with the equivalent of an
aligned cache line without filling this data to the cache.
[...] Subsequent MOVNTDQA reads to unread portions of the WC
cache line will receive data from the temporary internal
buffer if data is available.
This hidden cache line sized temporary buffer can improve the
read performance from wc maps.
v2: Add mfence at start of tiled_to_linear for streaming loads (Chris)
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Optimize AVX-512 PA Assemble (PA_STATE_OPT). Reduced generated code by
about 4x, MSVC compiler was going crazy making temporaries and
split-loading inputs onto the stack unless explicit AVX-512 load ops
were added
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Added two new files for a wrapper function for initialization
v2: added missing include for single architecture builds
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
It's recommended by the instruction combining pass, and
RadeonSI also runs it.
This pass used to segfault with one shader of F12017 in the
past, but it no longer crashes. Maybe the LLVM IR generated
by RADV has changed.
Polaris10:
Totals from affected shaders:
SGPRS: 441352 -> 441648 (0.07 %)
VGPRS: 310888 -> 300784 (-3.25 %)
Spilled SGPRs: 13576 -> 12983 (-4.37 %)
Code Size: 22560328 -> 22420544 (-0.62 %) bytes
Max Waves: 40755 -> 41366 (1.50 %)
Vega10:
Totals from affected shaders:
SGPRS: 442848 -> 442000 (-0.19 %)
VGPRS: 310396 -> 300460 (-3.20 %)
Spilled SGPRs: 13708 -> 12906 (-5.85 %)
Code Size: 22479428 -> 22336216 (-0.64 %) bytes
Max Waves: 45783 -> 46506 (1.58 %)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Use a flag for the active stages instead.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
When glUseProgram is used, references to the included shaders are
added in ctx->Shader.ReferencedProgram. But those references are not
decreased when the shader data is deallocated. Thus, those shaders
are leaked.
Explicitely remove the pending references to these shaders.
Fixes: e6506b3cd2 ("mesa: retain gl_shader_programs after glDeleteProgram if they are in use")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Suggested-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Functionality already covered by ARB_texture_view, patch also
adds missing 'gles guard' for enums (added in f1563e6392).
Tested via arb_texture_view.*_gles3 tests and individual app
utilizing texture view with ETC2.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This doesn't nothing special currently because we don't create
any copy_var instructions, but this is needed for the next patch.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Instead of directly using intel_obj->buffer. Among other things
intel_bufferobj_buffer() will update intel_buffer_object::
gpu_active_start/end, which are used by glBufferSubData() to decide
which path to take. Fixes a failure in the Piglit
ARB_shader_image_load_store-host-mem-barrier Buffer Update/WaW tests,
which could be reproduced with a non-standard glGetTexSubImage
implementation (see bug report).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105351
Reported-by: Nanley Chery <nanleychery@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Otherwise the specified surface state will allow the GPU to access
memory up to BufferOffset bytes past the end of the buffer. Found by
inspection.
v2: Protect against out-of-range BufferOffset (Nanley).
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
The buffer texture size calculations (should be easy enough, right?)
are repeated in three different places, each of them subtly broken in
a different way. E.g. the image load/store path was never fixed to
clamp to MaxTextureBufferSize, and none of them are taking into
account the buffer offset correctly. It's easier to fix it all in one
place.
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106481
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>