Fixes: 8ecdbb6136 "i965: Pretend there are 4 subslices for compute shader threads on Gen9+."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104005
Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Eero Tamminen <eero.t.tamminen@intel.com>
This patch is purely for readability improvements when programming
the MEDIA_VFE_STATE.
Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We use _mesa_warning() everywhere else in this code. Change requested
by Rick Irons of Mathworks.
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
No need to pass a pipe_resource when we can just pass the target.
This makes the function potentially more usable. Rename it too.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This function is only used in two places:
1. VMware driver, but only for HUD reporting
2. st/nine state tracker, used for texture memory accounting
Fixes: a69efa9482 ("util: add new util_resource_size() function in
u_resource.[ch]")
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Pointers with no storage type are converted to inout variables but SSA
values and pointers with a storage type (which turns into a uint or
uvec2) are just input variables.
Previously, we just gave them exactly the same type as the respective
image (which already had a sampler2D or similar type). Now they have
their own base type and a pointer to the vtn_type for the image.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Thanks to Emil's -Wundef, t_dd_dmatmp.h now complains that intel_render.c
is missing a couple `#define`s.
Assigning them to 0 keeps the existing behaviour; I'll let someone else
turn them on if this is the behaviour that was intended.
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Note that gl_shader::CompileStatus will also indicate whether a shader
has been successfully specialized.
v2: Use the 'spirv_data' member of gl_shader to know if it is a SPIR-V
shader, instead of a dedicated flag. (Timothy Arceri)
v3: Use bool instead of GLboolean. (Ian Romanick)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
v2: * Add a gl_shader_spirv_data member to gl_shader, which already
encapsulates a gl_spirv_module where the binary will be saved.
(Eduardo Lima)
* Just use the 'spirv_data' member to know whether a gl_shader has
the SPIR_V_BINARY_ARB state. (Timothy Arceri)
* Remove redundant argument checks. Move extension presence check
to API entry point where the rest of checks are. Retype 'n' and
'length'arguments to use the correct and more standard types.
(Ian Romanick)
* Fix some nitpicks. (Ian Romanick)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is a per-shader structure holding the SPIR-V data associated with the
shader (binary module, specialization constants and entry-point).
This is needed because both gl_shader and gl_linked_shader need to share this
data. Instead of copying the data, we pass a reference to it upon program
linking. That's why it is reference-counted.
This struct is created and associated with the shader upon calling
glShaderBinary(), then subsequently filled up by the call to
glSpecializeShaderARB().
v2: Readability improvements (Ian Romanick)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
v2: * Make the SPIR-V module struct part of a larger gl_shader_spirv_data
struct that will be introduced later, and don't reference it directly
in gl_shader. (Eduardo Lima)
* Readability improvements (Ian Romanick)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
v2: * Add meson build bits (Eric Engestrom)
* Return INVALID_OPERATION error on SpecializeShaderARB (Ian Romanick)
v3: Include boilerplate for the GL 4.6 alias of glSpecializeShaderARB
(Neil Roberts)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Instead of calling vtn_add_case for the default case and then looping,
add an is_default variable and do everything inside the loop. This will
make the next commit easier.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This autogenerated pass will automatically find and set the type field
on all vtn_values. This way we always have the type and can use it for
validation and other checks.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
At the moment, this just lets us drop the const_type for constants and
unify things a bit. Eventually, we will use this to store the types of
all SPIR-V SSA values.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We can write to the same output but in different components, like
in this example:
layout(location = 0, component = 0) out ivec2 dEQP_FragColor_0;
layout(location = 0, component = 2) out ivec2 dEQP_FragColor_1;
Therefore, they are not two different outputs but only one.
Fixes:
dEQP-VK.glsl.440.linkage.varying.component.frag_out.*
v3:
- Remove FRAG_RESULT_MAX.
- Add const and use sizeof (Ian).
- Do three-pass to set properly the locations of fragment
outputs when having arrays (Jason).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
GLSL IR operation arguments can sometimes have an implicit swizzle as a
result of a vector arg and a scalar arg, where the scalar argument is
implicitly expanded to the size of the vector argument.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103955
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Care must be taken that all coords end up correct, the tests are very
sensitive that everything is correctly rounded. This doesn't matter
for bilinear filter (since picking a wrong texel with weight zero is
ok), and we could also switch the per-sample coords mistakenly.
While here, also optimize the coord_mirror helper a bit (we can do the
mirroring directly by exploiting float rounding, no need for fixing up
odd/even manually).
I did not touch the mirror_clamp and mirror_clamp_to_border modes.
In contrast to mirror_clamp_to_edge and mirror_repeat these are legacy
modes. They are specified against old gl rules, which actually does
the mirroring not per sample (so you get swapped order if the coord
is in the mirrored section). I think the idea though is that they should
follow the respecified mirror_clamp_to_edge rules so the order would be
correct.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Since we switched over to lowering SLM access directly in SPIR-V -> NIR,
we no longer have vtn_variables for SLM. It's all safe as with UBOs and
SSBOs but we need to let it through in the assert.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104213
Fixes: 8761a04d0d
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
To support the reindex intrinsic, we need the result to be
something on which we can adjust the index/address.
Since it is all within a basic block, the compiler should be
able to merge any extra loads.
v2: Change visit_get_buffer_size too.
Reviewed-by: Dave Airlie <airlied@redhat.com>
If the app does not plan to put a buffer or image in it
(why? But it is allowed and CTS does it), they do not need to
allocate it with the deciate allocation struct.
Fixes: a639d40f13 "radv: add support for local bos. (v3)"
Reviewed-by: Dave Airlie <airlied@redhat.com>
There is no chain, so checking the length ends with a SEGFAULT.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103579
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Push constants on Intel hardware are significantly more performant than
pull constants. Since most Vulkan applications don't actively use push
constants on Vulkan or at least don't use it heavily, we're pulling way
more than we should be. By enabling pushing chunks of UBOs we can get
rid of a lot of those pulls.
On my SKL GT4e, this improves the performance of Dota 2 and Talos by
around 2.5% and improves Aztec Ruins by around 2%.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
In Vulkan, we don't support classic pull constants and everything the
client asks us to push, we push. However, for pushed UBOs, we still
want to fall back to conventional pulls if we run out of space.
Push constants work in terms of 32-byte chunks so if we want to be able
to push UBOs, every thing needs to be 32-byte aligned. Currently, we
only require 16-byte which is too small.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
In order to do this we have to modify push constant set up to handle
ranges. We also have to tweak the way we handle dirty bits a bit so
that we re-push whenever a descriptor set changes.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
There are several places where we look up opcodes in an array of stages.
Assert that the we don't end up going out-of-bounds.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
We want to call brw_nir_analyze_ubo_ranges immedately after
anv_nir_apply_pipeline_layout and it badly wants constants. We could
run an optimization step and let constant folding do it but that's way
more expensive than needed. It's really easy to just handle constants
in apply_pipeline_layout.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This rewires the logic for assigning uniform locations to work in terms
of "complex alignments". The basic idea is that, as we walk the list of
instructions, we keep track of the alignment and continuity requirements
of each slot and assert that the alignments all match up. We then use
those alignments in the compaction stage to ensure that everything gets
placed at a properly aligned register. The old mechanism handled
alignments by special-casing each of the bit sizes and placing 64-bit
values first followed by 32-bit values.
The old scheme had the advantage of never leaving a hole since all the
64-bit values could be tightly packed and so could the 32-bit values.
However, the new scheme has no type size special cases so it handles not
only 32 and 64-bit types but should gracefully extend to 16 and 8-bit
types as the need arises.
Tested-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The testing for this extension is currently very poor. The CTS tests
only test accessing UBOs and SSBOs at dynamic offsets so none of our
constant-offset paths get triggered at all. Also, there's an assertion
in our handling of nir_intrinsic_load_uniform that offset % 4 == 0 which
is never triggered indicating that nothing every gets loaded from an
offset which is not a dword. Both push constants and the constant
offset pull paths are complex enough, we really don't want to ship
without tests. We'll turn the extension back on once we have decent
tests.
VCE processing IBs starts from session and task info at first level,
other commands processed subsequently. The task info for destroy is
embedded to destroy command, resulting that feedback command is not
properly procoessed. This is causing kernel spin VM fault messages on
Polaris and Vega10 card when running ends at encode application.
The fix is also verified on VCE physical mode card.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Cc: mesa-stable@lists.freedesktop.org
Acked-by: Christian König <christian.koenig@amd.com>
Power8, Power8NV, and Power9 are supported on an equal footing
with X86.
Cc: "17.2" "17.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ben Crocker <bcrocker@redhat.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
[Eric: changed formatting, reworded a bit (with Ben's ack)]
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>