turnip can have multiple inputs with the same location, and different
location_frac.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3109>
Specifically, execution size, register file, and register type. I did
not add validation for vertical stride and width because I don't believe
it's possible to have an otherwise valid instruction with an invalid
vertical stride or width, due to all of the other regioning
restrictions.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>
In order to fuzz test instructions, we first need to do some sanity
checking first. Factoring out this function allows us an easy way to
validate a single instruction.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>
16-bit immediates need to be replicated through the 32-bit immediate
field, so we should never see one that isn't.
This does happen however in the fuzzer unit test, so returning false
allows the fuzzer to reject this case.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>
Previously we were sharing tables between generations that were nearly
identical (i.e., Gen8 3-src adds HF support) and used a small bit of
code to handle the differences. This is kind of a mess if you want to
reject 64-bit types on platforms that don't support 64-bit types, so
split the tables, allowing each generation's table to list exactly what
it supports.
Acked-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>
Since the enum brw_reg_type is packed, comparisons with -1 don't work
directly, necessitating the cast. Add a macro to avoid this confusion.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>
Two of the tests emit instructions with MRF destinations, and MRFs
aren't present on Gen7+. I think we were just lucky that this didn't
cause a problem earlier since we were running the tests on Gen7-9.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>
Since the platforms don't support align1 3-src instructions, the
contents of these operands are not going to be meaningful. Just don't
print them to avoid hitting some assertions in brw_inst functions.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>
When there's only one hardware thread (i.e. the dispatch width greater
or equal to the workgroup size), there's no need to use a barrier to
ensure all the invocations reach the same point in the shader, because
they are already running lock-step.
Results for SKL running Iris for shader-db tests with compute shaders
total sends in shared programs: 18361 -> 18339 (-0.12%)
sends in affected programs: 904 -> 882 (-2.43%)
helped: 9
HURT: 0
helped stats (abs) min: 1 max: 5 x̄: 2.44 x̃: 2
helped stats (rel) min: 0.84% max: 21.43% x̄: 7.82% x̃: 2.67%
95% mean confidence interval for sends value: -3.31 -1.58
95% mean confidence interval for sends %-change: -14.67% -0.97%
Sends are helped.
Shaders from Aztec Ruins, Car Chase, Manhattan and DeusEx are helped.
Results for ICL and TGL are similar to SKL.
Results for BDW are similar to SKL except for DeusEx shader that has a
workgroup size 16 but in BDW picks the SIMD8.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3226>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3226>
When there's only one hardware thread (i.e. the dispatch width greater
or equal to the workgroup size), there's no need to synchronize shared
memory access (SLM) since all the requests from a single thread are
already synchronized. In such case, we just add a scheduling fence.
To be able to identify that case for all platforms, move the handling
of platforms prior to Gen11 (which don't have a separate SLM fence)
after the optimization.
Results for SKL running Iris for shader-db tests with compute shaders
total sends in shared programs: 18395 -> 18361 (-0.18%)
sends in affected programs: 938 -> 904 (-3.62%)
helped: 9
HURT: 0
helped stats (abs) min: 1 max: 5 x̄: 3.78 x̃: 4
helped stats (rel) min: 1.56% max: 26.32% x̄: 10.33% x̃: 2.60%
95% mean confidence interval for sends value: -4.85 -2.71
95% mean confidence interval for sends %-change: -19.12% -1.54%
Sends are helped.
Shaders from Aztec Ruins, Car Chase, Manhattan and DeusEx are helped.
Results for ICL and TGL are similar to SKL.
Results for BDW are similar to SKL except for DeusEx shader that has a
workgroup size 16 but in BDW picks the SIMD8.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3226>
Like a SHADER_OPCODE_MEMORY_FENCE but doesn't doesn't generate any
assembly code.
Will be used when the compiler shouldn't reorder certain instructions
but there's no need to generate code for the HW to do it -- as the
ordering will be guaranteed by other means.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3226>
Current code only checks whether first plane's format is supported
in case YUV format sampling is done by sampling each plane separately.
It would be safer to check other planes' as well.
Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2863>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2863>
The closed GL driver doesn't use UBWC on any storage images. It does tile
mostly (skipping tiling on writeonly images, it seems), but for freedreno
we've been enabling tiling in all cases and it's fine. We do need to
disable UBWC, as tests fail otherwise and just plugging in the equivalent
UBWC regs like we were setting up a texture isn't enough.
Fixes dEQP-VK.image.atomic_operations.*
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3433>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3433>
So far this doesn't handle the texture state-based storage image access
loads, and doesn't support descriptor arrays (same as SSBOs). The texture
side is more tricky, since we have another remapping table to work around.
This is enough to get some of dEQP-VK.image.atomic_operations.* working.
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3433>
u_screen will return 0 for all of these, which means that this is one
less driver to see in git grep when I'm checking who exposes a cap.
The exception is the texel/gather offsets and stream output
components, which will not be exposed since we don't expose the
corresponding GLSL version.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3493>
u_screen will return 0 for all of these, which means that this is one
less driver to see in git grep when I'm checking who exposes a cap.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3493>
u_screen will return 0 for all of these, which means that this is one
less driver to see in git grep when I'm checking who exposes a cap.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3493>
Just make it be all SSBOs then all storage images. The remapping table
was there to make it so that the big gap present from gallium's atomic
lowering would get cleaned up, but that's no longer case. The table has
made it very hard to support Vulkan storage images, so it's time for it to
go.
This does mean that an SSBO/IBO that is only loaded (or size-queried) will
now occupy a slot in the table where it wouldn't before. This seems like
a minor cost compared to being able to drop this much logic.
With the remapping table gone, SSBO array handling for turnip just falls
out.
Fixes many array cases of
dEQP-VK.binding_model.shader_access.primary_cmd_buf.storage_buffer.*
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jonathan Marek <jonathan@marek.ca> (turnip)
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3240>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3240>
These numbers are always confusing, and it's particularly so for this
field where it has a different meaning in different info structs.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3240>
The arguments passed in were:
- prog->info.num_ssbos
- prog->nir->info.num_ssbos
- arbitrary values for standalone compilers
The num_ssbos should match between the prog's info and prog->nir's info
until this lowering happens.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3240>
We carve out half the SSBO space for atomics, and we were just binding
them way up there. freedreno was then using a remapping table to map the
sparse buffer index back down, since space in the descriptor array is a
shared resource that may limit parallelism. That remapping table
generated inside of the ir3 compiler is getting thoroughly in the way of
implementing vulkan descriptor sets.
We will be able to get rid of the freedreno's remapping table, and
hopefully save shared resources on other hardware, by packing the atomics
tightly above the SSBOs (like i965 does). We already rebind the shader
buffers on program change if either the old or new program has SSBOs or
ABOs, so this doesn't necessarily increase the program state change cost
(the only cost increase I can come up with is if you're using the same
atomic counter without rebinding it across changes of programs with
varying SSBO counts, meaning it would now bounce around index space).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3240>
Gallium arbitrarily (it seems) put atomics below SSBOs, resulting in a
bunch of extra index management, and surprising shader code when you would
see your SSBOs up at index 16. It makes a lot more sense to see atomics
converted to SSBOs appear as magic high numbers.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3240>