Commit Graph

110697 Commits

Author SHA1 Message Date
Jason Ekstrand 00d4e78ea9 nir/algebraic: Optimize integer cast-of-cast
These have been popping up more and more with the OpenCL work and other
bits causing extra conversions to/from 64-bit.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-04-26 04:26:08 -05:00
Jason Ekstrand 934f178341 anv/descriptor_set: Don't fully destroy sets in pool destroy/reset
In 105002bd2d, we fixed a memory leak bug where we weren't properly
destroying descriptor when destroying/resetting a descriptor pool.
However, the only real leak that happened was that we we take a
reference to the descriptor set layout in the descriptor set and we
weren't dropping our reference.  Everything else in the descriptor set
is tied to the pool itself and doesn't need to be freed on a per-set
basis.  This commit changes the destroy/reset functions to only bother
walking the list of sets to unref the layouts and otherwise we just
assume that the whole-pool destroy/reset takes care of the rest.

Now that we're doing more non-trivial things with descriptor sets such
as allocating things with util_vma_heap, per-set destruction is starting
to show up on perf traces.  This takes reset back to where it's supposed
to be as a cheap whole-pool operation.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-04-26 05:40:28 +00:00
Jason Ekstrand baf4802e3e anv: Better handle 32-byte alignment of descriptor set buffers
In c520f4dec9, we chose to align the sizes of descriptor set buffers to
32 bytes.  We have to align the descriptor set buffer to 32B so that
it's valid for using with push constants.  We align the size as well so
we don't leave lots of holes with util_vma_heap_alloc.  Unfortunately,
we were only aligning it for alloc and not for free so we were still
creating piles of holes when we delete descriptor sets.  This causes
terrible perf for the allocator once we've deleted piles of descriptor
sets.

This commit reworks the code so that we align the descriptor set buffer
size to 32B for both alloc and free.  The result is that it takes the
new crucible vkResetDescriptorPool from 104.567719 to 2.898354 seconds.

Fixes: c520f4dec9 "anv: Add a concept of a descriptor buffer"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110497
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-04-26 05:40:28 +00:00
Dave Airlie d946cbe9f5 nir: fix bit_size in lower indirect derefs.
This fixes a case where we are expecting 64-bit but generate
32-bit consts and validate gets angry.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2019-04-26 12:59:43 +10:00
Kenneth Graunke 529ace7887 iris: Silence unused function warning 2019-04-25 17:33:56 -07:00
Marek Olšák c5f65bfe6c glsl: fix shader_storage_blocks_write_access for SSBO block arrays (v2)
This fixes KHR-GL45.compute_shader.resources-max on radeonsi.

Fixes: 4e1e8f684b "glsl: remember which SSBOs are not read-only and pass it to gallium"

v2: use is_interface_array, protect again assertion failures in u_bit_consecutive

Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-04-25 18:57:38 -04:00
Rob Clark a6ab27dcab docs/features: update GL too
Forgot to update corresponding entries for desktop GL.. kinda wish we
didn't have to update both GLES and GL tables.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 15:48:19 -07:00
Rob Clark 7a57cfbed6 freedreno/a6xx: sample-shading support
Enables:

  OES_sample_shading
  OES_sample_variables
  OES_shader_multisample_interpolation

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:31 -07:00
Rob Clark ee2e3a07bb freedreno/ir3: sample-shading support
The compiler support for:

  OES_sample_shading
  OES_sample_variables
  OES_shader_multisample_interpolation

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:31 -07:00
Rob Clark 85949c52b4 freedreno: wire up core sample-shading support
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:31 -07:00
Rob Clark c8e825aaac freedreno/ir3: fix load_interpolated_input slot
The so->inputs[] table is in units of vec4

Fixes: 7ff6705b8d freedreno/ir3: convert to "new style" frag inputs
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:31 -07:00
Rob Clark 49f922d96c freedreno/a6xx: add VALIDREG/CONDREG helper macros
There are a few places that we check if a shader stage input reg is
used/valid (ie. not r63.x).. and there are about to be a bunch more.
So add some helper macros for less open-coding.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:31 -07:00
Rob Clark f4b4d6cf23 freedreno/ir3: rename frag_vcoord -> ij_pixel
Since this is what the value actually is.  Cleanup the name before
adding more different i,j related values for sample-shading.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:31 -07:00
Rob Clark 5be415fc2b freedreno/ir3: remove bogus assert
tex instruction can actually return 16b values.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:31 -07:00
Rob Clark 2f0b9d2249 freedreno/ir3: lower load_barycentric_at_offset
Calculates i,j at specified offset within a pixel.  A new load_size_ir3
intrinsic is used in conjunction with fddx/fddy to translate the offset
into primitive space and adjust the i,j from load_barycentric_pixel
accordingly.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:31 -07:00
Rob Clark c4f423aa36 freedreno/ir3: lower load_barycentric_at_sample
This lowers load_barycentric_at_sample to load_sample_pos_from_id plus
load_barycentric_at_offset.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:31 -07:00
Rob Clark 4e3ce224a7 freedreno: update generated headers
Pull in updates for sample shading.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:31 -07:00
Rob Clark 6d6ec2d4d2 freedreno/ir3: cleanup instruction builder macros
De-duplicate the "normal" and "flags" versions of the macros, and while
at it go ahead and add "flags" versions for all the remaining macros,
since we'll at least need INSTR1F in a following commit.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:31 -07:00
Rob Clark 77b3b96a3b freedreno/ir3: more emit-cat5 fixes
Couple more opcodes which don't take a sampler id as first arg.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:31 -07:00
Rob Clark 9032f0690c freedreno/ir3: fix rgetpos decoding
It takes an argument.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:31 -07:00
Rob Clark 4d08c1b595 compiler: rename SYSTEM_VALUE_VARYING_COORD
And add corresponding enums for different sorts of varying
interpolation.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:31 -07:00
Rob Clark 96d2e4ab8a freedreno: add robustness support
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:31 -07:00
Rob Clark 6503918689 freedreno/drm: update for robustness
Update UABI header and add FD_PP_PGTABLE and FD_NR_FAULTS params.

Robustness can be supported by a kernel which provides the new ABI if it
also indicates that per-process pagetables are in use.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:07 -07:00
Alyssa Rosenzweig 77d091d0c5 panfrost/midgard: Add new bitwise ops
These fused NOT-ops could maybe help somehow...?

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-04-25 20:37:46 +00:00
Alyssa Rosenzweig bcabcfe3ad panfrost/midgard: Identify inand
This was previously thought to be inot, but it's actually a bit more
general than that! :)

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-04-25 20:37:45 +00:00
Alyssa Rosenzweig 5f942db190 panfrost/midgard: Copy prop for texture registers
We'll want to unify this with main copy prop (and extend to varyings),
but that'll take more care to handle some special cases, so leave it as
a stub pass for now.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-04-25 20:37:45 +00:00
Alyssa Rosenzweig 4d821a1101 panfrost/midgard: Optimize csel involving 0
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-04-25 20:37:45 +00:00
Alyssa Rosenzweig b53b4573c3 panfrost/midgard: Extend copy propagation pass
This extends copy propagation to respect output modifiers for ALU
instructions, as well as potentially fixing some bugs related to looping
(all dEQP loop tests pass).

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-04-25 20:37:45 +00:00
Alyssa Rosenzweig 7bc91b487b panfrost/midgard: Reduce fmax(a, 0.0) to fmov.pos
This will allow us to copyprop away the move and eliminate the
instruction entirely.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-04-25 20:37:45 +00:00
Bas Nieuwenhuizen 295536d47a radv: Expose Vulkan 1.1 for Android.
We have the YCBCR feature now.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-25 19:56:20 +00:00
Bas Nieuwenhuizen 8bb3cec7c9 radv: Expose VK_EXT_ycbcr_image_arrays.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-25 19:56:20 +00:00
Bas Nieuwenhuizen fc9248e13e radv: Enable YCBCR conversion feature.
This enabled the basic YCBCR features.

We support basic multiplane formats using 8-bit and 16-bit unorms, as
well as YUV2 formats.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-25 19:56:20 +00:00
Bas Nieuwenhuizen 379b82dace radv: Add ycbcr subsampled & multiplane formats to csv.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-25 19:56:20 +00:00
Bas Nieuwenhuizen 52c1adda21 radv: Add ycbcr format features.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-25 19:56:20 +00:00
Bas Nieuwenhuizen b769a549ee radv: Add hashing for the ycbcr samplers.
Otherwise caching gets very confused.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-25 19:56:20 +00:00
Bas Nieuwenhuizen 5c3467e74a radv: Run the new ycbcr lowering pass.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-25 19:56:20 +00:00
Bas Nieuwenhuizen 91702374d5 radv: Add ycbcr lowering pass.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-25 19:56:20 +00:00
Bas Nieuwenhuizen 5564c38212 radv: Update descriptor sets for multiple planes.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-25 19:56:20 +00:00
Bas Nieuwenhuizen 7f6732ac69 radv: Add ycbcr samplers in descriptor set layouts.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-25 19:56:20 +00:00
Bas Nieuwenhuizen 427024bf2e ac/nir: Add support for planes.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-25 19:56:20 +00:00
Bas Nieuwenhuizen dc917c8073 radv: Allow mixed src/dst aspects in copies.
e.g. COLOR + PLANE_2, as well COLOR + COLOR for multiplane images.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-25 19:56:20 +00:00
Bas Nieuwenhuizen b2cfa231d0 radv: Add support for image views with multiple planes.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-25 19:56:20 +00:00
Bas Nieuwenhuizen 65c4f612aa radv: Add ycbcr conversion structs.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-25 19:56:20 +00:00
Bas Nieuwenhuizen a837768857 radv: Support different source & dest aspects for planar images in blit2d.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-25 19:56:20 +00:00
Bas Nieuwenhuizen 66507cc656 radv: Add single plane image views & meta operations.
Copies & clear of multiplane images is not allowed so we do not
have to handle that case.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-25 19:56:20 +00:00
Bas Nieuwenhuizen 42d159f276 radv: Add multiple planes to images.
No functional changes. This temporarily uses plane 0 for
everything.

Long term plan is that only single plane images get to use
metadata like htile/dcc/cmask/fmask.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-25 19:56:20 +00:00
Bas Nieuwenhuizen d3225e533f radv: Add logic for multisample format descriptions.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-25 19:56:20 +00:00
Bas Nieuwenhuizen 09c4a911e5 radv: Add logic for subsampled format descriptions.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-25 19:56:20 +00:00
Caio Marcelo de Oliveira Filho 055f6281d4 intel/fs: Don't handle texop_tex for shaders without implicit LOD
These will be lowered by nir_lower_tex() with the
lower_tex_when_implicit_lod_not_supported, so don't need the extra
handling here.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-04-25 12:13:06 -07:00
Caio Marcelo de Oliveira Filho d5ac5d6e83 nir: Add option to lower tex to txl when shader don't support implicit LOD
We already add the LOD src, so go ahead and update the texop as well
when this option is set.

v2: Make it an option. (Rob Clark)

v3: Use a more concise name suggested by Jason.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-04-25 12:13:06 -07:00