Commit Graph

115071 Commits

Author SHA1 Message Date
Dave Airlie c0521ecffb llvmpipe: enable compute shaders if LLVM has coroutines
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-09-04 15:22:20 +10:00
Dave Airlie 6453a22612 llvmpipe: add local memory allocation path
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-09-04 15:22:20 +10:00
Dave Airlie 4e70970507 llvmpipe: add compute shader parameter fetching support
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-09-04 15:22:20 +10:00
Dave Airlie 0b51e73de2 llvmpipe: add compute shader images support
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-09-04 15:22:20 +10:00
Dave Airlie 45a8cf95f2 llvmpipe: add ssbo support to compute shaders
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-09-04 15:22:20 +10:00
Dave Airlie 6ea8e9b415 llvmpipe: add compute sampler + sampler view support.
This is ported from the fragment shader code.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-09-04 15:22:20 +10:00
Dave Airlie 4ca40cc3dc llvmpipe: add support for compute constant buffers.
This is mostly ported from the fragment shader code.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-09-04 15:22:20 +10:00
Dave Airlie 775fa81d7b llvmpipe: add compute pipeline statistics support.
This just adds the CS invocations counter.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-09-04 15:22:20 +10:00
Dave Airlie 50fde5b208 llvmpipe: add grid launch
This adds the dispatch code. It creates a job for the number
of blocks in the grid, and dispatches them to the threadpool
implementation. The threadpool then calls the JIT code to
execute the coroutines.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-09-04 15:22:20 +10:00
Dave Airlie b320830bbd llvmpipe: add compute shader generation.
This creates the coroutine execution environment and the
main compute shaders that get executed inside it.

Each compute shader block is executed in it's own coroutine
execution shader, which each "thread" being a coroutine executed
inside it in sequence.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-09-04 15:22:20 +10:00
Dave Airlie 6ea41df94c llvmpipe: introduce variant building infrastrucutre.
This doesn't actually build any of the shaders yet, but just
builds up the framework necessary to start building the shaders
and variants.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-09-04 15:22:20 +10:00
Dave Airlie fc01fafdbc llvmpipe: introduce new state dirty tracking for compute.
Compute doesn't share dirty state with the fragment pipeline
so create a separate path for it.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-09-04 15:22:20 +10:00
Dave Airlie a6f6ca37c8 llvmpipe: add initial shader create/bind/destroy variants framework.
This is mostly a port of the fragment shader framework

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-09-04 15:22:20 +10:00
Dave Airlie a792c5ae3e llvmpipe: add compute debug option
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-09-04 15:22:20 +10:00
Dave Airlie 25f46ae9aa gallivm: add compute jit interface.
This adds the jit interface for compute shaders, it's based
on the fragment shader one.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-09-04 15:22:20 +10:00
Dave Airlie 3879f69b50 llvmpipe: add initial compute state structs
These mirror the fragment shader structs, this is just a framework.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-09-04 15:22:20 +10:00
Dave Airlie add0b151f5 llvmpipe: introduce compute shader context
The compute shader will need it's own context like the frag shader
has, this just introduces the framework struct and allocates/frees
for it in the right places.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-09-04 15:22:20 +10:00
Dave Airlie 83597ad3f2 gallivm: add barrier support for compute shaders.
When the code is executing an hits a barrier, it will suspend
the coroutine and return control to the coroutine dispatcher.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-09-04 15:22:20 +10:00
Dave Airlie 1b24e3ba75 llvmpipe: add compute threadpool + mutex
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
In order to efficiently run a number of compute blocks, use
a threadpool that just allows for jobs with unique sequential
ids to be dispatched.
2019-09-04 15:22:20 +10:00
Dave Airlie e5bf6b7013 gallivm: add support for compute shared memory
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-09-04 15:22:20 +10:00
Dave Airlie db6c78f9c8 gallivm: add new compute related intrinsics
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-09-04 15:22:20 +10:00
Dave Airlie 3312bed7b0 llvmpipe: reogranise jit pointer ordering
In order to share the texture/image/sampler code with compute
shaders we need to reorg them to be at the front of context
same as draw does for vs/gs sharing.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-09-04 15:22:20 +10:00
Dave Airlie d32690b43c gallivm: add coroutine pass manager support
coroutines require a proper pass manager, so add the passes
to the correct places

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-09-04 15:22:20 +10:00
Dave Airlie 9cf1340e4f gallivm: add coroutine support files to gallivm.
These wrap the coroutine intrinsics and also add some higher
level wrappers around coroutine begin, end and suspend procedures

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-09-04 15:22:20 +10:00
Dave Airlie f3f0cbf4f4 gallivm/flow: add counter reset for loops
This allows the counter value to be forced to a certain value

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-09-04 15:22:20 +10:00
Dave Airlie 6b3c6b91a8 llvmpipe: enable fb no attach
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-09-04 15:22:20 +10:00
Kenneth Graunke f8887909c6 iris: Report correct number of planes for planar images
We were only handling the modifiers case and not counting the number of
planes in actual planar images.

Fixes Piglit's ext_image_dma_buf_import-export.

Fixes: fc12fd05f5 ("iris: Implement pipe_screen::resource_get_param")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111509
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2019-09-03 21:55:23 -07:00
Ilia Mirkin 32d458fdff teximage: ensure that Tex*SubImage* checks format
We were previously not doing at least some of the checks. This uses the
same logic that is used in glTexImage*.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-09-04 00:35:45 -04:00
Jan Beich 8e92ce9ba5 gallium/hud: add CPU usage support for DragonFly/NetBSD/OpenBSD
Each BSD has slightly different sysctl for retrieving per-CPU times.
FreeBSD returns long while NetBSD returns uint64_t. On OpenBSD return
type differs between summation and per-CPU times. DragonFly is
compatible with FreeBSD.

Signed-off-by: Jan Beich <jbeich@FreeBSD.org>
2019-09-03 22:53:15 -04:00
Roman Stratiienko ef621a73f7 lima: Return fence unconditionally
Based on the vc4 implementation.
Fixes Android RenderEngine::flush() routine:
android.googlesource.com/platform/frameworks/native/+/refs/tags/android-o-mr1-iot-release-smart-clock-fcs/services/surfaceflinger/RenderEngine/RenderEngine.cpp#225

Signed-off-by: Roman Stratiienko <roman.stratiienko@globallogic.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-09-04 00:32:04 +00:00
Vasily Khoruzhick 1c1890fa70 lima/ppir: clone uniforms and load_coords into each successor
Try more aggressive approach with cloning uniform and coord loads.

Uniform load can be inserted into any instruction, so let's do that. ARM site
claim that penalty for cache miss is one clock, so we don't lose anything if
we merge it into instruction that uses the result. As side effect we can also
pipeline it and thus decrease reg pressure.

Do the same for varyings that hold texture coords, but for different reason:
looks like there's a special path for coords that increases precision if
varying that holds it is pipelined. If we don't pipeline it and load coords
from a register its precision is fp16 and thus only 10 bits which is not enough
to accurately sample textures of size 1024 or larger.

Since instruction can hold only one uniform load and one varying load,
node_to_instr now creates a move using helper introduced in previous commit if
slot is already taken. As side effect of this change we can also try to
pipeline texture loads and create a move if attempt fails.

Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-09-04 00:02:13 +00:00
Vasily Khoruzhick e23fd2c375 lima/ppir: don't assume that load coords gets value from register
It can load value from varying directly as well. Also load_regs is the
only op that has a source, so add src_num field to load node and set it
accordingly.

Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-09-04 00:02:13 +00:00
Vasily Khoruzhick bd77d19300 lima/ppir: add common helper for creating movs
Introduce common helper for creating movs to avoid code duplication

Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-09-04 00:02:13 +00:00
Eric Engestrom 7659c6197f nir: fix memleak in error path
Fixes: 2cf59861a8 ("nir: Add partial redundancy elimination for compares")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-09-04 00:31:53 +01:00
Eric Engestrom c4969b0a25 freedreno/drm-shim: fix mem leak
Fixes: 494ecef6b4 ("freedreno: Add support for drm-shim.")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-09-04 00:18:37 +01:00
Eric Engestrom 7abf65aedc anv: fix format string in error message
Fixes: 9775894f10 ("anv: Move size check from anv_bo_cache_import() to caller (v2)")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-09-04 00:13:20 +01:00
Eric Engestrom 1667360f7d util/os_file: fix double-close()
Fixes: 955c63d364 ("util/os_file: resize buffer to what was actually needed")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-09-04 00:11:51 +01:00
Eric Engestrom 43d470404c egl: fix deadlock in malloc error path
Fixes: cb0980e69a ("egl: move alloc & init out of _eglBuiltInDriver{DRI2,Haiku}")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-09-04 00:10:18 +01:00
Eric Engestrom 3afe9d798a ttn: fix 64-bit shift on 32-bit `1`
Fixes: 4d0b2c7aaa ("ttn: Update shader->info as we generate code.")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-09-04 00:01:08 +01:00
Rob Clark 1ef459297c freedreno/ir3: use uniform base
When lowering from ubo, use the constant base field in the load_uniform
instruction for the constant part of the offset.  Doesn't change much
for constant indexing, but this will help for indirect indexing because
constant-folding can't completely clean up the result.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-09-03 14:10:57 -07:00
Rob Clark 305bcdf992 freedreno/drm: fix 64b iova shifts
Should shift before splitting 64b iova into dwords

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-09-03 14:10:57 -07:00
Rob Clark 5ccd5871ed nir: remove unused constant_fold_state
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-09-03 14:10:57 -07:00
Eric Anholt 79a5ebe045 freedreno: Fix the type of single-component scaled vertex attrs.
This looks like clear copy-and-pasteos, and fixes:

dEQP-GLES2.functional.draw.random.40

(on A307 and A630, both tested in the new CI farm)

Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-09-03 19:34:09 +00:00
Connor Abbott f3e978db4d radeonsi/nir: Remove uniform variable scanning
We can get all the information we need from NIR. It's slightly less
accurate, but radeonsi doesn't use the extra information. The old code
also overcounted atomic counters, which led to problems when everything
was used at once.

Fixes KHR-GL45.compute_shader.resources-max.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-09-03 15:55:02 +02:00
Connor Abbott 96c2a2832f ttn: Fill out more info fields
We'll use these in radeonsi.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-09-03 15:54:57 +02:00
Connor Abbott dcc64fcfed nir: Fix num_ssbos when lowering atomic counters
Otherwise it's impossible to know the maximum SSBO index for both
internal TGSI shaders from TTN (which don't have any notion of atomic
counters and no offset) as well as shaders from GLSL.

I fixed everything I could find while grepping for num_ssbos and
num_abos, which hopefully is everything (iris was the only user I could
find that uses it in a meaningful way).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-09-03 15:54:54 +02:00
Connor Abbott 2abf62d348 ac/nir: Fix gather4 integer wa with unnormalized coordinates
This adds a bit of unneccesary code on radeonsi, since whether
unnormalized coordinates are used is known at compile time with GL, but
I wasn't sure if it was worth the few instructions to plumb everything
through, especially for something so rare -- my shader-db doesn't have
any instances where this changes anything.

Fixes CTS tests I created at
https://github.com/cwabbott0/VK-GL-CTS/tree/unnorm-gather-tests

Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-09-03 13:50:54 +00:00
Connor Abbott c63ccf90df ac/nir: Rewrite gather4 integer workaround based on radeonsi
The workaround was originally written based on amdgpu-pro traces, but
since then radeonsi has got its own slightly different version. Use the
radeonsi version instead, to be consistent and because it'll be slightly
more convenient for handling unnormalized coordinates.

Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-09-03 13:50:54 +00:00
Eric Engestrom 5f7d90f2ff egl: warn user if they set an invalid EGL_PLATFORM
Technically, the user might have set EGL_DISPLAY instead of
EGL_PLATFORM, but since the former is deprecated let's just mention the
latter in the warning message.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-09-03 14:41:43 +01:00
Alyssa Rosenzweig 5cdfccf8a6 panfrost: Remove panfrost_upload
This routine was made obsolete over a series of reworks of memory
allocation; Tomeu's changes to shader memory allocation finally made
this unused as cppcheck noted.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
2019-09-03 13:55:29 +02:00