Commit Graph

80359 Commits

Author SHA1 Message Date
Connor Abbott b6dc940ec2 nir: rename nir_foreach_block*() to nir_foreach_block*_call()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-20 09:47:05 -07:00
Samuel Pitoiset 7143068296 nvc0: avoid tex read fault from compute shaders on GK110
After some investigation, it seems like that disabling the UNK02C4
command avoid a read fault with texelFetch() from a compute shader.

I have no clue on what this method actually does, but this avoid the
GPU to hang with basic-texelFetch.shader_test without introducing any
compute-related regressions.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-20 18:28:47 +02:00
Jason Ekstrand 87a4fb516e i965/vec4: Always split uniforms in array_access_to_pull_constants
Normally, we split uniforms at the end but in Vulkan, we bail because we
don't want pull constants.  However, we still need them split because
pack_uniforms relies on it.

I really don't like this patch not because it doesn't work (it does) but
because now that we're using MOV_INDIRECT, uniform numbers and sizes don't
really matter anymore.  In the FS backend, uniform splitting and packing is
handled all at once (actual re-assignment of locations happens later) and
we really should do it that way in vec4 eventually as well.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94998
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95001
2016-04-20 09:15:01 -07:00
Jason Ekstrand b3f43822c7 i965/vec4: Use the correct offset for the swizzle shift in push constants
This was actually caught by Ken in review the first time around but somehow
didn't get fixed before the patches were pushed. :-(

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94998
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95001
2016-04-20 09:15:01 -07:00
Jason Ekstrand 9f16e170fe i965/vec4: Use nir_intrinsic_base in the load_uniform implementation
We shouldn't be reading the const_index directly

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94998
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95001
2016-04-20 09:15:01 -07:00
Jason Ekstrand f63a95080f anv/apply_dynamic_offsets: Provide a range on the load_uniform
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94998
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95001
2016-04-20 09:14:58 -07:00
Jason Ekstrand 35b758c378 anv/lower_push_constants: Stop treating scalar specially
All of the code that did something special based on vec4 vs. scalar is
bogus.  In the backend, everything is now in units of bytes and the vec4
backend can handle full std140 packing so we don't need to do anything
special anymore.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94998
2016-04-20 09:14:47 -07:00
Tim Rowley 3bbe8a09ea swr: fix resource backed constant buffers
Code was using an incorrect address for the base pointer.

v2: use swr_resource_data() utility function.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94979
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Tested-by: Markus Wick <markus@selfnet.de>
2016-04-20 09:57:55 -05:00
Hans de Goede 2ac2ecdd6c nouveau: codegen: Add support for OpenCL global memory buffers
Add support for OpenCL global memory buffers, note this has only
been tested with regular load and stores and likely needs more work
for e.g. atomic ops.

Tested with piglet on a gf119 and a gk107:
./piglit run -o shader -t '.*arb_shader_storage_buffer_object.*' results/shader
[9/9] pass: 9 /
./piglit run -o shader -t '.*arb_compute_shader.*' results/shader
[20/20] skip: 4, pass: 16 |

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-04-20 13:46:03 +02:00
Hans de Goede 61d52a5fb9 nouveau: codegen: Use FILE_MEMORY_BUFFER for buffers
Some of the lowering steps we currently do for FILE_MEMORY_GLOBAL only
apply to buffers, making it impossible to use FILE_MEMORY_GLOBAL for
OpenCL global buffers.

This commits changes the buffer code to use FILE_MEMORY_BUFFER at the
ir_from_tgsi and lowering steps, freeing use of FILE_MEMORY_GLOBAL
for use with OpenCL global buffers.

Note that after lowering buffer accesses use the FILE_MEMORY_GLOBAL
register file.

Tested with piglet on a gf119 and a gk107:
./piglit run -o shader -t '.*arb_shader_storage_buffer_object.*' results/shader
[9/9] pass: 9 /
./piglit run -o shader -t '.*arb_compute_shader.*' results/shader
[20/20] skip: 4, pass: 16 |

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-04-20 13:46:03 +02:00
Jose Fonseca f02f4d09ce scons: Build dri_common_interop.c. 2016-04-20 12:41:24 +01:00
Marek Olšák 4fa3d35cc5 st/dri: implement the GL interop DRI extension (v2.2)
v2: - set interop_version
    - simplify the offset_after macro
v2.1: - use version numbers, remove offset_after
      - set "out_driver_data_written"
v2.2: - set buf_offset & buf_size for GL_ARRAY_BUFFER too
      - add whandle.offset to buf_offset
      - disable the minmax cache for GL_TEXTURE_BUFFER
2016-04-20 12:18:47 +02:00
Marek Olšák 37d3a26bd6 glx: implement GLX part of interop interface (v2)
v2: - use const
2016-04-20 12:18:47 +02:00
Marek Olšák b6eda70843 egl: implement EGL part of interop interface (v2)
v2: - use const
2016-04-20 12:18:47 +02:00
Marek Olšák 5e9ed261ed dri_interface: add interface for GL interop with other APIs (v2)
v2: - use const
2016-04-20 12:18:47 +02:00
Marek Olšák 6eeb729490 include/GL: add mesa_glinterop.h for OpenGL-OpenCL interop (v4.2)
v2: - use "enum" to define stuff
v3: - more comments, define MESA_GLINTEROP_UNSUPPORTED
v4: - add mesa_glinterop_device_info::interop_version
    - more comments
    - remove #define MESA_GLINTEROP_VERSION
    - use const for "in"
v4.1: - use version numbers for structures
      - add "out_driver_data_written"
v4.2: - buf_offset & buf_size affect GL_ARRAY_BUFFER too, this is required
        for sharing suballocations within a larger buffer
2016-04-20 12:15:41 +02:00
Nicolas Dufresne 8093990ef4 st/dri: Fix RGB565 EGLImage creation
When creating egl images we do a bytes to pixel conversion by deviding
by 4 regardless of the pixel format. This does not work for RGB565. In
this patch, we avoid useless conversion and use proper API when the
conversion cannot be avoided.

Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-04-20 17:55:30 +09:00
Nicolas Dufresne 4463f38766 st/dri: Factor out DRI2 to PIPE_FORMAT conversion
This code is already duplicated twice and will be useful again. This
will also help when adding formats.

Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-04-20 17:34:03 +09:00
Rob Clark 899bd63ace freedreno/a4xx: lower srgb in shader for astc textures
This *seems* like a hw bug, and maybe only applies to certain a4xx
variants/revisions.  But setting the SRGB bit in sampler view state
(texconst0) causes invalid alpha for ASTC textures.  Work around this
by doing the srgb->linear conversion in the shader instead.

This fixes 392 dEQP tests: dEQP-GLES3.functional.texture.*astc*srgb*

(The remaining fails seem to be a bug w/ ASTC + linear filtering, also
possibly a420.0 specific.)

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-19 17:14:04 -04:00
Rob Clark eddfc97709 nir/lower-tex: add srgb->linear lowering
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-19 17:13:50 -04:00
Rob Clark eb00a0fc58 nir/builder: const'ify swiz param
No need for it not to be const, and lets caller declare it const if
desired.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-04-19 17:13:36 -04:00
Rob Clark 52ccc6349f nir/lower-tex: make options a local var
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-19 16:12:49 -04:00
Rob Clark d4ff42bd0a freedreno: cleanup fd_set_sampler_views
The separate FS/VS entrypoints are no longer used since a3ed98f.  So
just inline them.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-19 16:11:47 -04:00
Russell King fadfaa82c6 tgsi/lowering: improved lowering for LRP
Provide an improved lowering for LRP, which can be implemented in two
MAD instructions with a bit of rearranging of the equation, rather
than the literal implementation of two multiplies, an add and a
subtract.

Signed-off-by: Russell King <rmk@arm.linux.org.uk>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-19 16:04:44 -04:00
Russell King 67da7dd98a tgsi/lowering: improved lowering for XPD
Improve XPD lowering to consume less instructions by using the
MAD instruction to perform the multiply and subtraction together.

Signed-off-by: Russell King <rmk@arm.linux.org.uk>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-19 16:04:44 -04:00
Russell King 65460cf4c8 tgsi/lowering: add support for lowering TRUNC
Add support for lowering TRUNC using the following sequence:

	FRC tmpA, |src|
	SUB tmpA, |src|, tmpA
	CMP dst, -tmpA, tmpA

Note that this is incompatible with FRC lowering.

Signed-off-by: Russell King <rmk@arm.linux.org.uk>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-19 16:04:44 -04:00
Russell King 23e870a888 tgsi/lowering: add support for lowering FLR and CEIL
Add support for lowering FLR and CEIL to FRC/SUB and FRC/ADD
instructions for GPUs that support FRC but not FLR or CEIL.  Since
these uses FRC, it is invalid to ask for FLR or CEIL to be lowered
along with FRC, so add an assert to catch this invalid configuration.

We also need to deal with FLR instructions emitted by the lowering
code.  Fix these up with the FRC+SUB equivalent when FLR lowering is
enabled.

Signed-off-by: Russell King <rmk@arm.linux.org.uk>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-19 16:04:44 -04:00
Bas Nieuwenhuizen 464cef5b06 radeonsi: enable TGSI support cap for compute shaders
v2: Use chip_class instead of family.

v3: Check kernel version for SI.

v4: Preemptively allow amdgpu winsys for SI.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:31:23 +02:00
Bas Nieuwenhuizen 1f32d5d59f radeonsi: Consider input SGPR count for compute shader SGPR count.
si_shader_create corrects the SGPR count with si_fix_num_sgprs. We then
recompute the rsrc1 register to use the new SGPR count.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:31:23 +02:00
Bas Nieuwenhuizen 6c833ba1ab radeonsi: Add CE synchronization for compute dispatches.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:31:23 +02:00
Bas Nieuwenhuizen e0b729c544 mesa/st: enable compute shaders if images are also supported
v2: Also depend on atomic counters.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19 18:10:32 +02:00
Bas Nieuwenhuizen 41d79bcbfa radeonsi: clean up compute flush
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19 18:10:32 +02:00
Bas Nieuwenhuizen 7a92c08428 radeonsi: do not do two full flushes on every compute dispatch
v2: Add more CS_PARTIAL_FLUSH events.

Essentially every place with waits on finishing for pixel shaders
also has a write after read hazard with compute shaders.

Invalidating L2 waits implicitly on pixel and compute shaders,
so, we don't need a CS_PARTIAL_FLUSH for switching FBO.

v3: Add CS_PARTIAL_FLUSH events even if we already have INV_GLOBAL_L2.

According to Marek the INV_GLOBAL_L2 events don't wait for compute
shaders to finish, so wait for them explicitly.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen e764ee13ae radeonsi: split setting graphics and compute descriptors
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen 061ce9399a radeonsi: split texture decompression for compute shaders
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen e56514f631 radeonsi: update predicate condition for compute dispatches
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen c3083d841e radeonsi: implement TGSI compute dispatch
v2: - Use radeon_set_sh_reg_seq.
    - Set predicate bit for conditional rendering.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen 1349dd16ff radeonsi: only emit compute shader state when switching shaders
v2: - Do check if anything changed earlier
    - Use emitted_program instead of emitted_bo to prevent
      shaders with shader->bo = NULL confusing the check
    - Use radeon_set_sh_reg*

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen ba1f66a73d radeonsi: rework compute scratch buffer
Instead of having a scratch buffer per program, have one per
context.

Also removed the per kernel wave count calculations, but
that only helped if the total number of waves in the dispatch
was smaller than sctx->scratch_waves.

v2: Fix style issue.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen 107f4d3538 radeonsi: do per cs setup for compute shaders once per cs
Also removes PKT3_CONTEXT_CONTROL as that is already being done
by si_begin_new_cs, when emitting init_config.

v2: - Use radeon_set_sh_reg_seq.
    - Also set COMPUTE_STATIC_THREAD_MGMT_SE2 / SE3 for CIK+

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen 52d3584dec radeonsi: don't pass scratch buffer to user SGPRs
As far as I can see we use relocations for clover too.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen 422a19f76f radeonsi: split input upload off from si_launch_grid
Also uses a dynamically allocated buffer using u_upload_alloc.
The old buffer per program approach required serializing all
dispatches of the same program.

v2: - Clarified commit message.
    - Use radeon_set_sh_reg_seq.
    - Also upload input buffer for clover kernels, even when
      input_size is 0, as it contains grid parameters.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen 898298efc9 radeonsi: implement TGSI compute shader creation
v2: Moved scratch_enabled initialization after compile.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen 85fd7817ee radeonsi: update shader count for compute shaders
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen da88c2a8e8 radeonsi: set maximum work group size based on block size
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen b082147b78 radeonsi: implement shared atomics
v2: - Use single region
    - Use get_memory_ptr

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen 8acf3e501b radeonsi: implement shared memory load/store
v2: - Use single region
    - Combine address calculation

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen 84a6761ae3 radeonsi: add shared memory
Declares the shared memory as a global variable so that
LLVM is aware of it and it does not conflict with passes
like AMDGPUPromoteAlloca.

v2: - Use ctx->i8.
    - Dropped null-check for declare_memory_region.
    - Changed memory region array to single region.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen 753a3e472b radeonsi: lower compute shader arguments
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen 008d977d01 radeonsi: Use CE for all descriptors.
v2: Load previous list for new CS instead of re-emitting
    all descriptors.

v3: Do radeon_add_to_buffer_list in si_ce_upload.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:30 +02:00