Commit Graph

88040 Commits

Author SHA1 Message Date
Marek Olšák b7699ce07c winsys/amdgpu: fix a race condition between fence updates and IB submissions
The CS thread is needed to ensure proper ordering of operations and can't
be disabled (without complicating the code).

Discovered by Nine CSMT, which ended up in a deadlock.

Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-06 21:05:48 +01:00
Marek Olšák ece6e1f658 radeonsi: add TC L2 prefetch for shaders and VBO descriptors
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-06 21:05:48 +01:00
Marek Olšák a131dacb14 radeonsi: add CP DMA flags for greater control over synchronization
for L2 prefetch

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-06 21:05:48 +01:00
Marek Olšák 8ac1715d02 radeonsi: cleanly communicate which CP DMA packet is first
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-06 21:05:48 +01:00
Marek Olšák 2b621c47aa gallium/radeon: add new HUD query num-SDMA-IBs
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-06 21:05:48 +01:00
Marek Olšák 6b8a371e00 gallium/radeon: rename the num-ctx-flushes query to num-GFX-IBs
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-06 21:05:48 +01:00
Marek Olšák 5871ebd7f1 radeonsi: add HUD queries for cache flush stats
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-06 21:05:48 +01:00
Marek Olšák aac07bb79c radeonsi: don't count fast clears and prefetches into CP DMA stats
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-06 21:05:48 +01:00
Marek Olšák 3b98a5dc47 radeonsi: don't wait for compute shaders in texture_barrier
it doesn't interact with compute shaders in any way

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-06 21:05:48 +01:00
Marek Olšák 4b93ba542c radeonsi: assume that a TES without POSITION precedes GS
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-06 21:05:48 +01:00
Marek Olšák 53648050a5 radeonsi: unduplicate VS color export code
it's exactly the same as the other ones

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-06 21:05:48 +01:00
Marek Olšák 42920c0fb9 radeonsi: clean up more HAVE_LLVM #ifdefs
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-06 21:05:48 +01:00
Marek Olšák a8374c3d22 gallium/radeon: clean up HAVE_LLVM #ifdefs in r600_get_llvm_processor_name
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-06 21:05:48 +01:00
Kenneth Graunke 2138347a45 i965: Properly flush in hsw_pause_transform_feedback().
Fixes a number of transform feedback tests when run with Linux 4.8,
which allows us to use the MI_LOAD_REGISTER_REG command, at which point
we started using this new broken path.

ES3-CTS.functional.transform_feedback.array_element.interleaved.lines.*
and Piglit's arb_transform_feedback2/draw-auto are both fixed by this
patch, for example.

Thanks to Chris Wilson for catching this mistake!

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99030
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-01-06 12:01:53 -08:00
Kenneth Graunke 4295af646f i965: Fix texturing in the vec4 TCS and GS backends.
We were failing to zero m0.2 of the sampler message header for TCS and
GS messages in the simple case.  fs_generator has done this for about
a year now, but we missed it in vec4_generator.

Fixes ES31-CTS.core.texture_cube_map_array.sampling,
GL45-CTS.texture_cube_map_array.sampling, and many
dEQP-GLES31.functional.shaders.opaque_type_indexing.sampler subtests:
- dynamically_uniform.tessellation_control.isampler3d
- dynamically_uniform.tessellation_control.isamplercube
- dynamically_uniform.tessellation_control.sampler2d
- dynamically_uniform.tessellation_control.usamplercube
- dynamically_uniform.tessellation_control.sampler2darray
- dynamically_uniform.tessellation_control.isampler2darray
- dynamically_uniform.tessellation_control.usampler3d
- dynamically_uniform.tessellation_control.usampler2darray
- dynamically_uniform.tessellation_control.usampler2d
- dynamically_uniform.tessellation_control.sampler3d
- dynamically_uniform.tessellation_control.samplercube
- dynamically_uniform.tessellation_control.isampler2d
- uniform.tessellation_control.isampler3d
- uniform.tessellation_control.isamplercube
- uniform.tessellation_control.usampler2d
- uniform.tessellation_control.usampler3d
- uniform.tessellation_control.sampler2darray
- uniform.tessellation_control.isampler2darray
- uniform.tessellation_control.usampler2darray
- uniform.tessellation_control.sampler2d
- uniform.tessellation_control.usamplercube
- uniform.tessellation_control.sampler3d
- uniform.tessellation_control.samplercube
- uniform.tessellation_control.isampler2d

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-06 11:49:53 -08:00
Tim Rowley c93efb0a4f swr: [rasterizer core] rename OutputMerger functions
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-01-06 10:05:08 -06:00
Tim Rowley fa7c5e242f swr: [rasterizer core] fix SIMD16 Transpose_16_16
Fix incorrect swizzling in SIMD16 Transpose_16_16 breaking the
two-channel 16-bpc formats like R16G16_FLOAT.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-01-06 10:05:02 -06:00
Tim Rowley e62b6d2f0f swr: [rasterizer core] fix SIMD16 output merger
Honor the colorHottileEnable mask when accessing colorBuffer pointers.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-01-06 10:04:56 -06:00
Tim Rowley 1a77e0c48d swr: [rasterizer core] fix SIMD16 PackTraits pack() and unpack()
Fix routines for 8-bit and 16-bit formats used by optimized tile store.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-01-06 10:04:50 -06:00
Tim Rowley bd22c3d411 swr: [rasterizer core] fix SIMD16 transpose functions
Fixed Transpose_16 methods of following formats:

Transpose8_8_8_8
Transpose8_8
Transpose32_32
Transpose16_16_16_16
Transpose16_16_16
Transpose16_16

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-01-06 10:04:41 -06:00
Tim Rowley e6eede81af swr: [rasterizer core] whitespace adjustments
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-01-06 10:04:28 -06:00
Kenneth Graunke a4d6f4d954 i965: Don't set EmitNoMainReturn.
A while ago, we stopped using Luca's GLSL IR lower_jumps pass in favor
of nir_lower_returns().  Marek's commit d3cb79e043
put it in do_common_optimization, which resulted in us calling it again.

Dropping the EmitNoMainReturn setting makes us skip that pass again.

Apparently that pass doesn't work properly, because this fixes Piglit's
tests/spec/glsl-1.10/execution/vs-nested-return-sibling-loop.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99287
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2017-01-05 23:15:39 -08:00
Eric Anholt 69da8c32c7 vc4: Rewrite T image handling based on calling the LT handler.
The T images are composed of effectively swizzled-around blocks of LT (4x4
utile) images, so we can reduce the t_utile_address() calls by 16x by
calling into the simpler LT loop.

This also adds support for calling down with non-utile-aligned
coordinates, which will be part of lifting the utile alignment requirement
on our callers and avoiding the RMW on non-utile-aligned stores.

Improves 1024x1024 TexSubImage by 2.55014% +/- 1.18584% (n=46)
Improves 1024x1024 GetTexImage by 2.242% +/- 0.880954% (n=32)
2017-01-05 17:19:54 -08:00
Eric Anholt 3a3a0d2d6c vc4: Move the utile_width/height functions to header inlines.
I want these inlined in the callers, particularly with the tiling
changes coming up, but we're not building with lto so some caller
would suffer.
2017-01-05 17:19:54 -08:00
Eric Anholt 6cf9ff8a6c vc4: Make the load/store utile functions static.
They don't have any other callers outside of this file, and I'm hoping
they get inlined soon.
2017-01-05 17:19:54 -08:00
Eric Anholt e64b1169d3 vc4: Simplify the load/store utile functions.
They now have less of a dependency on the cpp, and don't have to do a
divide.

Hacking up mesa-demos teximage to do only one subtest and not draw
points, I saw 1024x1024 glTexSubImage2D() improve by 4.86939% +/-
1.40408% (n=30) and glGetTexImage() by 2.18978% +/- 0.140268% (n=5).
2017-01-05 17:19:48 -08:00
Eric Anholt 7b8c67b3cc vc4: Reuse a list function to simplify bufmgr code. 2017-01-05 16:23:32 -08:00
Eric Anholt ebf33e577a vc4: Flush the job early if we're referencing too many BOs.
If we get up toward 256MB (or whatever the CMA area size is),
VC4_GEM_CREATE will start throwing errors.  Even if we don't trigger
that, when we flush the kernel's BO allocation for the CLs or bin
memory may end up throwing an error, at which point our job won't get
rendered at all.

Just flush early (half of maximum CMA size) so that hopefully we never
get to that point.
2017-01-05 16:23:32 -08:00
Timothy Arceri 076ab157ff st/mesa/glsl: move SamplerTargets to gl_program
This will help allow us to simplify the handling of samplers by
storing them in a single location rather than duplicating them in
both gl_linked_shader and gl_program.

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-06 11:21:42 +11:00
Timothy Arceri 937523971f st/mesa/glsl: set SamplersUsed directly in gl_program
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-06 11:21:42 +11:00
Timothy Arceri 53a509723f mesa/glsl: set sampler units directly in gl_program
Now that we create gl_program earlier there is no need to mess about
copying things to gl_linked_shader then to gl_program.

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-06 11:21:42 +11:00
Timothy Arceri 7cc61cf706 mesa: simplify sampler setting code
There is no need to loop over active samplers the code above this
would have already exited if the sampler was inactive, or errored
if the count was larger than the uniforms array size.

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-06 11:21:42 +11:00
Timothy Arceri 4807a83da0 mesa/glsl: set num_textures per stage directly in shader_info
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-06 11:21:42 +11:00
Timothy Arceri c46a630000 mesa: make _CurrentFragmentProgram a gl_program struct pointer
Making this point to a gl_program struct rather than a gl_shader_program
struct will allow use to later also make the CurrentProgram array hold
gl_program structs which in turn will allow for code simpilifcation.

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-06 11:21:42 +11:00
Timothy Arceri 6e3f6097c9 i965: stop passing gl_shader_program to the precompile and codegen functions
We no longer need it.

While we are at it we mark the vs, gs, and wm codegen functions as static.

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-06 11:21:42 +11:00
Timothy Arceri 5ceedefd6c mesa/glsl: remove hack to reset sampler units to zero
Now that we have the is_arb_asm flag we can just skip the
initialisation.

V2: remove hack from standalone compiler where it was never
needed since it only compiles glsl shaders.

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-06 11:21:42 +11:00
Timothy Arceri 238486884e i965: make use of new is_arb_asm flag
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-06 11:21:42 +11:00
Timothy Arceri f584f38214 st/mesa/glsl: add new is_arb_asm flag in gl_program
Set the flag via the _mesa_init_gl_program() and NewProgram()
helpers.

In i965 we currently check for the existance of gl_shader_program
to decide if this is an ARB assembly style program or not.

Adding a flag makes the code clearer and will help removes a
dependency on gl_shader_program in the i965 codegen functions.

Also this will allow use to skip initialising sampler units for
linked shaders, we currently memset it to zero again during linking.

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-06 11:21:42 +11:00
Timothy Arceri 2784128398 i965: pass gl_program directly to brw_compile_tes()
This is the only thing we use from gl_shader_program so pass it directly.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-06 11:21:41 +11:00
Timothy Arceri 2a4d169735 i965: stop passing gl_shader_program to brw_nir_setup_glsl_uniforms()
We can now just get the data needed from the gl_shader_program_data
pointer in gl_program.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-06 11:21:41 +11:00
Timothy Arceri d3b2ee6b49 i965: pass gl_program to brw_upload_ubo_surfaces()
There is no need to pass gl_linked_shader anymore.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-06 11:21:41 +11:00
Timothy Arceri 9ca14f583c i965: stop passing gl_shader_program to brw_assign_common_binding_table_offsets()
We now get everything we need directly from gl_program so there is
no need for this.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-06 11:21:41 +11:00
Timothy Arceri f5bc127b2f st/mesa/glsl/i965: move ShaderStorageBlocks to gl_program
Having it here rather than in gl_linked_shader allows us to simplify
the code.

Also it is error prone to depend on the gl_linked_shader for programs
in current use because a failed linking attempt will free infomation
about the current program. In i965 we could be trying to recompile
a shader variant but may have lost some required fields.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-06 11:21:41 +11:00
Timothy Arceri f62eb6c7eb st/mesa/glsl/i965: set num_ssbos directly in shader_info
Here we also remove the duplicate field in gl_linked_shader and always
get the value from shader_info instead.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-06 11:21:41 +11:00
Timothy Arceri 0e7eec1ab5 st/mesa/glsl/i965: move per stage UniformBlocks to gl_program
This will help allow us to store pointers to gl_program structs in the
CurrentProgram array resulting in a bunch of code simplifications.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-06 11:21:41 +11:00
Timothy Arceri b792c38979 st/mesa/glsl/i965: set num_ubos directly in shader_info
This also removes the duplicate field in gl_linked_shader, and
gets num_ubos from shader_info instead.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-06 11:21:41 +11:00
Timothy Arceri a1da57c19c st/mesa/glsl/i965: move ImageUnits and ImageAccess fields to gl_program
Having it here rather than in gl_linked_shader allows us to simplify
the code.

Also it is error prone to depend on the gl_linked_shader for programs
in current use because a failed linking attempt will free infomation
about the current program. In i965 we could be trying to recompile
a shader variant but may have lost some required fields.

We drop the memset on ImageUnits because gl_program is already
created using rzalloc().

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-06 11:21:40 +11:00
Timothy Arceri 3d2485f011 i965: get InfoLog and LinkStatus via the pointer in gl_program
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-06 11:21:40 +11:00
Timothy Arceri be9a6a7eb7 i965: get shared_size from shader_info rather than gl_shader_program
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-06 11:21:40 +11:00
Timothy Arceri 234211ec8d i965: stop depending on gl_shader_program for brw_compute_vue_map() params
This removes another dependency on gl_shader_program from the codegen functions,
this will help allow us to use gl_program for the CurrentProgram array rather
than gl_shader_program.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-06 11:21:40 +11:00