83359 Commits

Author SHA1 Message Date
Marek Olšák dea6fdadca winsys/radeon: use pb_cache buckets for fewer pb_cache misses
This makes Bioshock Infinite with deferred flushing 2.2% faster.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-19 23:45:06 +02:00
Marek Olšák 8d5944199d gallium/pb_cache: reduce the number of pointer dereferences
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-19 23:45:06 +02:00
Marek Olšák 3cdc0e133f gallium/pb_cache: divide the cache into buckets for reducing cache misses
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-19 23:45:06 +02:00
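As a rough illustration of the bucket idea in the pb_cache commits above, the sketch below keeps idle buffers in separate per-bucket slot arrays, so a lookup scans only the bucket the caller names. This is not the actual pb_cache code: every name here (toy_cache, toy_buffer, toy_cache_add, toy_cache_reclaim) and both constants are hypothetical, and the real cache also handles details such as expiry and locking that are left out.

```c
#include <stddef.h>

#define TOY_NUM_BUCKETS 4       /* hypothetical bucket count */
#define TOY_SLOTS_PER_BUCKET 8  /* hypothetical capacity per bucket */

/* Hypothetical stand-in for a cached buffer. */
struct toy_buffer {
    size_t size;
    unsigned usage;
};

/* One fixed array of idle buffers per bucket; NULL marks a free slot. */
struct toy_cache {
    struct toy_buffer *slots[TOY_NUM_BUCKETS][TOY_SLOTS_PER_BUCKET];
};

/* Park an idle buffer in the given bucket.
 * Returns 1 on success, 0 if the bucket index is invalid or the bucket is full. */
static int toy_cache_add(struct toy_cache *cache, unsigned bucket,
                         struct toy_buffer *buf)
{
    if (bucket >= TOY_NUM_BUCKETS)
        return 0;
    for (size_t i = 0; i < TOY_SLOTS_PER_BUCKET; i++) {
        if (cache->slots[bucket][i] == NULL) {
            cache->slots[bucket][i] = buf;
            return 1;
        }
    }
    return 0;
}

/* Take a compatible buffer out of one bucket only.
 * Buffers parked in other buckets are never scanned. */
static struct toy_buffer *toy_cache_reclaim(struct toy_cache *cache,
                                            unsigned bucket, size_t size,
                                            unsigned usage)
{
    if (bucket >= TOY_NUM_BUCKETS)
        return NULL;
    for (size_t i = 0; i < TOY_SLOTS_PER_BUCKET; i++) {
        struct toy_buffer *buf = cache->slots[bucket][i];
        if (buf != NULL && buf->usage == usage && buf->size >= size) {
            cache->slots[bucket][i] = NULL;
            return buf;
        }
    }
    return NULL;
}
```

If the caller picks the bucket from properties that would otherwise make most entries incompatible, each lookup walks a shorter list whose entries are more likely to match, which is one plausible reading of "fewer pb_cache misses".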
Marek Olšák fec7f74129 gallium/pb_cache: check parameters that are more likely to fail first
This makes Bioshock Infinite with deferred flushing 2% faster.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-19 23:45:06 +02:00
Marek Olšák 2596ae2b6e radeonsi: emit PS exports last
This effectively removes s_waitcnt instructions after FP16 exports.

Before:

    v_cvt_pkrtz_f16_f32_e32 v0, v0, v1   ; 5E000300
    v_cvt_pkrtz_f16_f32_e32 v1, v2, v3   ; 5E020702
    exp 15, 0, 1, 0, 0, v0, v1, v0, v0   ; F800040F 00000100
    s_waitcnt expcnt(0)                  ; BF8C0F0F
    v_cvt_pkrtz_f16_f32_e32 v0, v4, v5   ; 5E000B04
    v_cvt_pkrtz_f16_f32_e32 v1, v6, v7   ; 5E020F06
    exp 15, 1, 1, 0, 0, v0, v1, v0, v0   ; F800041F 00000100
    s_waitcnt expcnt(0)                  ; BF8C0F0F
    v_cvt_pkrtz_f16_f32_e32 v0, v8, v9   ; 5E001308
    v_cvt_pkrtz_f16_f32_e32 v1, v10, v11 ; 5E02170A
    exp 15, 2, 1, 0, 0, v0, v1, v0, v0   ; F800042F 00000100
    s_waitcnt expcnt(0)                  ; BF8C0F0F
    v_cvt_pkrtz_f16_f32_e32 v0, v12, v13 ; 5E001B0C
    v_cvt_pkrtz_f16_f32_e32 v1, v14, v15 ; 5E021F0E
    exp 15, 3, 1, 1, 1, v0, v1, v0, v0   ; F8001C3F 00000100
    s_endpgm                             ; BF810000

After:

    v_cvt_pkrtz_f16_f32_e32 v0, v0, v1   ; 5E000300
    v_cvt_pkrtz_f16_f32_e32 v1, v2, v3   ; 5E020702
    v_cvt_pkrtz_f16_f32_e32 v2, v4, v5   ; 5E040B04
    v_cvt_pkrtz_f16_f32_e32 v3, v6, v7   ; 5E060F06
    exp 15, 0, 1, 0, 0, v0, v1, v0, v0   ; F800040F 00000100
    v_cvt_pkrtz_f16_f32_e32 v4, v8, v9   ; 5E081308
    v_cvt_pkrtz_f16_f32_e32 v5, v10, v11 ; 5E0A170A
    exp 15, 1, 1, 0, 0, v2, v3, v0, v0   ; F800041F 00000302
    v_cvt_pkrtz_f16_f32_e32 v6, v12, v13 ; 5E0C1B0C
    v_cvt_pkrtz_f16_f32_e32 v7, v14, v15 ; 5E0E1F0E
    exp 15, 2, 1, 0, 0, v4, v5, v0, v0   ; F800042F 00000504
    exp 15, 3, 1, 1, 1, v6, v7, v0, v0   ; F8001C3F 00000706
    s_endpgm                             ; BF810000

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-19 23:45:06 +02:00
Marek Olšák b2b45cecef radeonsi: set optimal settings in COMPUTE_RESOURCE_LIMITS
ported from Vulkan

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-19 23:45:06 +02:00
Marek Olšák ad70c3954b radeonsi: really wait for the second EOP event and not the first one
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-19 23:45:06 +02:00
Marek Olšák 1a1cc67edd gallium/radeon: remove RADEON_FLUSH_KEEP_TILING_FLAGS flag
always set

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-19 23:45:06 +02:00
Ian Romanick 0b626d7524 nir/algebraic: Optimize fabs(u2f(x))
I noticed this when I tried to do frexp(float(some_unsigned)) in the
ir_unop_find_lsb lowering pass.  The code generated for frexp() uses
fabs, and this resulted in an extra instruction.  Ultimately I ended up
not using frexp.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:30 -07:00
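The reasoning behind the fabs(u2f(x)) optimization can be checked with a small sketch: converting an unsigned integer to float never produces a negative value, so clearing the sign bit afterwards changes nothing. The helper names below are hypothetical and the code only demonstrates the identity on the CPU; it is not the nir_opt_algebraic rule itself.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw IEEE-754 bit pattern of a float. */
static uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* fabs modelled as "clear the sign bit". */
static float fabs_by_sign_clear(float f)
{
    uint32_t bits = float_bits(f) & 0x7fffffffu;
    float result;
    memcpy(&result, &bits, sizeof result);
    return result;
}

/* u2f: unsigned-to-float conversion; the result is always >= 0. */
static float u2f(uint32_t x)
{
    return (float)x;
}

/* Returns 1 when fabs(u2f(x)) has the same bit pattern as u2f(x). */
static int fabs_is_noop_after_u2f(uint32_t x)
{
    float f = u2f(x);
    return float_bits(fabs_by_sign_clear(f)) == float_bits(f);
}
```

Because the check holds for every unsigned input, the fabs can be dropped, which matches the extra instruction the commit message mentions.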
Ian Romanick 94296be276 st/mesa: Enable MESA_shader_integer_functions on all GLSL 1.30 platforms
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:30 -07:00
Ian Romanick 7cb49b1bd7 i965: Enable MESA_shader_integer_functions on all GLSL 1.30 platforms
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:29 -07:00
Ian Romanick 5726e57f13 i965: Don't lower uaddCarry and usubBorrow in both GLSL IR and NIR
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:29 -07:00
Ian Romanick d7a47a76e0 i965: Update assertion to account for Gen < 7
Previously SHADER_OPCODE_MULH could only exist on Gen7+, so the
assertion assumed the Gen7+ accumulator rules.  A future patch will
allow this instruction on at least Gen6, so update the assertion.

v2: Use get_lowered_simd_width instead of open coding it.  Suggested by
Curro.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
2016-07-19 12:19:29 -07:00
Ian Romanick 3e7cebc8da i965: Use LZD to implement nir_op_find_lsb on Gen < 7
v2: Rebase on changes to previous two patches.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:29 -07:00
Ian Romanick c2019c6c26 i965: Use LZD to implement nir_op_ifind_msb on Gen < 7
v2: Retype LZD source as UD to avoid potential problems with 0x80000000.
Suggested by Matt.  Also update comment about problem values with
LZD(abs(x)).  Suggested by Curro.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:29 -07:00
Ian Romanick de20086eed i965: Use LZD to implement nir_op_ufind_msb
This uses one less instruction.

v2: Move emit_find_msb_using_lzd out of the visitor classes.  Suggested
by Curro.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:29 -07:00
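The three LZD commits above rely on the fact that a leading-zero count is enough to express the GLSL findLSB and findMSB operations. A minimal CPU-side sketch follows, assuming an LZD that returns 32 for a zero input; lzd() here is a software stand-in, and the function names are chosen for illustration rather than taken from the driver. The signed variant inverts negative inputs, which follows the GLSL definition of findMSB and is not necessarily the exact sequence the driver emits.

```c
#include <stdint.h>

/* Software stand-in for a leading-zero-detect instruction:
 * counts zero bits above the highest set bit, and returns 32 for x == 0. */
static int lzd(uint32_t x)
{
    int n = 0;
    for (uint32_t mask = 0x80000000u; mask != 0 && !(x & mask); mask >>= 1)
        n++;
    return n;
}

/* Index of the highest set bit, or -1 when x == 0. */
static int ufind_msb(uint32_t x)
{
    return 31 - lzd(x);
}

/* Index of the lowest set bit, or -1 when x == 0.
 * x & -x isolates the lowest set bit, so its MSB is the LSB of x. */
static int find_lsb(uint32_t x)
{
    return 31 - lzd(x & (0u - x));
}

/* Signed findMSB: for negative inputs the most significant 0 bit is wanted,
 * so the value is inverted first. The LZD source is treated as unsigned. */
static int ifind_msb(int32_t x)
{
    uint32_t u = (uint32_t)x;
    if (x < 0)
        u = ~u;
    return 31 - lzd(u);
}
```

With this formulation the zero case needs no special handling, since 31 - 32 already yields the -1 that GLSL specifies.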
Ian Romanick 26c7f04d4a i965: Always enable GL_ARB_shading_language_packing
With the existing lowering passes, the functions from this extension
become a bunch of bit twiddling operations that have always been
supported.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:29 -07:00
Ian Romanick 4b2b6d4d4d i965: Move enable of EXT_shader_integer_mix
This extension does not depend on the Gen.  It only depends on the
availability of GLSL 1.30.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:29 -07:00
Ian Romanick a2379e44aa glsl: Add lowering pass for ir_bin_imul_high
This isn't the lowering pass you want.  Most GPUs that can support GLSL
1.30 have a multiply unit that can do something more interesting than
32x32->32.  Many have 32x16->48.  Any GPU that does, should do the
lowering in the backend.  This is just the thing that will always work.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:29 -07:00
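To make the "thing that will always work" concrete, here is one way to compute the high 32 bits of a 32x32 product using only 32x32->32 multiplies, by splitting each operand into 16-bit halves. This is a sketch of the general technique, not a transcription of the GLSL IR pass; the function names are hypothetical, and the signed correction shown is one standard identity that the actual pass may not use.

```c
#include <stdint.h>

/* High 32 bits of an unsigned 32x32 multiply, using only 32-bit arithmetic. */
static uint32_t umul_high(uint32_t a, uint32_t b)
{
    uint32_t a_lo = a & 0xffffu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xffffu, b_hi = b >> 16;

    /* Four 16x16->32 partial products; none of them can overflow. */
    uint32_t lo_lo = a_lo * b_lo;
    uint32_t hi_lo = a_hi * b_lo;
    uint32_t lo_hi = a_lo * b_hi;
    uint32_t hi_hi = a_hi * b_hi;

    /* Sum the middle 16-bit column to find the carry into the high word. */
    uint32_t mid = (lo_lo >> 16) + (hi_lo & 0xffffu) + (lo_hi & 0xffffu);

    return hi_hi + (hi_lo >> 16) + (lo_hi >> 16) + (mid >> 16);
}

/* Signed variant: start from the unsigned result and subtract the other
 * operand once for each negative input (arithmetic is modulo 2^32). */
static int32_t imul_high(int32_t a, int32_t b)
{
    uint32_t hi = umul_high((uint32_t)a, (uint32_t)b);
    if (a < 0)
        hi -= (uint32_t)b;
    if (b < 0)
        hi -= (uint32_t)a;
    return (int32_t)hi;
}
```

The four multiplies and the carry handling show why the commit message recommends backend lowering on hardware with a wider multiplier: a 32x16 unit would need fewer steps.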
Ian Romanick 1b5477668a glsl: Add lowering pass for ir_unop_find_msb
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:29 -07:00
Ian Romanick 2a381a3c73 glsl: Add lowering pass for ir_unop_find_lsb
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:29 -07:00
Ian Romanick ad9acb19c3 glsl: Add lowering pass for ir_unop_bitfield_reverse
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:28 -07:00
Ian Romanick 3079dcb00c glsl: Add lowering pass for ir_quadop_bitfield_insert
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:28 -07:00
Ian Romanick 4d6d219b58 glsl: Add lowering pass for ir_triop_bitfield_extract
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:28 -07:00
Ian Romanick 7340be8a01 glsl: Add lowering pass for ir_unop_bit_count
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:28 -07:00
Ian Romanick 806add360f MESA_shader_integer_functions: Allow new function overload matching rules
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:28 -07:00
Ian Romanick 90537e1a0e MESA_shader_integer_functions: Allow implicit int->uint conversions
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:28 -07:00
Ian Romanick 65b0346fdb MESA_shader_integer_functions: Expose new built-in functions
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:28 -07:00
Ian Romanick 15c4ae461d MESA_shader_integer_functions: Boiler plate extension tracking
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:28 -07:00
Ian Romanick 91482ef226 MESA_shader_integer_functions: Add extension specification
v2: Fix typo in #extension line noticed by Ken.

v3: Update spec status.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:15 -07:00
Samuel Pitoiset 9c63224540 gm107/ir: make use of ADD32I for all immediates
ADD can only encode 19-bit immediates.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
2016-07-19 18:07:15 +02:00
Samuel Pitoiset 0904a2ba97 gm107/ir: add missing NEG modifier for IADD32I
Like FADD32I, the NEG modifier of src0 is at position 56.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2016-07-19 18:07:10 +02:00
Andreas Boll c482decd4d ddebug: Fix trivial typo in stderr message
Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
2016-07-19 16:04:40 +02:00
Andreas Boll d66cb7c84f configure.ac: Use ${datarootdir} for --with-vulkan-icddir help string too
The help string wasn't updated in cbc37f7.

Fixes: cbc37f7 ("anv: install the intel_icd.json to ${datarootdir} by
default")

Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: mesa-stable@lists.freedesktop.org
2016-07-19 16:04:01 +02:00
Eric Engestrom 8ba46fbd9e vl: fix memory leak
CovID: 1363008
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-07-19 12:41:00 +02:00
Boyuan Zhang 60c7450f16 vl: add entry point
Add an entry point to distinguish H.264 decode from encode. For example, in
patch 5/11, when "VaCreateContext" is called, "pps" and "sps" shouldn't be
allocated for H.264 encoding, so we need the entry_point to determine whether
this is H.264 decode or H.264 encode. We can use config to determine the
entrypoint because config_id is passed to us in the VaCreateContext call.
However, for the VaDestroyContext call, only context_id is passed to us, so we
need to know the entrypoint in order not to free the pps/sps in the encoding
case.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-07-19 12:36:46 +02:00
Ilia Mirkin ed9dd3bcd9 nv50,nvc0: srgb rendering is only available for rgba/bgra
Mark both L8_SRGB and L8A8_SRGB as non-renderable (the latter already
didn't have the bind flags). This makes the state tracker pick a
different format when rendering is required, or mark the fb as
incomplete. This fixes:

  bin/getteximage-formats init-by-clear-and-render -auto -fbo
  bin/getteximage-formats init-by-rendering -auto -fbo

which previously ran into srgb-encoding differences.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2016-07-18 20:04:17 -04:00
Ilia Mirkin 8e7893eb53 nvc0: add support for BGRA8 images
This is useful for pbo downloads, which are now accelerated with images.
BGRA8 is a moderately common format to do that in.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-07-18 20:04:17 -04:00
Jason Ekstrand 905d7dc4d1 i965: Skip update_texture_surface when the plane doesn't exist
Thanks to rebase fail, recent surface state changes (commits 7e951cd56,
8521ce1a7, and 69c0dc5c53) effectively reverted 727a9b2493 and 367cf3a2e3
which was unintentional.  This should bring it back.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-07-18 16:44:29 -07:00
Timothy Arceri cd5cbf0f6b glsl: use linked shaders rather than compiled shaders
At this point there is no reason not to use the linked shaders. Using
them should be faster and will make things simpler for upcoming shader
cache work.

The previous variable name suggests the linked shaders were intended
to be used here anyway.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-07-19 09:42:00 +10:00
Lars Hamre 198074a41c The extension is already exposed, this simply marks it as done.
Signed-off-by: Lars Hamre <chemecse@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-07-19 01:20:27 +02:00
Anuj Phogat 22935a3040 docs: Fix typo in extension name
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-07-18 15:53:24 -07:00
Anuj Phogat 7832e18879 docs: Add support for GL_KHR_texture_compression_astc_sliced_3d
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reported-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-18 15:44:18 -07:00
Anuj Phogat c7b787ef90 Revert "docs: Mark KHR_texture_compression_astc_sliced_3d done on i965"
This reverts commit 82f8c23950.

KHR_texture_compression_astc_sliced_3d is not a requirement for
GLES 3.2.

Reported-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-07-18 15:43:58 -07:00
Anuj Phogat 82f8c23950 docs: Mark KHR_texture_compression_astc_sliced_3d done on i965
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-07-18 14:39:54 -07:00
Anuj Phogat ac0eb36d8e i965/gen9: Enable KHR_texture_compression_astc_sliced_3d
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-07-18 14:39:54 -07:00
Anuj Phogat 15dea5ca82 mesa: Add the infrastructure for KHR_texture_compression_astc_sliced_3d
V2: Drop the changes to gl.xml.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-07-18 14:39:54 -07:00
Christian König 3e1ad846f9 radeon/uvd: add session context buffer for polaris 10/11 v2
This way we have unlimited UVD sessions.

v2: only enable it when kernel supports it as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2016-07-18 17:13:17 +02:00
Leo Liu 134d6e4e4f vl/dri3: fix a memory leak from front buffer
Inspired by the fix for the vdpau interop memory leak: resource_from_handle
sets the texture reference count, which needs to be decreased so that the
texture is released. There is a similar case for DRI3 with the VA-API glx
extension, where a temporary TFP (texture from pixmap) is targeted through
dma-buf. The leak happens when the reference is not counted down.

Checked with mpv vo=opengl, where there is only one static TFP, so the leak
happens once. The totem player using gstreamer VA-API glx creates a dynamic
TFP for each frame, so it leaks quite a bit.

This fixes mem leak for mpv and totem.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-07-18 09:20:40 -04:00
Iago Toral Quiroga 0f2516d88f i965/tes/scalar: fix 64-bit indirect input loads
We totally ignored this before because there were no piglit tests for
indirect loads in tessellation stages with doubles.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-07-18 09:53:51 +02:00