70869 Commits

Author SHA1 Message Date
Dave Airlie 7e5064360c radeonsi: add support for viewport array (v3)
This isn't pretty, and I'd suggest the pm4 interface builder
could be tweaked to do this more efficiently, but I'd need
guidance on how that would look.

This seems to pass the few piglit tests I threw at it.

v2: handle passing layer/viewport index to fragment shader.
fix crash in blit changes,
add support to io_get_unique_index for layer/viewport index
update docs.
v3: avoid looking up viewport index and layer in es (Marek).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-06-27 00:24:07 +01:00
Kenneth Graunke 35d8379304 i965/fs: Fix ir_txs in emit_texture_gen4_simd16().
We were not emitting the LOD, which led to message lengths of 1 instead
of 3.  Setting has_lod makes us emit the LOD, but I had to make changes
to avoid emitting the non-existent coordinate as well.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91022
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-06-26 15:57:03 -07:00
Ilia Mirkin ad62ec8316 nv50/ir: propagate modifier to right arg when const-folding mad
An immediate has to be the second arg of an ADD operation. However we
were mistakenly propagating the modifier of the non-folded value to the
folded immediate argument.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91117
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
2015-06-26 18:42:29 -04:00
Boyan Ding 052b3d4e2f egl_dri2: Remove trailing whitespaces
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-06-26 17:05:21 +00:00
Neil Roberts 3cf90bb183 i965/skl: Fix aligning mt->total_width to the block size
brw_miptree_layout_2d tries to ensure that mt->total_width is a
multiple of the compressed block size, presumably because it wouldn't
be possible to make an image that has a fraction of a block. However
it was doing this by aligning mt->total_width to align_w. Previously
align_w has been used as a shortcut for getting the block width
because before Gen9 the block width was always equal to the alignment.
Commit 4ab8d59a2 tried to fix these cases to use the block width
instead of the alignment but it missed this case.

I think in practice this probably won't make any difference because
the buffer for the texture will be allocated to be large enough to
contain the entire pitch and libdrm aligns the pitch to the tile width
anyway. However I think the patch is worth having to make the
intention clearer.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2015-06-26 17:02:22 +01:00
Matt Turner 404a90b827 mesa: Enable subdir-objects globally.
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-06-26 12:55:25 +01:00
Emil Velikov 229450520a mesa: fold duplicated GL/GL_CORE/GLES3 entry in get_hash_params.py
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-06-26 12:55:25 +01:00
Chia-I Wu 7de85694fa ilo: define ILO_IMAGE_MAX_LEVEL_COUNT
Define ILO_IMAGE_MAX_LEVEL_COUNT for ilo_image and remove unnecessary header
includes.
2015-06-26 13:45:28 +08:00
Chia-I Wu cbdc26aa3f ilo: replace pipe_format by gen_surface_format
Replace pipe_format by gen_surface_format in ilo_image.  Change how depth
format is specified in ilo_state_zs.
2015-06-26 13:45:28 +08:00
Chia-I Wu 2ee95f6d64 ilo: always use the specified image format
Move silent promotion of PIPE_FORMAT_ETC1_RGB8 or combined depth/stencil out
of core.
2015-06-26 13:45:28 +08:00
Chia-I Wu dc2e92b2d3 ilo: replace pipe_texture_target by gen_surface_type
Replace pipe_texture_target by gen_surface_type in ilo_image.  Change how
GEN6_SURFTYPE_CUBE is specified in ilo_state_surface and ilo_state_zs.
2015-06-26 13:45:28 +08:00
Chia-I Wu 934e4a469f ilo: initialize ilo_image from ilo_image_info
Convert pipe_resource to ilo_image_info for image initialization.
2015-06-26 13:45:28 +08:00
Chia-I Wu f825fe8e13 ilo: remove ilo_image_disable_aux()
Fail resource creation when aux bo allocation fails.
2015-06-26 13:45:28 +08:00
Chia-I Wu 07acf9cb16 ilo: improve SURFTYPE_BUFFER validations
Reorganize the validations to make them more systematic.
2015-06-26 13:45:27 +08:00
Chia-I Wu 9871646c13 ilo: remove ilo_buffer
Since the addition of ilo_vma, it was used only to pad a bo for sampling
engine surfaces.  Replace it entirely with these functions:

  ilo_state_surface_buffer_size()
  ilo_state_vertex_buffer_size()
  ilo_state_index_buffer_size()
  ilo_state_sol_buffer_size()
2015-06-26 13:45:27 +08:00
Chia-I Wu 36d107e92c ilo: introduce ilo_vma
This cleans up the code a bit and makes ilo_state_vector_resource_renamed()
simpler and more robust.  It also allows a single bo to back multiple VMAs.
2015-06-26 13:45:27 +08:00
Iago Toral Quiroga fbba25bba0 mesa: remove unnecessary checks in _mesa_readpixels_needs_slow_path
readpixels_can_use_memcpy will later call _mesa_format_matches_format_and_type
which does much tighter checks than these to decide if we can use
memcpy for readpixels.

Also, the checks do not seem to be extensive enough: we check for
signed/unsigned conversion only when the framebuffer has integers, but the
same checks could apply to other types, since any signed/unsigned
conversion rules out memcpy.

No regressions observed on i965/llvmpipe.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-06-26 07:42:47 +02:00
Jason Ekstrand 316206ee9e i965/vec4_live_variables: Do liveness analysis bottom-to-top
From Muchnick's Advanced Compiler Design and Implementation:

"To determine which variables are live at each point in a flowgraph, we
perform a backward data-flow analysis"

Previously, we were walking the blocks forwards and updating the livein and
then the liveout.  However, the livein calculation depends on the liveout
and the liveout depends on the successor blocks.  The net result is that it
takes one full iteration to go from liveout to livein and then another
full iteration to propagate to the predecessors.  This works out to an
O(n^2) computation where n is the number of blocks.  If we run things in
the other order, it's O(nl) where l is the maximum loop depth which is
practically bounded by 3.

In b2c6ba0c4b, we made this same change in
the FS backend to great effect.  Might as well keep it consistent and make
the same change for vec4.  Also, this took the time to run the test:

ES31-CTS.arrays_of_arrays.InteractionFunctionCalls1

from 6:49.62 to 3:31.40 on Timothy Arceri's machine.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-06-25 16:42:20 -07:00
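
The effect of block order on the number of passes can be sketched in a
few lines of C. This is an illustration only, not the i965 code: the
struct block layout, the 32-bit variable masks and compute_liveness()
are hypothetical names, and each pass computes liveout before livein in
both directions, which is a simplification of the update order the
commit describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SUCCS 2

/* Hypothetical basic block: each variable is one bit in a 32-bit mask. */
struct block {
   uint32_t use;          /* variables read before any write in the block */
   uint32_t def;          /* variables written in the block */
   uint32_t livein;
   uint32_t liveout;
   int num_succs;
   int succs[MAX_SUCCS];  /* indices of successor blocks */
};

/* Iterate to a fixed point and return the number of passes over the
 * block list, counting the final pass that detects no change. */
static int
compute_liveness(struct block *blocks, int num_blocks, bool backward)
{
   int passes = 0;
   bool progress = true;

   assert(num_blocks > 0);

   while (progress) {
      progress = false;
      passes++;

      for (int i = 0; i < num_blocks; i++) {
         struct block *b = &blocks[backward ? num_blocks - 1 - i : i];

         /* liveout is the union of the successors' livein sets. */
         uint32_t liveout = b->liveout;
         for (int s = 0; s < b->num_succs; s++)
            liveout |= blocks[b->succs[s]].livein;

         /* livein = use | (liveout - def) */
         uint32_t livein = b->use | (liveout & ~b->def);

         if (liveout != b->liveout || livein != b->livein) {
            b->liveout = liveout;
            b->livein = livein;
            progress = true;
         }
      }
   }

   return passes;
}
```

On a straight-line chain of four blocks where the first block defines a
variable and the last one reads it, the backward walk settles in two
passes while the forward walk needs five, because liveness information
moves only one block toward the entry per forward pass.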
Ben Widawsky c1151b18f2 i965/skl: Use more compact hiz dimensions
gen8 had some special restrictions which don't seem to carry over to gen9.
Quoting the spec for SKL:
"The Z_Height and Z_Width values must equal those present in
3DSTATE_DEPTH_BUFFER incremented by one."

This fixes nothing in piglit (and regresses nothing).

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-06-25 14:17:02 -07:00
Marek Olšák 101a73846b radeonsi: don't fail in si_shader_io_get_unique_index
Trivial. Picked from my tessellation branch.
2015-06-25 15:05:56 +02:00
Kenneth Graunke c97105ee12 i965: Drop brw->depthstencil.stencil_offset from gen8_depth_state.c.
This is always 0 - only brw_workaround_depthstencil_alignment ever sets
it, and that doesn't run on Gen6+.  My initial Broadwell depth state
commit had this mistake.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-06-25 02:18:51 -07:00
Kenneth Graunke 6026f7e8fb nir: Recognize max(min(a, 1.0), 0.0) as fsat(a).
We already recognize min(max(a, 0.0), 1.0) as a saturate, but neglected
this variant (which is also handled by the GLSL IR pass).

shader-db results on Broadwell:
total instructions in shared programs: 7363046 -> 7362788 (-0.00%)
instructions in affected programs:     11928 -> 11670 (-2.16%)
helped:                                64
HURT:                                  0

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-06-25 02:12:32 -07:00
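
As a quick sanity check of the identity behind this pattern, the sketch
below compares both clamp orderings against a reference saturate in
plain C. It does not use NIR's algebraic rule syntax; min_f(), max_f(),
fsat_ref() and clamp_forms_agree() are hypothetical helpers, and NaN
inputs are deliberately out of scope because min/max NaN behavior
depends on the implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helpers; NaN handling is intentionally not modeled. */
static float min_f(float a, float b) { return a < b ? a : b; }
static float max_f(float a, float b) { return a > b ? a : b; }

/* Reference saturate: clamp a to the range [0, 1]. */
static float
fsat_ref(float a)
{
   if (a < 0.0f)
      return 0.0f;
   if (a > 1.0f)
      return 1.0f;
   return a;
}

/* The form that was already recognized: min(max(a, 0.0), 1.0). */
static float
clamp_min_of_max(float a)
{
   return min_f(max_f(a, 0.0f), 1.0f);
}

/* The variant named in the commit subject: max(min(a, 1.0), 0.0). */
static float
clamp_max_of_min(float a)
{
   return max_f(min_f(a, 1.0f), 0.0f);
}

/* True when both orderings match the reference for a non-NaN input. */
static bool
clamp_forms_agree(float a)
{
   assert(a == a); /* reject NaN, which this sketch does not cover */
   return clamp_min_of_max(a) == fsat_ref(a) &&
          clamp_max_of_min(a) == fsat_ref(a);
}
```

For inputs below 0, inside [0, 1] and above 1, both orderings give the
same result as the reference, which is why either one can be replaced
by a single saturate.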
Marek Olšák 77a78c65f8 softpipe,llvmpipe: fix PIPE_SHADER_CAP_MAX_INPUTS value
PIPE_MAX_SHADER_INPUTS was recently bumped to 80 because of tessellation.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91099
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91101

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-06-25 09:00:23 +02:00
Ben Widawsky d1663ccb4c i965/bxt: Add basic Broxton infrastructure
The thread counts and URB information are all speculative numbers that were
based on some CHV numbers at the time.

v2:
Originally this patch had PCI IDs. I've moved that to a new patch at the end of
the series.
Remove is_cherryview hack.
Add PCI ids. These match the ones defined in the kernel. The only one tested by
us is 0x0a84.
Capitalize the hex string (Mark)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: "Lecluse, Philippe" <Philippe.Lecluse@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
2015-06-24 16:37:12 -07:00
Ian Romanick 9f261dc18d radeon: Advertise correct GL_QUERY_COUNTER_BITS/GL_SAMPLES_PASSED value
Commit b765119c changed the default value of all the counter bits to
64.  However, older hardware only has 32 counter bits.

This has only been build-tested.  We don't have any tests that verify
the advertised value against implementation behavior, so I don't know
what additional testing could be done.

NOTE: It appears that many Gallium drivers (at least r300 and i915g)
have the same problem, but I don't see a way for the state-tracker to
determine the counter size.  Marek says, "For Gallium, a new PIPE_CAP or
new get_xxx_param function will be needed."

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
2015-06-24 16:33:32 -07:00
Jason Ekstrand b2c6ba0c4b i965/fs_live_variables: Do liveness analysis bottom-to-top
From Muchnick's Advanced Compiler Design and Implementation:

"To determine which variables are live at each point in a flowgraph, we
perform a backward data-flow analysis"

Previously, we were walking the blocks forwards and updating the livein and
then the liveout.  However, the livein calculation depends on the liveout
and the liveout depends on the successor blocks.  The net result is that it
takes one full iteration to go from liveout to livein and then another
full iteration to propagate to the predecessors.  This works out to an
O(n^2) computation where n is the number of blocks.  If we run things in
the other order, it's O(nl) where l is the maximum loop depth which is
practically bounded by 3.

On my HSW desktop, one particular shadertoy test gets a 20% improvement in
compile times:

N           Min           Max        Median           Avg        Stddev
x  10        15.965        16.884        16.026       16.1822    0.34736846
+  10        12.813        13.052        12.876       12.8891    0.06913666
Difference at 95.0% confidence
        -3.2931 +/- 0.235316
        -20.3501% +/- 1.45417%
        (Student's t, pooled s = 0.250444)

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-06-24 13:11:30 -07:00
Tapani Pälli 104c8fc2c2 i965: Delete linked GLSL IR when using NIR.
This is based on Kenneth's patch to delete 'most of the IR'. Due to
linker changes to clone variables, we can now free all of the IR.

Saves 58MB of memory when replaying a Dota 2 trace on Broadwell.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
2015-06-24 12:03:41 -07:00
Tapani Pälli c2ff3485b3 glsl: clone inputs and outputs during linking
This increases memory pressure during linking but makes it easier
for the backend to free the IR once it is no longer needed.

v2: use resource list as ralloc context in case of relink (Kenneth)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
2015-06-24 12:01:21 -07:00
Chris Wilson 4b35ab9bdb i965: Rename intel_emit* to reflect their new location in brw_pipe_control
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-06-24 10:35:04 -07:00
Chris Wilson 9d4b9f1e0c i965: Transplant PIPE_CONTROL routines to brw_pipe_control
Start trimming the fat from intel_batchbuffer.c, first by moving the set
of routines for emitting PIPE_CONTROLs (along with the lore concerning
hardware workarounds) to a separate brw_pipe_control.c.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-06-24 10:35:04 -07:00
Kenneth Graunke 147cdb53ec nir: Use a switch statement for detecting move-like operations.
Suggested by Jason Ekstrand.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-06-24 10:35:04 -07:00
Brian Paul e31bce4041 svga: silence warnings about unexpected shader type
Trivial.
2015-06-24 10:42:19 -06:00
Brian Paul c1de7df6d4 st/mesa: remove unneeded pipe_surface_release() in st_render_texture()
This caused us to always free the pipe_surface for the renderbuffer.
The subsequent call to st_update_renderbuffer_surface() would typically
just recreate it.  Remove the call to pipe_surface_release() and let
st_update_renderbuffer_surface() take care of freeing the old surface
if it needs to be replaced (because of change to mipmap level, etc).

This can save quite a few calls to pipe_context::create_surface() and
surface_destroy().

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-06-24 07:14:56 -06:00
Emil Velikov a552c897ca st/wgl: add stw_nopfuncs.h to the sources lists
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-06-24 13:43:44 +01:00
Julien Isorce 30d67d3824 loader: move loader_open_device out of HAVE_LIBUDEV block
Fixes the following build issue, when building without libudev.

CCLD   libGL.la
./.libs/libglx.a(dri2_glx.o): In function `dri2CreateScreen':
src/glx/dri2_glx.c:1186: undefined reference to `loader_open_device'
collect2: ld returned 1 exit status

CCLD     libEGL.la
Undefined symbols for architecture x86_64:
"_loader_open_device", referenced from:
  _dri2_initialize_x11_dri2 in libegl_dri2.a(platform_x11.o)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91077
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-06-24 13:43:44 +01:00
Grigori Goronzy 390f94e358 winsys/radeon: reduce BO cache timeout
1000 ms is an extreme value for typical interactive loads. A large
cache has some disadvantages. Search for reusable BOs can take a long
time and memory might get exhausted.

Let's be rather conservative and use half of the old value,
500 ms. This is beneficial to some loads on my test system and there
are no regressions.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-06-24 14:33:40 +02:00
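
To make the trade-off concrete, here is a minimal sketch of time-based
eviction in C. It is not the winsys code: struct cache_entry,
cache_evict_expired() and the millisecond timestamps are assumptions
made for illustration, and a flat array stands in for the real cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cache slot for an idle buffer object. */
struct cache_entry {
   uint64_t size;          /* buffer size in bytes */
   int64_t idle_since_ms;  /* time at which the buffer became idle */
   bool valid;             /* false once the entry has been evicted */
};

/* Evict every valid entry that has been idle for at least timeout_ms.
 * Returns the number of entries evicted. */
static int
cache_evict_expired(struct cache_entry *entries, int count,
                    int64_t now_ms, int64_t timeout_ms)
{
   int evicted = 0;

   assert(count >= 0 && timeout_ms >= 0);

   for (int i = 0; i < count; i++) {
      if (entries[i].valid &&
          now_ms - entries[i].idle_since_ms >= timeout_ms) {
         entries[i].valid = false;
         evicted++;
      }
   }

   return evicted;
}
```

With a shorter timeout, buffers that have sat idle for a while leave
the cache sooner, so there are fewer entries to search and less memory
is held, at the cost of fewer reuse opportunities.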
Grigori Goronzy 29aaab2b5f winsys/radeon: align BO size to page size
This is the basic granularity for BO allocations. The alignment also
helps with BO reuse by the cached bufmgr.

This results in a huge 45% speedup in Metro 2033 Redux on my test
system. The game relies on buffer orphaning with very small buffers
(hundreds of bytes in size) and that did not work efficiently
before. This change may also affect other applications and games.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-06-24 14:33:14 +02:00
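
The rounding itself can be sketched as follows. This is an
illustration, not the winsys code: align_up() is a hypothetical helper
and the 4096-byte page size is an assumption (real code would query
the system page size).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size for the example only. */
#define ASSUMED_PAGE_SIZE 4096u

/* Round size up to the next multiple of alignment, which must be a
 * nonzero power of two. */
static uint64_t
align_up(uint64_t size, uint64_t alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (size + alignment - 1) & ~(alignment - 1);
}
```

Requests of 300 and 700 bytes both round to 4096 under this scheme, so
a cached buffer released by one can satisfy the other. That is one
plausible reading of why the alignment helps reuse for very small
orphaned buffers.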
Tapani Pälli 32a220f1f6 glsl: remove cross validation of interpolation qualifier with GLSL 4.40
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-06-24 10:06:32 +03:00
Kenneth Graunke 23132cd13b i965: Fix whitespace error in gen8_depth_state.c
Trivial.
2015-06-23 23:31:17 -07:00
Kenneth Graunke c8b8e8b29b i965: Don't count NIR instructions for shader-db.
Matt, Jason, and I haven't found this useful in a long time.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-06-23 23:31:17 -07:00
Michel Dänzer 7796e8889a winsys/radeon: Unmap GPU VM address range when destroying BO
But only when doing so is safe according to the
RADEON_INFO_VA_UNMAP_WORKING kernel query.

This avoids kernel GPU VM address range conflicts when the BO has other
references than the GEM handle being closed, e.g. when the BO is shared.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90537
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90873

Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-06-24 15:11:55 +09:00
Eric Anholt 3fd4c80b32 vc4: Also dump VC4_PACKET_LOAD_TILE_BUFFER_GENERAL.
2015-06-23 18:40:50 -07:00
Eric Anholt 5458ac01ae vc4: Add dumping for VC4_PACKET_LOAD/STORE_FULL_RES_TILE_BUFFER.
2015-06-23 18:40:50 -07:00
Eric Anholt 997f677841 vc4: Don't try to CSE color reads.
It returns a new value for each sample in the TLB.  We've already avoided
trying to get the same index's color multiple times at the vc4_program.c
level, so we're not losing anything by doing this.
2015-06-23 18:40:50 -07:00
Eric Anholt 0f69d59b1c vc4: Make a helper for TLB color writes, too.
We've done so for all the other QIR instruction generation in this file.
2015-06-23 18:40:50 -07:00
Eric Anholt af83eb2581 vc4: Pull the blending operation out to a separate function.
It's fairly separate from the rest of the TLB operations at frag end time,
and we'll need to run it multiple times to support MSAA blending.
2015-06-23 18:40:50 -07:00
Eric Anholt 76851f49a5 vc4: Clarify size calculation for Z/S writes.
It's the same value for loads and stores, because they're basically the
same packet.
2015-06-23 18:40:50 -07:00
Eric Anholt 8fbcabc41a vc4: Add an "args" temporary for RCL setup.
2015-06-23 18:40:50 -07:00
Eric Anholt 19056d0429 vc4: Reuse (and extend) the packet.h sizes for dumping.
2015-06-23 18:40:50 -07:00
Eric Anholt fc0da629b5 vc4: Fix printfs for blit fallbacks.
2015-06-23 18:40:50 -07:00