Commit Graph

22438 Commits

Author SHA1 Message Date
Marek Olšák bc09c3d59e ac: add radeon_info::num_good_cu_per_sh
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-09-10 15:19:56 -04:00
Marek Olšák 662db03577 radeonsi: fix printing a BO list into ddebug reports
important for debugging

Cc: 18.1 18.2 <mesa-stable@lists.freedesktop.org>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-09-10 15:19:56 -04:00
Marek Olšák da72b6296c r600: fix HTILE for NPOT textures with mipmapping
Cc: 18.1 18.2 <mesa-stable@lists.freedesktop.org>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-09-10 15:19:56 -04:00
Marek Olšák a1b9a00f82 radeonsi: fix HTILE for NPOT textures with mipmapping on SI/CI
VI uses addrlib so it's unaffected.

Cc: 18.1 18.2 <mesa-stable@lists.freedesktop.org>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-09-10 15:19:56 -04:00
Brian Paul 7baf45dfc7 svga: assorted fixes/changes in svga_pipe_blit.c
To align the code with VMware's in-house copy.

Signed-off-by: Brian Paul <brianp@vmware.com>
2018-09-10 13:07:30 -06:00
Brian Paul 25fceccf72 svga: set buffer bind_flags in svga_buffer_add_host_surface()
To match the in-house VMware code.

Signed-off-by: Brian Paul <brianp@vmware.com>
2018-09-10 13:07:30 -06:00
Charmaine Lee 337a74aa40 svga: add format conversion for legacy formats
This patch extends the format_conversion table to support
different view formats on texture buffers.
For legacy image formats such as INTENSITY, LUMINANCE, LUMINANCE_ALPHA,
special swizzle masks will be used on the red or RG channels.

This fixes piglit test arb_texture_buffer_object-formats fs|vs arb

Reviewed-by: Brian Paul <brianp@vmware.com>
2018-09-10 13:07:30 -06:00
Charmaine Lee 389450a271 svga: remove obsolete code to reemit gs binding
The svga_reemit_gs_bindings function is no longer needed. Remove it.

Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2018-09-10 13:07:30 -06:00
Brian Paul c174ee9f9d svga: move variant->fs_shadow_compare_units assignment
Fixes a crash since the variant object isn't allocated until later
in the function.  Not sure how this got through.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2018-09-10 13:07:30 -06:00
Charmaine Lee cb70474b20 svga: fix resource checking in is_blending_enabled()
This patch makes sure a valid color buffer is bound before
checking its resource. This fixes Unigine Valley running on an SM41 device.

Reviewed-by: Brian Paul <brianp@vmware.com>
2018-09-10 13:07:30 -06:00
Neha Bhende c6103328ab svga: Use texture_copy_region instead of texture_copy_handle for multisampling
This fixes some of the test cases in arb_copy_image-formats and also fixes
SurfaceCopy-related errors in vmware.log when multisampled surfaces are
used.

Tested with piglit and glretrace on Windows and Linux VMs.

v2: As per Brian's comment

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2018-09-10 13:07:30 -06:00
Charmaine Lee fdf5885183 svga: add missing devcap check for texture array support
The patch checks the DXFMT_ARRAY devcap for texture array support.

Tested with MTT-piglit. No regressions.

Reviewed-by: Brian Paul <brianp@vmware.com>
2018-09-10 13:07:30 -06:00
Charmaine Lee 3069581260 svga: no need to check MULTISAMPLE devcap for view format
According to the current SVGA contract, any view format can be
used on the underlying resource that is multisample. So there
is no need to check the MULTISAMPLE devcap for the view format.

Fixes a black rendering issue with Tropics running with 4xMSAA.

Reviewed-by: Brian Paul <brianp@vmware.com>
2018-09-10 13:07:30 -06:00
Charmaine Lee 6f254ad9b4 svga: sync devcap name changes in svga3d_devcaps.h
Reviewed-by: Brian Paul <brianp@vmware.com>
2018-09-10 13:07:30 -06:00
Charmaine Lee 49428c8d61 svga: explicitly set DXFMT_SHADER_SAMPLE for DS format for pre-SM41 device
Explicitly set the DXFMT_SHADER_SAMPLE bit for depth stencil formats
for pre-SM41 devices only. This bit is now set by the SM41 device.

Reviewed-by: Brian Paul <brianp@vmware.com>
2018-09-10 13:07:30 -06:00
Charmaine Lee 379a2f265f svga: remove unused variable
Trivial.
2018-09-10 13:07:30 -06:00
Brian Paul cbcc416a58 svga: draw round points when msaa is enabled
See comments for details.  This allows the piglit
ext_framebuffer_multisample-point-smooth test to pass.

Also, test the pipe_rasterizer_state::point_quad_rasterization field
to see if sprite point rasterization is needed because it's possible
for no sprite_coord_enable bits to be set when drawing sprites.

Finally, remove old, stale comments.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2018-09-10 13:07:30 -06:00
Brian Paul 6b039c7d7c svga: check number of samples before emitting MSAA decls/opcodes
If real MSAA is not available, we only support 1 sample/pixel.  In that
case, we must not declare MSAA resources or emit MSAA opcodes.  Do that
by checking the sample count.

Fixes several piglit MSAA tests, such as
arb_texture_multisample-sample-depth (when the hard-coded sample count
of 4 is fixed in that test).

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2018-09-10 13:07:30 -06:00
Brian Paul cf2fb6813c svga: remove obsolete comment on format_cap_table[]
We removed the special cases referred to in this comment in the commit
"svga: add a separate function to get dx format capabilities from
vgpu10 device".

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2018-09-10 13:07:30 -06:00
Brian Paul 0fc6c17bf2 svga: allow TGSI_TEXTURE_CUBE_ARRAY in emit_tg4()
Technically, SM4.1 doesn't support cube map arrays, but our backend
renderers actually do.  This allows the Piglit textureGather cube
map array tests to pass.

Tested with GLrenderer, DX11renderer and SWrenderer.

Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2018-09-10 13:07:30 -06:00
Charmaine Lee 3467a274e0 svga: no dma on multisample surface
Force direct map on multisample surface.

Fixes SVGA Driver Errors running multisample piglit tests on Linux VM

v2: use texture for the check.

Reviewed-by: Neha Bhende <bhenden@vmware.com>
2018-09-10 13:07:30 -06:00
Charmaine Lee 5f14444184 svga: src surface for IntraSurfaceCopy cannot be multisample
Fixes SVGA Driver Errors with piglit test arb_copy_image-targets

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
2018-09-10 13:07:30 -06:00
Charmaine Lee 026e1ad7bb svga: fix missing format multisample devcap check
In commit e4048f6cd1, svga_is_dx_format_supported() is supposed to
also check the SVGA3D_DXFMT_MULTISAMPLE bit for multisample
support of a format. Somehow that code was not included in that commit.
This patch fixes it.

Fixes piglit test spec@ext_framebuffer_multisample@formats all_samples.

Reviewed-by: Brian Paul <brianp@vmware.com>
2018-09-10 13:07:30 -06:00
Charmaine Lee 285d8b47b1 svga: fix incorrect multisample support in VGPU9 device
Commit e4048f6cd1 unintentionally allowed multisample support on the VGPU9 device.
This patch fixes this regression.

Reviewed-by: Neha Bhende <bhenden@vmware.com>
2018-09-10 13:07:30 -06:00
Charmaine Lee 59a56ca1c8 svga: fix the missing devcap for SVGA3D_BC3_UNORM_SRGB
Set the devcap to SVGA3D_DEVCAP_DXFMT_BC3_UNORM_SRGB

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
2018-09-10 13:07:30 -06:00
Charmaine Lee 16666eb470 svga: add a separate function to get dx format capabilities from vgpu10 device
Currently we have one function to get format capabilities and
we convert DX10 devcaps back to DX9. This can be confusing.
Going forward we will have a separate function for dealing with dx formats.

This patch also fixes the depth stencil devcap. Instead of hardcoding
the capabilities for the depth stencil formats, we will query the
device for the capabilities. Note: we will still need to explicitly set
the SVGA3D_DXFMT_SHADER_SAMPLE bit for SVGA3D_R32_FLOAT_X8X24 and
SVGA3D_R24_UNORM_X8 since this bit is not advertised but supported
by the device.

v2: reapply the patch after svga_is_format_supported is moved to svga_format.c

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
2018-09-10 13:07:30 -06:00
Charmaine Lee b1aee7ff05 svga: assign a separate function for is_format_supported() for vgpu10 device
This patch adds a new function svga_is_dx_format_supported() to check
for format support in a VGPU10 device.

v2: reapply the patch after svga_is_format_supported is moved to svga_format.c

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
2018-09-10 13:07:30 -06:00
Brian Paul 1ea9c80d6d svga: add some devcap debugging code
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2018-09-10 13:07:30 -06:00
Charmaine Lee 96ef81e39e svga: fix depth and coverage mask output declaration
Set the component mask to zero for both registers.

Reviewed-by: Brian Paul <brianp@vmware.com>
2018-09-10 13:07:30 -06:00
Charmaine Lee 7187a2f7ff svga: add sample positions for 2 samples
Fixes piglit tests spec@arb_sample_shading@builtin-gl-sample-position 2
                   spec@arb_texture_multisample@fb-completeness@2

Reviewed-by: Brian Paul <brianp@vmware.com>
2018-09-10 13:07:30 -06:00
Charmaine Lee 73c850fb9a svga: check sample count devcaps
Check sample count devcaps from the svga device to determine the
supported sample counts.

Reviewed-by: Brian Paul <brianp@vmware.com>
2018-09-10 13:07:30 -06:00
Brian Paul afacde3553 svga: fix 1-element cube map array issue
As with 1D and 2D array textures, if there's only one array element
(one cubemap in this case) we have to issue different shader code.

This fixes a number of Piglit cubemap array tests.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2018-09-10 13:07:30 -06:00
Brian Paul 767c1eb436 svga: simplify array test in svga_init_shader_key_common()
Also squashes in a patch to silence a compiler warning (add a
default case to the switch statement).

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2018-09-10 13:07:30 -06:00
Charmaine Lee b343c6915c svga: sync svga3d_types.h with upstream changes
Reviewed-by: Brian Paul <brianp@vmware.com>
2018-09-10 13:07:30 -06:00
Charmaine Lee 7448bb0089 svga: add git version logging at init time
Until we can log the git version in the host log,
we'll include the git version in the init debug message.

Reviewed-by: Brian Paul <brianp@vmware.com>
2018-09-10 13:07:30 -06:00
Charmaine Lee 4669ffd29b svga: fix a typo in svga_texture_copy_region()
Trivial.
2018-09-10 13:07:30 -06:00
Charmaine Lee 3233d05390 svga: use helper function to do copy region
Use the common helper function svga_texture_copy_region
for copy region command.

Reviewed-by: Brian Paul <brianp@vmware.com>
2018-09-10 13:07:30 -06:00
Charmaine Lee 74791b80b9 svga: fix cubemap array rendering with backed surface view
This patch fixes the layer index when rendering to a
backed surface view of a cubemap array.

Fixes piglit test fbo-generatemipmap-cubemap array.

Reviewed-by: Brian Paul <brianp@vmware.com>
2018-09-10 13:07:30 -06:00
Charmaine Lee 2d39e6d0c8 svga: add a helper function to send ResolveCopy command
Reviewed-by: Brian Paul <brianp@vmware.com>
2018-09-10 13:07:30 -06:00
Charmaine Lee 9a24b08a49 svga: sync svga3d header files
This is a squash of what was originally three commits.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2018-09-10 13:07:30 -06:00
Charmaine Lee f3eda3e5e1 svga: add SM4_1 enable debug print
Reviewed-by: Brian Paul <brianp@vmware.com>
2018-09-10 13:07:30 -06:00
Charmaine Lee ccd895db76 svga: fix swizzling for texture gather
Texture swizzling for texture gather needs to be done to the selected texels
rather than to the returned vector. This patch has special cases
for the different swizzles in emit_tg4().

Fixes a lot of piglit texture gather tests.

Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2018-09-10 13:07:30 -06:00
Charmaine Lee be1993d6ed svga: fix starting index for system values
Currently, the starting index for system values is assigned to
the next index after the highest index of the tgsi declared input registers.
But the tgsi index might be different from the actual assigned index, hence
this might cause overlap of indices.
With this patch, the shader linker keeps track of the highest index of the
translated input registers, and the next index will be used for the
starting index for system values.

Fixes SHIM errors running arb_copy_image-formats on SM4_1 device.

Reviewed-by: Brian Paul <brianp@vmware.com>
2018-09-10 13:07:30 -06:00
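The indexing change described in be1993d6ed can be sketched as below. This is a minimal illustration under assumed names: `struct linker_info`, `linker_init()`, `linker_add_input()` and `linker_first_system_value_index()` are hypothetical and are not the real svga shader-linker code. The idea it shows is that system values start one past the highest *translated* input index, so they cannot collide with an input that was remapped to a higher slot.

```c
#include <assert.h>

#define MAX_INPUTS 32

/* Hypothetical, simplified stand-in for the svga shader linker state. */
struct linker_info {
   int input_map[MAX_INPUTS]; /* TGSI input index -> translated index, -1 if unused */
   int max_translated_index;  /* highest translated index so far, -1 if none */
};

static void
linker_init(struct linker_info *info)
{
   for (int i = 0; i < MAX_INPUTS; i++)
      info->input_map[i] = -1;
   info->max_translated_index = -1;
}

/* Record one translated input and remember the highest slot used. */
static void
linker_add_input(struct linker_info *info, int tgsi_index, int translated)
{
   assert(tgsi_index >= 0 && tgsi_index < MAX_INPUTS);
   info->input_map[tgsi_index] = translated;
   if (translated > info->max_translated_index)
      info->max_translated_index = translated;
}

/* System values start after the highest *translated* input index,
 * not after the highest TGSI-declared input index. */
static int
linker_first_system_value_index(const struct linker_info *info)
{
   return info->max_translated_index + 1;
}
```

For example, if TGSI inputs 0 and 1 are translated to slots 0 and 2, counting from the TGSI indices would put the first system value at slot 2, colliding with input 1; counting from the translated indices puts it at slot 3.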
Deepak Rawat 569f838987 winsys/svga: Add support for new surface ioctl, multisample pattern
Kernel driver version 2.15 added new surface ioctls:
DRM_VMW_GB_SURFACE_CREATE_EXT
DRM_VMW_GB_SURFACE_REF_EXT

The new ioctls support 64-bit svga3d_flags if
DRM_VMW_PARAM_SM4_1 is available.

Multisampling surface mob size calculation is added. Also synced the
relevant header update.

The svga device extended the surface define command V3 with a new
multisampling-pattern parameter. This adds support for it in the winsys.

Signed-off-by: Deepak Rawat <drawat@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2018-09-10 13:07:30 -06:00
Brian Paul 3f55425ee6 svga: enable MSAA for SM4_1 device
The SVGA device is deprecating the DX9 MSAA support.
This patch enables MSAA for SM4_1 device by explicitly
setting the SVGA3D_SURFACE_MULTISAMPLE bit.
For SM4_1 devices, only 4 samples are supported.

Reviewed-by: Brian Paul <brianp@vmware.com>
2018-09-10 13:07:30 -06:00
Charmaine Lee 8088cb6f53 svga: add sample count to the surface_can_create interface
With this patch, sample count is also taken into account
when determining if a resource can be created.

Reviewed-by: Brian Paul <brianp@vmware.com>
2018-09-10 13:07:30 -06:00
Brian Paul 4a1976bfcf svga: implement support for GL_ARB_texture_query_lod
Just translate the TGSI LODQ instruction to the VGPU10 LOD instruction.
All (4) Piglit GL_ARB_texture_query_lod tests pass.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
2018-09-10 13:07:30 -06:00
Neha Bhende 252e97ecdf svga: Add support for arb_texture_gather
With sm4_1, we can support single channel 2D or CubeMap textures.
This patch exercises this feature.

Tested with piglit

v2: As per Brian's comment

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2018-09-10 13:07:30 -06:00
Brian Paul 36c84bcd77 svga: add support for interpolation at sample position
Vs. sampling at the centroid or the fragment center.

Note that this does not fix failures with the Piglit
arb_sample_shading-interpolate-at-sample-position or
arb_sample_shading-ignore-centroid-qualifier.exe tests at this time.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2018-09-10 13:07:30 -06:00
Brian Paul bcf7aaa9f7 svga: clarify sys value -> input register mapping
We translate TGSI system value registers to VGPU10 input registers.
Add a comment and set file = TGSI_FILE_INPUT.  That's not strictly
necessary since we map both TGSI_FILE_INPUT and TGSI_FILE_SYSTEM_VALUE
to VGPU10_OPERAND_TYPE_INPUT, but this makes the code a bit more
understandable.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2018-09-10 13:07:30 -06:00
Brian Paul 9de5bdb341 svga: add support for FS sample mask output
This, with the previous work for sample position/id query, allows
us to enable per-sample shading for VGPU 10.1.

Note that quite a few Piglit arb_sample_shading tests still do not
pass, but many do.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2018-09-10 13:07:30 -06:00
Brian Paul 0a219dd918 svga: add support for sample id, sample position
Sample ID is just a system value.  Sample position must be implemented
with the VGPU10_OPCODE_SAMPLE_POS instruction.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2018-09-10 13:07:30 -06:00
Brian Paul ac4a0c0e82 svga: implement no-op svga_set_min_samples()
This is part of the per-sample shading feature (PIPE_CAP_SAMPLE_SHADING).

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2018-09-10 13:07:30 -06:00
Charmaine Lee 3c3fc7154e svga: add support for independent blend function per render target
This patch adds support for GL_ARB_draw_buffers_blend extension
for SM4_1 device.

Fixes piglit test fbo-draw-buffers-blend.

This patch is squashed with a subsequent patch which fixed a
regression.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2018-09-10 13:07:30 -06:00
Brian Paul 5512f943b8 svga: emit shader version as 4.0 or 4.1 depending on device support
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2018-09-10 13:07:30 -06:00
Brian Paul 1d806b6f13 svga: restructure nested if's in emit_src_register()
To make it cleaner for subsequent changes.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2018-09-10 13:07:30 -06:00
Brian Paul 16439085f5 svga: sync VGPU10ShaderTokens.h with upstream changes
This includes new DX 10.1 opcodes and tokens.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2018-09-10 13:07:30 -06:00
Charmaine Lee 22e8099711 svga: add support for shadow cubemap array
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2018-09-10 13:07:30 -06:00
Charmaine Lee f929247d24 svga: add support for rendering to cubemap array
Fixes piglit test arb_texture_cube_map_array-fbo-cubemap-array

Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2018-09-10 13:07:30 -06:00
Charmaine Lee 1df17fc697 svga: add support for TXL2 opcode
This patch adds support for cubemap array texture lookup with
explicit LOD.

Fixes piglit test arb_texture_cube_map_array-cubemap-lod

Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2018-09-10 13:07:30 -06:00
Charmaine Lee 62402be407 svga: add support for cubemap array
This patch adds support for cubemap array for SM4_1.

Fixes piglit test arb_texture_cube_map_array-cubemap

Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2018-09-10 13:07:30 -06:00
Brian Paul 018ff0112f svga: add have_sm4_1 flag, helper function
Signed-off-by: Brian Paul <brianp@vmware.com>
2018-09-10 13:07:30 -06:00
Erik Faye-Lund c4017106bb virgl: do not map zero-sized resource
When creating textures, we avoid creating backing-store for all
multisampled textures, not just depth buffers.

So we can't try to map them later. That's just going to fail. So
let's take the blit-based code-path that seems to avoid this problem.

This makes this piglit test-case no longer crash (although it still
fails):

bin/copyteximage 2D -samples=2 -auto

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-09-10 10:35:42 +02:00
Erik Faye-Lund 8083464013 virgl: remove dead code
We don't use the size we calculate in this function, so let's just
drop the calculation.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-09-10 10:35:32 +02:00
Erik Faye-Lund b9c40e492d virgl: drop needless return-code
We always return TRUE, and we never check the return-value. Let's
just drop the return value instead.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-09-10 10:35:20 +02:00
Erik Faye-Lund 9635869d73 virgl: free trans on map-error
When we fail to map memory, we should also free trans to avoid
leaking memory.

Noticed while reading code.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-09-10 10:35:02 +02:00
Mathias Fröhlich 2fece204c0 etnaviv: Reduce max offset to available hardware bits.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2018-09-10 07:59:31 +02:00
Dave Airlie 240af61494 virgl: don't send a shader create with no data. (v2)
This fixes the situation where we'd send a shader with just the
header and no data.

piglit/glsl-max-varyings test was causing this to happen, and
the renderer fix was breaking it.

v2: drop fprintf

Fixes: a8987b88ff "virgl: add driver for virtio-gpu 3D (v2)"
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
2018-09-10 12:23:30 +10:00
Marek Olšák 9ce2cef68f gallium: add PIPE_CAP_MAX_TEXTURE_UPLOAD_MEMORY_BUDGET
2018-09-07 17:59:02 -04:00
Marek Olšák 25ffb84016 radeonsi: pin the winsys thread to the requested L3 cache (v2)
v2: rebase

Reviewed-by: Brian Paul <brianp@vmware.com>
2018-09-07 16:03:36 -04:00
Eric Anholt a91b158bd9 v3d: Fix setup of the VCM cache size.
There were two bugs working together to make things mostly work: I wasn't
dividing the VPM output size available by the size of a batch (vertex),
but I also had the size of the VPM reduced by a factor of 8.

Fixes dEQP-GLES3.functional.vertex_array_objects.all_attributes and it
seems also my intermittent varying failures.

Fixes: 1561e4984e ("v3d: Emit the VCM_CACHE_SIZE packet.")
2018-09-07 08:11:38 -07:00
Eric Anholt f73f748323 v3d: Fix SRC_ALPHA_SATURATE blending for RTs without alpha.
Fixes
dEQP-GLES3.functional.fragment_ops.blend.default_framebuffer.rgb_func_alpha_func.dst.src_alpha_saturate_src_alpha_saturate
and friends with --deqp-egl-config-name=rgb565d0s0

Cc: "18.2" <mesa-stable@lists.freedesktop.org>
2018-09-07 08:11:05 -07:00
Rob Clark 5404e0637f freedreno: fix rast->depth_clip_near/far
Fixes: daa19363de gallium: split depth_clip into depth_clip_near & depth_clip_far
Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-09-07 07:41:43 -04:00
Marek Olšák fda7683726 gallium: enable GL_AMD_depth_clamp_separate on r600, radeonsi
2018-09-06 21:53:00 -04:00
Marek Olšák daa19363de gallium: split depth_clip into depth_clip_near & depth_clip_far
for AMD_depth_clamp_separate.
2018-09-06 21:53:00 -04:00
Jason Ekstrand 44ec31cd75 nir: Drop the vs_inputs_dual_locations option
It was very inconsistently handled; the only things that made use of it
were glsl_to_nir, glspirv, and nir_gather_info.  In particular,
nir_lower_io completely ignored it so anyone using nir_lower_io on
64-bit vertex attributes was going to be in for a shock.  Also, as of
the previous commit, it's set by every driver that supports 64-bit
vertex attributes.  There's no longer any reason to have it be an option
so let's just delete it.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-09-06 16:07:50 -05:00
Jason Ekstrand 0909a57b63 radeonsi/nir: Set vs_inputs_dual_locations and let NIR do the remap
We were going out of our way to disable dual-location re-mapping in NIR
only to then do the remapping in st_glsl_to_nir.cpp.  Presumably, this
was so that double_inputs would be correct for the core state tracker.
However, now that we've moved it to gl_program::DualSlotInputs which is
unaffected by NIR lowering, we can let NIR lower things for us.  The one
tricky bit here is that we have to remap the inputs_read bitfield back
to the single-slot convention for the gallium state tracker to use.

Since radeonsi is the only NIR-capable gallium driver that also supports
GL_ARB_vertex_attrib_64bit, we only have to worry about radeonsi when
making core gallium state tracker changes.

Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-09-06 16:07:50 -05:00
Marek Olšák 1285f71d3e gallium: add PIPE_CAP_RASTERIZER_SUBPIXEL_BITS
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2018-09-06 16:07:40 -04:00
Hyunjun Ko 2454742a84 freedreno/ir3: insert mov if same instruction in the outputs.
For example,

    result0 = texture(sampler[indexBase + 5], coords);
    result1 = texture(sampler[indexBase + 0], coords);
    result2 = texture(sampler[indexBase + 0], coords);
    out_result0 = result0;
    out_result1 = result1;
    out_result2 = result2;

In this kind of case we need to insert an extra mov to the outputs
so that each result can be assigned to its own output register.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-09-05 13:38:43 -04:00
Hyunjun Ko b4da2f6667 freedreno/ir3: make immediates array dynamic
Since most shaders don't need that large an array of immediates, making
the array dynamic saves unnecessary space.

In addition, sometimes we can potentially have a much larger array
of immediates to be lowered, which might be more than 64.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-09-05 13:38:43 -04:00
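The dynamic-array approach from b4da2f6667 can be sketched as a growable array that starts empty and doubles its capacity with `realloc` when full, so small shaders pay almost nothing and large ones are not capped at a fixed 64 entries. This is a minimal sketch under assumptions: `struct imm_array`, `imm_array_push()` and `imm_array_fini()` are hypothetical, and the real ir3 code's names, allocator and growth policy may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable array of 32-bit immediates. */
struct imm_array {
   uint32_t *vals;
   unsigned count;
   unsigned capacity;
};

/* Append one immediate, doubling capacity when the array is full.
 * Returns false (leaving the array unchanged) on allocation failure. */
static bool
imm_array_push(struct imm_array *arr, uint32_t val)
{
   assert(arr != NULL);
   if (arr->count == arr->capacity) {
      unsigned new_cap = arr->capacity ? arr->capacity * 2 : 4;
      uint32_t *vals = realloc(arr->vals, new_cap * sizeof(*vals));
      if (!vals)
         return false;
      arr->vals = vals;
      arr->capacity = new_cap;
   }
   arr->vals[arr->count++] = val;
   return true;
}

/* Release the storage and reset the array to its empty state. */
static void
imm_array_fini(struct imm_array *arr)
{
   free(arr->vals);
   arr->vals = NULL;
   arr->count = arr->capacity = 0;
}
```

Doubling keeps the amortized cost of each append constant while wasting at most half of the allocated storage.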
Rob Clark c3d9f29b78 freedreno: allocate ctx's batch on demand
Don't fall over when an app wants more than 32 contexts.  Instead, allocate
each context's batch on demand.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-09-05 13:38:43 -04:00
Rob Clark a122118c14 freedreno: add fd_context_batch() accessor
For cases in which (after the following commit) ctx->batch may be null.
Prep work for following commit.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-09-05 13:38:43 -04:00
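The two freedreno commits above (allocating the context's batch on demand, and adding an fd_context_batch() accessor because ctx->batch may be null) follow a lazy-allocation pattern that can be sketched as below. The structs and functions here (`struct batch`, `struct context`, `context_batch()`, `context_destroy()`) are hypothetical, heavily simplified stand-ins; the real accessor does more work, such as setting up framebuffer state for the new batch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for fd_batch / fd_context. */
struct batch {
   unsigned seqno;
};

struct context {
   struct batch *batch;  /* NULL until first needed */
   unsigned next_seqno;
};

/* Accessor: allocate the context's batch lazily on first use, so merely
 * creating a context no longer consumes a batch up front.
 * Returns NULL if the allocation fails. */
static struct batch *
context_batch(struct context *ctx)
{
   assert(ctx != NULL);
   if (!ctx->batch) {
      struct batch *b = calloc(1, sizeof(*b));
      if (!b)
         return NULL;
      b->seqno = ++ctx->next_seqno;
      ctx->batch = b;
   }
   return ctx->batch;
}

/* Free the batch (if any) and reset the pointer. */
static void
context_destroy(struct context *ctx)
{
   free(ctx->batch);
   ctx->batch = NULL;
}
```

Callers go through the accessor instead of reading `ctx->batch` directly, so code paths that run before the first draw never see a null batch.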
Rob Clark a45e1802db freedreno/a6xx: fix mem2gmem for zsbuf
Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-09-05 13:38:43 -04:00
Rob Clark c77e0948c7 freedreno/batch: fix crash in !reorder case
We aren't using the batch-cache if reorder==false.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-09-05 13:38:43 -04:00
Rob Clark 2c623e7071 freedreno/ir3: better compile_error() printing
Try to show the error at the appropriate line of nir

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-09-05 13:38:43 -04:00
Rob Clark ca758251ba freedreno/a6xx: bordercolor fixes
Port fixes from a5xx (f0715442)

TODO maybe this should move to shared code, since it seems to be the
same.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-09-05 13:38:43 -04:00
Rob Clark 73378013d7 freedreno: fix context teardown harder
The border_color_uploaders need to be torn down before the transfer_pool
is destroyed.

Fixes: e11e9d6394 freedreno: fix context teardown race
Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-09-05 13:38:43 -04:00
Rob Clark 1a24f51966 freedreno/ir3: ignore unused inputs
We could end up w/ inputs larger than vec4, simply because unused inputs
are not split.

Fixes things like dEQP-GLES31.functional.separate_shader.random.77 (and
probably a handful of others)

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-09-05 13:38:43 -04:00
Rob Clark 6b4397feab freedreno/a6xx: fix debug build crash
Porting 0c8d9e923a to a6xx.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-09-05 13:38:43 -04:00
Charmaine Lee af104ad799 svga: rename face to layer_face
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2018-09-05 11:22:42 -06:00
Brian Paul e334e104d0 svga: encode sample count in resource declarations
No regressions before the corresponding host-side change.

Reviewed-by: Neha Bhende <bhenden@vmware.com>
2018-09-05 11:22:42 -06:00
Charmaine Lee 49678e9e49 svga: sync with upstream changes to surface flags
The SVGA device now supports 64-bit surface flags. This patch
updates the winsys interface to allow 64-bit surface flags.
The Linux winsys layer will for now only honor the lower 32 bits of
the surface flags.

Reviewed-by: Brian Paul <brianp@vmware.com>
2018-09-05 11:22:42 -06:00
Neha Bhende 4310649ccb svga: avoid try_blit() for some depth formats on non vgpu10.
On non-vgpu10 devices, the driver doesn't support util_blitter_blit for SVGA3D_Z_D16,
SVGA3D_Z_D24x8, SVGA3D_Z_D24S8. This patch fixes the following piglit test regressions on hwv8,
caused by commit 27bf35caea5e:
spec@arb_depth_texture@fbo-depth-gl-depth-component16-blit
spec@arb_depth_texture@fbo-depth-gl-depth-component24-blit
spec@arb_depth_texture@fbo-depth-gl-depth-component32-blit

Tested with mtt-piglit on hw 8,9,10,11,13 and mtt-glretrace on windows and linux.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2018-09-05 11:22:42 -06:00
Neha Bhende 53091a0312 svga: convert dst format to linear when blending is enabled.
When blending is enabled, framebuffer colorspace has to be linear.
Previously, we never hit this case because we were not supporting sRGB
drawable. Previous patch added that support.

Tested with mtt glretrace, viewperf, piglit, conform.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2018-09-05 11:22:42 -06:00
Neha Bhende 8449c33a27 svga: start using SVGA3dCmdIntraSurfaceCopy command for svga_blit.
The SVGA3dCmdIntraSurfaceCopy command allows copying when the
source and destination are the same surface.

Tested with MTT piglit, glretrace, viewperf, conform

v2: changes as per Charmaine's comment
v3: changes as per Charmaine's comment

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2018-09-05 11:22:42 -06:00
Neha Bhende 4639ef3763 svga/winsys: Add cap2 support in winsys
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2018-09-05 11:22:42 -06:00
Neha Bhende 6b3627da08 svga: Add SVGA3dCmdIntraSurfaceCopy command support in OpenGL driver
v2: changes as per Charmaine's comment

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2018-09-05 11:22:42 -06:00
Brian Paul bac94dfefa svga: update device header files from upstream
This is a squash commit of several earlier patches.

Signed-off-by: Brian Paul <brianp@vmware.com>
2018-09-05 11:22:42 -06:00
Tomeu Vizoso f13de57edb virgl: use hw-atomics instead of in-ssbo ones
Emulating atomics on top of ssbos can lead to too small max SSBO count,
so let's use the hw-atomics mechanism to expose atomic buffers instead.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2018-09-05 05:46:58 +01:00
Erik Faye-Lund 1bd927d997 virgl: update minor differences to upstream header
virgl_protocol.h is considered to have its upstream in the
virglrenderer repository, and somehow these minor differences have
crept in.

Let's sync with the upstream to avoid this.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2018-09-05 05:46:52 +01:00
Erik Faye-Lund 5a587d18d5 gallium: add PIPE_CAP_MAX_COMBINED_HW_ATOMIC_COUNTER{S,_BUFFERS}
This moves the evergreen-specific max-sizes out as a driver-cap, so
other drivers with less strict requirements also can use hw-atomics.

Remove ssbo_atomic as it's no longer needed.

We should now be able to use hw-atomics for some stages and not for
others, if needed.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2018-09-05 05:46:46 +01:00
Erik Faye-Lund d641d3f48b gallium: add PIPE_CAP_MAX_COMBINED_SHADER_BUFFERS
This gets rid of a r600 specific hack in the state-tracker, and prepares
for other drivers to be able to use hw-atomics.

While we're at it, clean up some indentation in the various drivers.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2018-09-05 05:46:37 +01:00
Eric Anholt 2e59b88903 freedreno: Drop a bunch of duplicated gallium PIPE_CAP default code.
Now that we have the util function for the default values, we can get rid
of the boilerplate.

v2: Rebase on new gallium caps

Reviewed-by: Rob Clark <robdclark@gmail.com> (v1)
2018-09-04 08:08:22 -07:00
Eric Anholt 492b74b445 v3d: Drop a bunch of duplicated gallium PIPE_CAP default code.
Now that we have the util function for the default values, we can get rid
of the boilerplate.

v2: Rebase on new gallium caps
2018-09-04 08:08:18 -07:00
Eric Anholt c311e00000 vc4: Drop a bunch of duplicated gallium PIPE_CAP default code.
Now that we have the util function for the default values, we can get rid
of the boilerplate.

v2: drop GLSL level in favor of defaults.
v3: Rebase on new gallium caps
2018-09-04 08:08:10 -07:00
Eric Anholt ad782a7020 gallium: Add a helper for implementing PIPE_CAP_* default values.
One of the pains of implementing a gallium driver is filling in a million
pipe caps you don't know about yet when you're just starting out.  One of
the pains of working on gallium is copy-and-pasting your new PIPE_CAP into
each driver.  We can fix both of these by having each driver call into the
default helper from their default case, so that both sides can ignore each
other until they need to.

v2: fix i915g build, revert swr change to avoid breaking scons build
    (https://travis-ci.org/anholt/mesa/jobs/419739857)
v3: Rebase on 3 new gallium caps.

Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Cc: Bruce Cherniak <bruce.cherniak@intel.com>
Cc: George Kyriazis <george.kyriazis@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
2018-09-04 08:07:52 -07:00
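The pattern described in the helper commit above (each driver answers the caps it knows and hands everything else to a shared default function from its `default:` case) can be sketched as follows. This is a minimal, self-contained sketch: the enum and the helper name `get_param_defaults()` are hypothetical stand-ins (in Mesa the helper is, as far as I know, `u_pipe_screen_get_param_defaults()`), and the default values are illustrative only.

```c
/* Hypothetical, heavily trimmed stand-in for gallium's enum pipe_cap. */
enum pipe_cap {
   PIPE_CAP_NPOT_TEXTURES,
   PIPE_CAP_MAX_RENDER_TARGETS,
   PIPE_CAP_GLSL_FEATURE_LEVEL,
   PIPE_CAP_MAX_GS_INVOCATIONS,   /* a "new" cap this driver never heard of */
};

struct pipe_screen;               /* opaque here; only passed through */

/* Shared helper: the one place that knows a conservative default for
 * every cap (values are illustrative, not Mesa's real table). */
static int get_param_defaults(struct pipe_screen *screen, enum pipe_cap cap)
{
   (void)screen;
   switch (cap) {
   case PIPE_CAP_MAX_RENDER_TARGETS: return 1;
   case PIPE_CAP_GLSL_FEATURE_LEVEL: return 120;
   case PIPE_CAP_MAX_GS_INVOCATIONS: return 32;
   default:                          return 0;   /* feature unsupported */
   }
}

/* A driver only lists the caps it actually cares about... */
static int my_driver_get_param(struct pipe_screen *screen, enum pipe_cap cap)
{
   switch (cap) {
   case PIPE_CAP_NPOT_TEXTURES:      return 1;
   case PIPE_CAP_MAX_RENDER_TARGETS: return 4;
   default:
      /* ...and defers everything else, so adding a new cap to gallium
       * does not require touching this driver. */
      return get_param_defaults(screen, cap);
   }
}
```

The design point is that the two sides stay decoupled: a new cap only needs an entry in the default helper, and a young driver only needs the cases where it differs from the defaults.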
Christian Gmeiner b05a8f4f41 tegra: make use of loader_open_render_node(..) helper
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2018-08-31 21:46:32 +02:00
Christian Gmeiner d0b09e2dfe tegra: fix memory leak
Fixes: 1755f608f5 ("tegra: Initial support")
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2018-08-31 21:45:16 +02:00
Dave Airlie c9f5448695 radeonsi: fix regression in indirect input swizzles.
This fixes
tests/spec/arb_enhanced_layouts/execution/component-layout/vs-fs-array-dvec3.shader_test,
which regressed when I reworked the 64-bit swizzles.

Fixes: bb17ae49ee (gallivm: allow to pass two swizzles into fetches.)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-08-31 06:08:24 +01:00
Dave Airlie 750b829daf radeonsi: fix tess/gs fetches for new swizzle.
I had piglit results from my machine, but I must have messed up
and not rebuilt mesa properly in between.

Fixes: bb17ae49ee (gallivm: allow to pass two swizzles into fetches.)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-08-31 06:08:21 +01:00
Ilia Mirkin 3e04c67950 nv50: bump compat glsl level to same as core
Passes the compat piglits. I'm sure that there will be odd issues that
aren't caught by them, but at least it should basically work.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-08-29 20:51:40 -04:00
Ilia Mirkin a608e5cc9f nvc0: bump compat GLSL version to match core
This passes the handful of tests in piglit.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-08-29 20:51:40 -04:00
Dave Airlie bb17ae49ee gallivm: allow to pass two swizzles into fetches.
This hijacks the top 16 bits of the swizzle to pass in the swizzle
for the second channel.

This fixes handling .yx swizzles of 64-bit values.

This should fix up radeonsi and llvmpipe.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107524
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-08-30 00:15:40 +01:00
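The gallivm change above packs two swizzles into one argument: the low 16 bits keep the swizzle for the first 32-bit channel and the top 16 bits carry the swizzle for the second channel of a 64-bit value. A minimal sketch of that packing, assuming a 32-bit `unsigned` argument; the helper names are hypothetical, not gallivm's.

```c
#include <stdint.h>

/* Pack the first channel's swizzle into the low 16 bits and the second
 * channel's swizzle into the high 16 bits of a single 32-bit argument. */
static uint32_t pack_swizzles(uint16_t first, uint16_t second)
{
   return (uint32_t)first | ((uint32_t)second << 16);
}

/* The fetch side splits them apart again. */
static unsigned first_swizzle(uint32_t packed)  { return packed & 0xffffu; }
static unsigned second_swizzle(uint32_t packed) { return packed >> 16; }
```

Passing the second swizzle explicitly, rather than deriving it from the first (for example as "first + 1"), lets a fetch read the two halves of a 64-bit value from arbitrary channels.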
Timothy Arceri 5566dd8a61 radeonsi: add radeonsi_zerovram driconfig option
More and more games seem to require this, so let's make it a config
option.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-08-30 07:57:38 +10:00
Timothy Arceri 406c3d748d radeonsi: enable GL 4.5 in compat profile
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-08-30 07:57:38 +10:00
Marek Olšák 93b8b987d0 radeonsi: add a thorough clear/copy_buffer benchmark 2018-08-29 15:31:42 -04:00
Marek Olšák 5914f5bd4a radeonsi: let internal compute dispatches tune WAVES_PER_SH 2018-08-29 15:31:42 -04:00
Marek Olšák c5442c1165 radeonsi: add TGSI_SEMANTIC_CS_USER_DATA for reading up to 4 SGPRs with TGSI 2018-08-29 15:31:42 -04:00
Marek Olšák d7250e4304 radeonsi: add SI_QUERY_TIME_ELAPSED_SDMA_SI for measuring DMA on SI
DMA on SI doesn't support the timestamp packet, so it's emulated.
2018-08-29 15:31:42 -04:00
Marek Olšák c359880d8b radeonsi: add SI_QUERY_TIME_ELAPSED_SDMA for measuring SDMA performance 2018-08-29 15:31:42 -04:00
Marek Olšák 0c5429cc73 radeonsi: add flag L2_STREAM for minimal cache usage 2018-08-29 15:31:41 -04:00
Marek Olšák 8f6e06d160 gallium: add TGSI_MEMORY_STREAM_CACHE_POLICY
For internal radeonsi shaders.
2018-08-29 15:31:41 -04:00
Brian Paul 18e9b4791b svga: add missing switch cases for shadow textures
This doesn't seem to make any difference in testing, but it fixes a
failed assertion when dumping sm3 shaders.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2018-08-29 11:29:07 -06:00
Brian Paul fb7e462c97 svga: fix vgpu9 sprite coordinate bug
Setting GL_POINT_SPRITE_COORD_ORIGIN to GL_LOWER_LEFT did not work for
vgpu9.  We can use the rasterizer sprite_coord_enable bitfield as-is.
We need to index into it using the TGSI semantic index, not the
register index.

This fixes the Piglit fbo-gl_pointcoord and glsl-fs-pointcoord tests.

Testing done: Piglit, Mesa sprite demos

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2018-08-29 11:29:07 -06:00
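The vgpu9 sprite-coordinate fix above comes down to which index is used to test the rasterizer's `sprite_coord_enable` bitfield. A minimal sketch, assuming a simplified, hypothetical input record and treating the bitfield as a 32-bit mask in which bit N refers to GENERIC semantic index N; this is not the svga code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified fragment-shader input record. */
struct fs_input {
   unsigned reg_index;        /* register slot the input landed in */
   unsigned semantic_index;   /* TGSI GENERIC semantic index */
};

/* True if this input should be replaced by the point-sprite coordinate. */
static bool input_is_sprite_coord(uint32_t sprite_coord_enable,
                                  const struct fs_input *in)
{
   if (in->semantic_index >= 32)
      return false;
   /* Index with the semantic index, which is what the bitfield is defined
    * in terms of; the register index can differ and picks the wrong bit. */
   return (sprite_coord_enable >> in->semantic_index) & 1u;
}
```

With an input at register 0 but semantic index 3, testing bit 0 (the register index) would miss the enabled sprite coordinate, while testing bit 3 finds it.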
Brian Paul 8331d69a87 svga: fix PIPE_TEXTURE_RECT/BUFFER const buffer issue
The flag_rect and flag_buffer fields didn't sufficiently capture
the state changes needed for those resource types.  For example,
if a texture binding was changed from a 500x500 rect texture to a
400x400 rect texture we didn't set SVGA_NEW_TEXTURE_CONSTS.  But
we need to do that to emit the new texcoord scale factors to the
constant buffers.  Rather than track the sizes of all bound
resources, just set the flag if the resource is a rect.  Same
story with texture buffers.

Also, since rect/buffer textures are usable with VS/GS shaders,
add SVGA_NEW_TEXTURE_CONSTS to the flags we check for emitting
VS/GS constants.

This seems to help with XFCE / xfwm4 desktop scaling.
VMware issue 2156696.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2018-08-29 11:29:07 -06:00
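The RECT/BUFFER fix above trades precision for simplicity: rather than remembering and comparing the size of every bound resource, any binding of a RECT or BUFFER resource marks the texture constants dirty. A minimal sketch of that decision; the enum and `NEW_TEXTURE_CONSTS` are hypothetical stand-ins for the driver's types and its `SVGA_NEW_TEXTURE_CONSTS` flag.

```c
#include <stddef.h>

/* Hypothetical texture targets and dirty bit (stand-in for
 * SVGA_NEW_TEXTURE_CONSTS). */
enum tex_target { TEX_2D, TEX_RECT, TEX_BUFFER };
#define NEW_TEXTURE_CONSTS (1u << 0)

/* Called when texture bindings change. RECT textures need per-texture
 * texcoord scale factors in the constant buffer (and, per the commit,
 * buffer textures are the same story), so conservatively re-emit the
 * constants whenever such a resource is bound, without size tracking. */
static unsigned dirty_flags_for_bindings(const enum tex_target *targets,
                                         size_t count)
{
   for (size_t i = 0; i < count; i++) {
      if (targets[i] == TEX_RECT || targets[i] == TEX_BUFFER)
         return NEW_TEXTURE_CONSTS;
   }
   return 0;
}
```

The cost is occasionally re-emitting unchanged constants; the benefit is never missing a case like the 500x500 to 400x400 rebind described above.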
Brian Paul 46c7433da8 svga: minor improvements in svga_state_constants.c
Add const qualifiers.  Add 'f' suffix on floats to avoid double
promotion.

Remove unneeded shader type assertion since the switch statement
handled it already.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2018-08-29 11:29:07 -06:00
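The 'f' suffix cleanup above matters because an unsuffixed floating literal such as `0.5` has type `double` in C, so mixing it with a `float` promotes the whole expression to double precision. A minimal illustration with hypothetical function names:

```c
/* Without the suffix, 0.5 is a double: x is promoted to double, the
 * multiply happens in double precision, and the result is converted
 * back to float on return. */
static float half_promoted(float x) { return x * 0.5; }

/* With the 'f' suffix the whole expression stays in single precision. */
static float half_float(float x) { return x * 0.5f; }
```

Both return the same value for exactly representable inputs; the difference is the type of the intermediate expression, which `sizeof` makes visible, and which GCC's `-Wdouble-promotion` warns about.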
Rhys Kidd f7d0c112cb nv50/ir: silence partitionLoadStore() unused function warning
Move this now-unused function into the existing comment block, which contains its only former caller.

../../../../../src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp:2645:1: warning:
      unused function 'partitionLoadStore' [-Wunused-function]
partitionLoadStore(uint8_t comp[2], uint8_t size[2], uint8_t mask)

Fixes: 86e4440361 ("nouveau: codegen: Disable more old resource handling code")
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-08-29 08:59:27 -04:00
Erik Faye-Lund a4e60ccb56 virgl: add debug-switch to output TGSI
This is quite useful for debugging shader-transpiling issues in
virglrenderer.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
2018-08-28 14:13:43 +02:00
Erik Faye-Lund 4ab06cc56e virgl: introduce $VIRGL_DEBUG=verbose
This adds an environment variable that can be used for driver-specific
flags, as well as a flag for it to enable verbose output.

While we're at it, quiet some overly chatty debug-output by default.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
2018-08-28 14:13:43 +02:00
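A driver-specific debug variable like `$VIRGL_DEBUG` is typically a comma-separated list of named flags folded into a bitmask. The sketch below is a self-contained parser under that assumption; it is not virgl's code (Mesa drivers usually go through the gallium `debug_get_flags_option()` utility instead), and the `tgsi` flag name and all bit values are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bits; "verbose" is the flag this commit introduces,
 * "tgsi" is an assumed name for illustration. */
#define VIRGL_DEBUG_VERBOSE (1u << 0)
#define VIRGL_DEBUG_TGSI    (1u << 1)

struct named_flag { const char *name; unsigned value; };

static const struct named_flag virgl_debug_flags[] = {
   { "verbose", VIRGL_DEBUG_VERBOSE },
   { "tgsi",    VIRGL_DEBUG_TGSI },
};

/* Parse a list such as "verbose,tgsi"; unknown names are ignored. */
static unsigned parse_debug_flags(const char *str)
{
   unsigned flags = 0;
   if (!str)
      return 0;
   while (*str) {
      size_t len = strcspn(str, ",");   /* length of the current name */
      for (size_t i = 0; i < sizeof(virgl_debug_flags) / sizeof(virgl_debug_flags[0]); i++) {
         if (strlen(virgl_debug_flags[i].name) == len &&
             strncmp(virgl_debug_flags[i].name, str, len) == 0)
            flags |= virgl_debug_flags[i].value;
      }
      str += len;
      if (*str == ',')
         str++;                         /* skip the separator */
   }
   return flags;
}
```

At screen creation a driver would call something like `parse_debug_flags(getenv("VIRGL_DEBUG"))` once and then gate chatty output on `flags & VIRGL_DEBUG_VERBOSE`, which keeps the default output quiet.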
Erik Faye-Lund 1b2444dffc virgl: replace fprintf-call with debug_printf
This is the only direct call-site for fprintf in virgl; all other
call-sites call debug_printf instead. So let's follow suit here.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
2018-08-28 14:13:43 +02:00
Erik Faye-Lund 2ebfa90abe virgl: delete commented out fprintf-call
This is just debug-cruft left over. Let's just get rid of it.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
2018-08-28 14:13:43 +02:00
Rhys Perry e56e600bd3 nv50/ir,nvc0: use constant buffers for compute when possible on Kepler+
Gives a +7.79% increase in FPS with Hitman on lowest quality settings on
my GTX 1060.

total instructions in shared programs : 5787979 -> 5748677 (-0.68%)
total gprs used in shared programs    : 669901 -> 669373 (-0.08%)
total shared used in shared programs  : 548832 -> 548832 (0.00%)
total local used in shared programs   : 21068 -> 21064 (-0.02%)

                local     shared        gpr       inst      bytes
    helped           1           0         152         274         274
      hurt           0           0           0           0           0

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
2018-08-27 14:23:42 +01:00
Rhys Perry d27c791891 nv50/ir: optimize multiplication by 16-bit immediates into two xmads
Rather than the usual three that would be created.

total instructions in shared programs : 5796385 -> 5786560 (-0.17%)
total gprs used in shared programs    : 670103 -> 669968 (-0.02%)
total shared used in shared programs  : 548832 -> 548832 (0.00%)
total local used in shared programs   : 21164 -> 21068 (-0.45%)

                local     shared        gpr       inst      bytes
    helped           1           0          64        1040        1040
      hurt           0           0          27           0           0

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
2018-08-27 13:57:11 +01:00
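Why two xmads suffice for a 16-bit immediate: splitting `a` into 16-bit halves gives `a * imm = a.lo * imm + ((a.hi * imm) << 16)` modulo 2^32, and each term is one 16x16 multiply-add. The sketch below uses a simplified arithmetic model of XMAD (a 16x16 unsigned multiply plus a 32-bit add, with an optional "shift product left by 16" mode standing in for the hardware's PSL flag); other XMAD modes are ignored, and this is not the exact sequence the backend emits.

```c
#include <stdint.h>

/* Simplified XMAD model: 16x16-bit unsigned multiply plus 32-bit add;
 * 'psl' shifts the product left by 16 before the add. */
static uint32_t xmad(uint16_t a, uint16_t b, uint32_t c, int psl)
{
   uint32_t prod = (uint32_t)a * b;
   return (psl ? prod << 16 : prod) + c;
}

/* a * imm for a 16-bit immediate in two xmads. */
static uint32_t mul_imm16(uint32_t a, uint16_t imm)
{
   uint32_t lo = xmad((uint16_t)a, imm, 0, 0);      /* a.lo * imm          */
   return xmad((uint16_t)(a >> 16), imm, lo, 1);    /* + (a.hi * imm) << 16 */
}
```

A general 32-bit multiplier would also have an `imm.hi` term needing a third xmad; a 16-bit immediate makes that term zero.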
Rhys Perry 400a4eb964 nv50/ir: optimize near power-of-twos into shladd
total instructions in shared programs : 5819319 -> 5796385 (-0.39%)
total gprs used in shared programs    : 670571 -> 670103 (-0.07%)
total shared used in shared programs  : 548832 -> 548832 (0.00%)
total local used in shared programs   : 21164 -> 21164 (0.00%)

                local     shared        gpr       inst      bytes
    helped           0           0         318        1758        1758
      hurt           0           0          63           0           0

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
2018-08-27 13:57:01 +01:00
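The shladd optimization above relies on the identities `a * (2^n + 1) = (a << n) + a` and `a * (2^n - 1) = (a << n) - a`, each of which is a single shift-and-add. A minimal sketch under the assumption that the `2^n - 1` case is expressed by negating the addend (modelled here with unsigned negation); the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of SHLADD: (a << s) + b. */
static uint32_t shladd(uint32_t a, unsigned s, uint32_t b) { return (a << s) + b; }

static bool is_pow2(uint32_t v) { return v != 0 && (v & (v - 1)) == 0; }

static unsigned ilog2(uint32_t v)
{
   unsigned n = 0;
   while (v >>= 1)
      n++;
   return n;
}

/* Multiply by an immediate one away from a power of two with a single
 * SHLADD. Returns false if imm is not of the form 2^n + 1 or 2^n - 1. */
static bool mul_near_pow2(uint32_t a, uint32_t imm, uint32_t *out)
{
   if (imm > 1 && is_pow2(imm - 1)) {          /* imm = 2^n + 1 */
      *out = shladd(a, ilog2(imm - 1), a);
      return true;
   }
   if (imm > 1 && is_pow2(imm + 1)) {          /* imm = 2^n - 1 */
      *out = shladd(a, ilog2(imm + 1), 0u - a);
      return true;
   }
   return false;
}
```

For example, multiplying by 9 becomes `(a << 3) + a` and multiplying by 7 becomes `(a << 3) - a`, while an immediate such as 10 falls through to the general path.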
Rhys Perry 2f52925f5c nv50/ir: move a * b -> a << log2(b) code into createMul()
With this commit, OP_MAD is handled on nv50 too. This commit is also
useful for later commits.

Also, instead of creating a shladd, it relies on LateAlgebraicOpt to
create one. This simplifies the code and helps shader-db slightly overall.

total instructions in shared programs : 5820882 -> 5819319 (-0.03%)
total gprs used in shared programs    : 670595 -> 670571 (-0.00%)
total shared used in shared programs  : 548832 -> 548832 (0.00%)
total local used in shared programs   : 21164 -> 21164 (0.00%)

                local     shared        gpr       inst      bytes
    helped           0           0          18         230         230
      hurt           0           0           8         263         263

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
2018-08-27 13:56:47 +01:00
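The createMul() change above turns a multiply by a power-of-two immediate into a left shift, for MUL and now also for MAD, leaving the MAD's add as an ordinary add that LateAlgebraicOpt can later fuse into a shladd. A minimal sketch of the resulting arithmetic, evaluated directly rather than as IR; the function and parameter names are hypothetical.

```c
#include <stdint.h>

static int is_pow2(uint32_t v) { return v != 0 && (v & (v - 1)) == 0; }

static unsigned ilog2(uint32_t v)
{
   unsigned n = 0;
   while (v >>= 1)
      n++;
   return n;
}

/* a * imm (MUL), or a * imm + c (MAD) when is_mad is non-zero. */
static uint32_t lowered_mul(uint32_t a, uint32_t imm, uint32_t c, int is_mad)
{
   uint32_t prod = is_pow2(imm) ? a << ilog2(imm)   /* SHL by log2(imm) */
                                : a * imm;          /* generic multiply  */
   return is_mad ? prod + c : prod;                 /* separate ADD for MAD;
                                                       a later pass may fuse
                                                       SHL + ADD into SHLADD */
}
```

Keeping the add separate is what lets a single shl-plus-add fusion pass produce the shladd, instead of every multiply-lowering site creating one itself.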
Rhys Perry b60bc7a4ab nv50/ir: optimize imul/imad to xmads
This hurts the shader-db numbers a good bit, but a few xmads are much
faster than an imul or imad, and the cost is mitigated by the next commit,
which optimizes many multiplications by immediates into shorter, less
register-heavy instructions than the xmads.

total instructions in shared programs : 5768871 -> 5820882 (0.90%)
total gprs used in shared programs    : 669919 -> 670595 (0.10%)
total shared used in shared programs  : 548832 -> 548832 (0.00%)
total local used in shared programs   : 21068 -> 21164 (0.46%)

                local     shared        gpr       inst      bytes
    helped           0           0          38           0           0
      hurt           1           0         365        3076        3076

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
2018-08-27 13:56:44 +01:00
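A full 32-bit multiply can be built from 16x16 multiply-adds because, modulo 2^32, `a * b = a.lo*b.lo + ((a.hi*b.lo) << 16) + ((a.lo*b.hi) << 16)`; the `a.hi*b.hi` term is shifted by 32 and drops out. The sketch below reuses a simplified XMAD model (16x16 unsigned multiply plus 32-bit add, optional "product << 16" mode standing in for the PSL flag) to show that three xmads are enough. The backend's real sequence relies on further XMAD modes not modelled here.

```c
#include <stdint.h>

/* Simplified XMAD model, as an arithmetic sketch only. */
static uint32_t xmad(uint16_t a, uint16_t b, uint32_t c, int psl)
{
   uint32_t prod = (uint32_t)a * b;
   return (psl ? prod << 16 : prod) + c;
}

static uint32_t mul32_via_xmad(uint32_t a, uint32_t b)
{
   uint16_t a_lo = (uint16_t)a, a_hi = (uint16_t)(a >> 16);
   uint16_t b_lo = (uint16_t)b, b_hi = (uint16_t)(b >> 16);

   uint32_t r = xmad(a_lo, b_lo, 0, 0);   /* a.lo * b.lo           */
   r = xmad(a_hi, b_lo, r, 1);            /* + (a.hi * b.lo) << 16 */
   r = xmad(a_lo, b_hi, r, 1);            /* + (a.lo * b.hi) << 16 */
   return r;                              /* (a.hi * b.hi) << 32 vanishes mod 2^32 */
}
```

This is also why an immediate that fits in 16 bits needs only two xmads: with `b.hi == 0`, the third term is zero.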
Rhys Perry bcbcdf8448 gm107/ir: add support for OP_XMAD on GM107+
v4: make the immediate field 16 bits
v5: don't ever emit h1 flags for immediates

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
2018-08-27 13:56:41 +01:00
Rhys Perry 5d6952d2de nv50/ir: add preliminary support for OP_XMAD
v4: remove uint16_t(...)
v4: don't allow immediates outside [0,65535] in insnCanLoad()

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
2018-08-27 13:56:36 +01:00
Kenneth Graunke 1281608849 gallium: Split out PIPE_CAP_TEXTURE_MIRROR_CLAMP_TO_EDGE.
Some hardware can do PIPE_TEX_WRAP_MIRROR_REPEAT but not
PIPE_TEX_WRAP_MIRROR_CLAMP and PIPE_TEX_WRAP_MIRROR_CLAMP_TO_BORDER.

Drivers for such hardware would like to advertise support for
ARB_texture_mirror_clamp_to_edge but not EXT_texture_mirror_clamp.

This commit adds a new PIPE_CAP_TEXTURE_MIRROR_CLAMP_TO_EDGE bit,
changes the extension enable to be based on that, and enables it
in all upstream drivers which supported PIPE_CAP_TEXTURE_MIRROR_CLAMP
(so they continue supporting this mode).
2018-08-24 17:25:36 -07:00
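Splitting the cap means each extension is keyed off its own capability, so hardware with only mirror-clamp-to-edge can expose ARB_texture_mirror_clamp_to_edge without EXT_texture_mirror_clamp. A minimal sketch of that enable logic; the enum, struct, callback type and mock drivers are hypothetical stand-ins for the state tracker's and drivers' real code.

```c
#include <stdbool.h>

/* Hypothetical, trimmed stand-ins for the two pipe caps. */
enum cap { CAP_TEXTURE_MIRROR_CLAMP, CAP_TEXTURE_MIRROR_CLAMP_TO_EDGE };

struct extensions {
   bool ARB_texture_mirror_clamp_to_edge;
   bool EXT_texture_mirror_clamp;
};

typedef int (*get_param_fn)(enum cap);

static void init_mirror_clamp_extensions(get_param_fn get_param,
                                         struct extensions *ext)
{
   /* The ARB extension now depends only on the new cap. */
   ext->ARB_texture_mirror_clamp_to_edge =
      get_param(CAP_TEXTURE_MIRROR_CLAMP_TO_EDGE) != 0;
   ext->EXT_texture_mirror_clamp = get_param(CAP_TEXTURE_MIRROR_CLAMP) != 0;
}

/* Hardware that can only do mirror-clamp-to-edge. */
static int edge_only_get_param(enum cap c)
{
   return c == CAP_TEXTURE_MIRROR_CLAMP_TO_EDGE;
}

/* A driver that supported the old cap reports the new one too, so it
 * keeps exposing both extensions. */
static int legacy_driver_get_param(enum cap c)
{
   switch (c) {
   case CAP_TEXTURE_MIRROR_CLAMP:
   case CAP_TEXTURE_MIRROR_CLAMP_TO_EDGE:
      return 1;
   }
   return 0;
}
```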
Emil Velikov cff80b6c15 Revert "configure: allow building with python3"
This reverts commit ae7898dfdb.

Turns out the python scripts are _not_ fully python 3 compatible.
As Ilia reported, using get_xmlpool.py with LANG=C produces some weird
output; see the link for details.

Even though the issue was spotted with the autoconf build, it exposes a
genuine problem with the script (and the lack of LANG handling in the meson
build).

https://lists.freedesktop.org/archives/mesa-dev/2018-August/203508.html
2018-08-24 11:14:15 +01:00
Marek Olšák 9176703788 radeonsi: increase the maximum UBO size to 2 GB
Same as the closed driver.

This causes a failure in GL45-CTS.compute_shader.max, which has a trivial
bug.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-08-23 16:56:17 -04:00
Marek Olšák 5693ca865d radeonsi: bump MAX_GS_INVOCATIONS
same as the closed driver

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-08-23 16:56:17 -04:00
Marek Olšák d3c1b212bc gallium: add PIPE_CAP_MAX_SHADER_BUFFER_SIZE
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-08-23 16:56:17 -04:00
Marek Olšák f6ccd594e7 gallium: add PIPE_CAP_MAX_GS_INVOCATIONS
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-08-23 16:56:17 -04:00
Emil Velikov ae7898dfdb configure: allow building with python3
Pretty much all of the scripts are python2+3 compatible.
Check and allow using python3, while adjusting the PYTHON2 refs.

Note:
 - python3.4 is used as it's the earliest supported version
 - python3 chosen prior to python2

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
2018-08-23 17:00:13 +01:00
Grazvydas Ignotas 2edf47edf0 llvmpipe: add cc clobber to inline asm
The bsr instruction modifies flags, so that needs to be indicated to the
compiler. No effect on generated code, but still needed for correctness.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2018-08-23 00:34:32 +03:00
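The clobber fix above is about telling the compiler that an asm statement changes the flags register. In GCC-style extended asm this is the `"cc"` clobber. A minimal sketch, assuming GCC or Clang and falling back to a builtin off x86; the function name is hypothetical and this is not llvmpipe's code.

```c
/* Index of the highest set bit; x must be non-zero (BSR's result is
 * undefined for 0). */
static unsigned highest_bit_index(unsigned x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
   unsigned idx;
   /* BSR writes ZF, so declare the flags clobbered with "cc"; otherwise
    * the compiler may assume flags computed before the asm survive it. */
   __asm__("bsrl %1, %0" : "=r"(idx) : "r"(x) : "cc");
   return idx;
#else
   return 31u - (unsigned)__builtin_clz(x);
#endif
}
```

As the commit notes, the clobber may not change the generated code today, but without it the compiler is free to schedule flag-dependent code across the asm.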
Marek Olšák e80e8d7adc ac: fix WAITCNT flags for GFX9
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-08-22 14:34:43 -04:00
Marek Olšák d87fe1f0fd ac,radeonsi: use ac_build_gather_values more
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-08-21 20:50:37 -04:00
Marek Olšák 60beac9efc ac,radeonsi: use ac_build_fmad
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-08-21 20:50:37 -04:00
Marek Olšák c401ead68a radeonsi: use ac_build_imad
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-08-21 20:50:37 -04:00