Commit Graph

17862 Commits

Author SHA1 Message Date
Marek Olšák 7e76f9a7a8 radeonsi: record information about all written and read varyings
It's just tgsi_shader_info with DEFAULT_VAL varyings removed.

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák c7f3e5c647 radeonsi: make si_shader_io_get_unique_index stricter
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák ed3190b3f3 radeonsi: don't export ClipVertex and ClipDistance[] if clipping is disabled
This is the first user of optimized monolithic shader variants.

Cull distances can't be disabled by states.

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
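A minimal sketch of the selection rule described in ed3190b3f3, assuming hypothetical names (`struct vs_distance_info`, `distances_to_export`, and a `clip_plane_enable` bitmask standing in for the rasterizer state). This is not radeonsi's code. Clip-distance slots are dropped when the state disables clipping, while cull-distance slots are always kept because no state can disable them. ClipVertex handling is omitted.

```c
#include <stdint.h>

/* Hypothetical per-shader info: which of the 8 combined clip/cull
 * distance slots the vertex shader writes (bit i = slot i). */
struct vs_distance_info {
	uint8_t clipdist_mask;   /* slots written as gl_ClipDistance[] */
	uint8_t culldist_mask;   /* slots written as gl_CullDistance[] */
};

/* Return the distance slots the shader actually needs to export.
 * Clip distances are masked by the enabled user clip planes; cull
 * distances are exported unconditionally. */
static uint8_t distances_to_export(const struct vs_distance_info *info,
                                   uint8_t clip_plane_enable)
{
	return (uint8_t)((info->clipdist_mask & clip_plane_enable) |
	                 info->culldist_mask);
}
```

With clipping fully disabled (`clip_plane_enable == 0`), only the cull-distance slots remain, which is what lets an optimized monolithic variant skip the clip exports.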
Marek Olšák d984a324bf radeonsi: add infrastr. for compiling optimized shader variants asynchronously
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák d2a56985d7 radeonsi: don't set vs.epilog.export_prim_id if TES is bound
There is no VS epilog in this case.

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák fee71fec25 radeonsi: simplify checking for monolithic compilation
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák e6aee45db4 radeonsi: print all flags in si_dump_shader_key
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák 6d5c2a8b5c radeonsi: split the shader key into 3 logical parts
key->part.*: prolog and epilog flags only
key->as_{ls,es}: special flags
key->mono.*: flags for monolithic compilation only

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
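The three-part layout from 6d5c2a8b5c can be pictured with a heavily simplified struct. Only the group names `part`, `as_ls`, `as_es` and `mono` come from the commit message; every inner field and the helper `key_needs_monolithic` are hypothetical. The point of the split is that a non-zero `mono.*` flag means the shared main part plus prolog/epilog cannot be reused.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified shader key illustrating the three groups. */
struct shader_key {
	/* Prolog and epilog flags only. */
	struct {
		unsigned color_two_side:1;  /* hypothetical prolog flag */
		unsigned export_prim_id:1;  /* hypothetical epilog flag */
	} part;

	/* Special flags: which hardware stage the shader runs as. */
	unsigned as_ls:1;
	unsigned as_es:1;

	/* Flags for monolithic compilation only. */
	struct {
		uint8_t clip_disable;       /* hypothetical optimization flag */
	} mono;
};

/* Any mono.* flag forces a monolithic (optimized) variant. */
static bool key_needs_monolithic(const struct shader_key *key)
{
	return key->mono.clip_disable != 0;
}
```

Keeping the `mono` flags in their own group means the prolog/epilog path only ever has to look at `part`, and adding a new optimization flag never changes which separately compiled parts are selected.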
Marek Olšák d4e9f409e9 radeonsi: fix culling if clip & cull distances are used at the same time
Fixed piglits:
- arb_cull_distance/clip-cull-3
- arb_cull_distance/clip-cull-4

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák 9d8db805ef radeonsi: clean up si_emit_clip_regs
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák e59389d738 radeonsi: assume that a VS without POSITION is LS
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák bdd860e307 radeonsi: decrease the number of texture slots to 24
Company of Heroes 2 needs only 24.

This saves 512 bytes of CE RAM per shader stage.

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák fa476e0566 radeonsi: fast exit si_emit_derived_tess_state early
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák 72d1669ed2 radeonsi: check for !is_linear in do_hardware_msaa_resolve
We don't want opt4Space here.

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák 49fa4a4e60 gallium/radeon: add RADEON_SURF_OPTIMIZE_FOR_SPACE
FORCE_TILING should disable it. It has no effect now, but that may change
soon.

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Mun Gwan-gyeong 44a3f2ee09 radeonsi: Add missing error-checking to si_create_compute_state (v2)
When uploading the shader fails, si_shader_binary_upload() returns -ENOMEM,
so si_create_compute_state() must handle that failure path.

CID 1394027

v2: Fixes from Edward O'Callaghan's review
 a) Check the return value explicitly with "si_shader_binary_upload() < 0"
 b) Update commit message.

Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-11-21 21:09:06 +01:00
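The error-handling pattern in 44a3f2ee09 (treat a negative return from the upload as failure, release the partially built state, return NULL) can be sketched as follows. `struct compute_state`, `fake_binary_upload` and `create_compute_state` are hypothetical stand-ins, not radeonsi's functions; the `simulate_oom` parameter only exists to exercise the failure path.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the compute state object. */
struct compute_state {
	int uploaded;
};

/* Hypothetical upload helper: returns 0 on success, -ENOMEM on failure,
 * mirroring the failure mode described in the commit message. */
static int fake_binary_upload(struct compute_state *cs, int simulate_oom)
{
	if (simulate_oom)
		return -ENOMEM;
	cs->uploaded = 1;
	return 0;
}

/* Create the state; on upload failure free it and return NULL instead of
 * handing back a half-initialized object. */
static struct compute_state *create_compute_state(int simulate_oom)
{
	struct compute_state *cs = calloc(1, sizeof(*cs));
	if (!cs)
		return NULL;

	if (fake_binary_upload(cs, simulate_oom) < 0) {
		free(cs);
		return NULL;
	}
	return cs;
}
```

Checking `< 0` rather than `!= 0` matches the v2 review note and keeps working if the helper ever returns a positive value on success.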
George Kyriazis 5b4d1500dd gallium: swr: Added swr build for windows
v4: Add windows-specific gen_knobs.{cpp|h} changes
v5: remove aggressive squashing of gen_knobs.py into this commit; added
SConscript to EXTRA_DIST in Makefile.am

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-21 12:44:47 -06:00
George Kyriazis 9e4e1f5190 swr: Modify gen_knobs.{cpp|h} creation script
Modify gen_knobs.py so that each invocation creates a single generated
file.  This is more similar to how the other generators behave.

v5: remove SConscript edits from this commit; moved to the commit that first
adds SConscript

Acked-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-21 12:44:47 -06:00
George Kyriazis bc26e8d4a7 swr: Windows-related changes
- Handle dynamic library loading for windows
- Implement swap for gdi
- fix prototypes
- update include paths on configure-based build for swr_loader.cpp

v2: split to multiple patches
v3: split and reshuffle some more; renamed title
v4: move Makefile.am changes to other commit. Modify header files

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-21 12:44:46 -06:00
George Kyriazis 87bd28210f swr: renamed duplicate swr_create_screen()
There are 2 swr_create_screen() functions.  One in swr_loader.cpp, which
is used during driver init, and the other is hiding in swr_screen.cpp,
which ends up in the arch-specific .dll/.so.

Rename the second one to swr_create_screen_internal(), to avoid confusion
in header files.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-21 12:44:46 -06:00
George Kyriazis 974d280e81 swr: Handle windows.h and NOMINMAX
Reorder header files so that we have a chance to define NOMINMAX before
Mesa include files include windows.h.

v3: split from bigger patch

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-21 12:44:46 -06:00
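Why the ordering in 974d280e81 matters: unless NOMINMAX is defined before the first inclusion of windows.h, it defines function-like `min`/`max` macros that break any later code using those names. A minimal sketch, assuming a plain C translation unit; the `max` function is illustrative only and would fail to compile on Windows without NOMINMAX.

```c
/* NOMINMAX must be defined before any header pulls in <windows.h>. */
#ifdef _WIN32
#  ifndef NOMINMAX
#    define NOMINMAX
#  endif
#  include <windows.h>
#endif

/* Illustrative only: with windows.h's max() macro suppressed, a function
 * named max compiles on every platform. */
static inline int max(int a, int b)
{
	return a > b ? a : b;
}
```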
Gwan-gyeong Mun 9c5b1c7990 radeonsi: Fix resource leak in gs_copy_shader allocation failure path
CID 1394028

Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-22 00:04:59 +11:00
Nicolai Hähnle 42d5e91a2a radeonsi: store group_size_variable in struct si_compute
For compute shaders, we free the selector after the shader has been
compiled, so we need to save this bit somewhere else.  Also, make sure that
this type of bug cannot re-appear, by NULL-ing the selector pointer after
we're done with it.

This bug has been there since the feature was added, but was only exposed
in piglit arb_compute_variable_group_size-local-size by commit
9bfee7047b (which is totally unrelated).

Cc: 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-21 08:19:43 +01:00
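The fix in 42d5e91a2a follows a general pattern: copy the bit you still need out of an object before freeing it, then NULL the pointer so any later use fails loudly. A minimal sketch with hypothetical, simplified types; only the names `group_size_variable` and "selector" come from the commit message.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified selector. */
struct shader_selector {
	bool uses_variable_group_size;
};

/* Hypothetical, simplified compute program. */
struct compute {
	struct shader_selector *sel;
	bool group_size_variable;   /* saved before sel is freed */
};

static void finish_compile(struct compute *program)
{
	/* Save what later dispatches still need... */
	program->group_size_variable = program->sel->uses_variable_group_size;

	/* ...then free the selector and clear the pointer, so a stray use
	 * becomes an immediate NULL dereference instead of a silent
	 * use-after-free. */
	free(program->sel);
	program->sel = NULL;
}
```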
Ilia Mirkin 9145873b15 nvc0/ir: use levelZero flag when the lod is set to 0
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-11-20 18:13:12 -05:00
Ilia Mirkin ea276512a0 swr: mark streamout buffers as written
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-19 10:40:37 -05:00
Nicolai Hähnle 9882ed85bd radeonsi: emit sample locations also when nr_samples == 1
Since the state tracker now enables MSAA in the hardware for the case
nr_samples == 1 as well, we need to set sample locations correctly for
this case.

The Polaris override is still needed for the non-MSAA case (when
nr_samples == 0).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-11-18 09:48:46 +01:00
Nicolai Hähnle 70454f5b55 radeonsi: allow sample mask export for single-sample framebuffers
This fixes GL45-CTS.sample_variables.mask.*.samples_1.*.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-11-18 09:48:43 +01:00
Eric Anholt 7f27ad5597 vc4: Try compiling our FSes in multithreaded mode on new kernels.
Multithreaded fragment shaders let us hide texturing latency by a
hyperthreading-style switch to another fragment shader.  This gets us up
to 20% framerate improvements on glmark2 tests.
2016-11-16 19:45:01 -08:00
Eric Anholt 45c022f2b0 vc4: Add support for ETC1 textures if the kernel is new enough.
The kernel changes for exposing the param have now been merged, so we can
expose it here.
2016-11-16 19:45:01 -08:00
Eric Anholt 7130260d12 vc4: Fix simulator mode missing-GETPARAM debug info.
The message printed the value, which is 0 since we didn't set it; we wanted
to see the param.
2016-11-16 19:45:01 -08:00
Mun Gwan-gyeong 20c1623a11 vc4: Fix resource leak in register allocation failure path.
CID 1394322

Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
2016-11-16 19:45:01 -08:00
Tim Rowley a456ea17fb swr: [rasterizer core] fix clear with multiple color attachments
Fixes fbo-mrt-alphatest

v2: styling fixes

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-16 14:21:04 -06:00
Nicolai Hähnle 6403a9e074 radeonsi: fix a subtle bounds checking corner case with 3-component attributes
I'm also sending out a piglit test, gl-2.0/vertexattribpointer-size-3,
which exposes this corner case.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-11-16 10:31:42 +01:00
Nicolai Hähnle 50c95d0c54 radeonsi: reject some 3-component formats as buffer textures
Fixes parts of GL45-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels_pbo.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-11-16 10:31:39 +01:00
Ilia Mirkin dafffd2f11 swr: mark color clamping as unsupported
There is no functionality in swr to clamp either vertex or frag colors.
This could be added in swr_shader, at which point these could be
re-enabled.

Fixes arb_color_buffer_float-render

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-15 20:26:32 -05:00
Ilia Mirkin 2b6b15ab3f swr: always enable adding start/base vertex to gl_VertexId
Fixes gl-3.2-basevertex-vertexid

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-15 20:26:29 -05:00
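For reference, the GL rule that 2b6b15ab3f enforces: gl_VertexID includes the start vertex for non-indexed draws and the base vertex for indexed draws. The sketch below states that rule in C; the function name and parameters are hypothetical and this is not swr's implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* gl_VertexID of the i-th vertex of a draw.
 * indices == NULL means a non-indexed draw (glDrawArrays: first + i);
 * otherwise it is an indexed draw (glDrawElementsBaseVertex and
 * friends: indices[i] + basevertex). */
static int32_t vertex_id(const uint32_t *indices, size_t i,
                         int32_t start, int32_t basevertex)
{
	if (indices)
		return (int32_t)indices[i] + basevertex;
	return start + (int32_t)i;
}
```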
Ilia Mirkin 6364491a0b swr: add support for upper-left fragcoord position
Fixes glsl-arb-fragment-coord-conventions.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-15 20:26:11 -05:00
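The coordinate transform behind upper-left fragcoord support is a vertical flip against the framebuffer height. A minimal sketch, assuming half-integer pixel centers (the `pixel_center_integer` layout qualifier is not handled); the function name is hypothetical and this is not swr's code.

```c
/* Convert gl_FragCoord.y from the default lower-left origin to the
 * upper-left origin of ARB_fragment_coord_conventions. Pixel centers stay
 * at half-integers: the bottom row's 0.5 maps to fb_height - 0.5. */
static float fragcoord_y_upper_left(float y_lower_left, unsigned fb_height)
{
	return (float)fb_height - y_lower_left;
}
```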
Ilia Mirkin a2c1d58ddb swr: make sure that all rendering is finished on shader destroy
Rendering could still be ongoing (or have yet to start) when the shader
is deleted. There's no refcounting on the shader text, so insert a
pipeline stall unconditionally when this happens.

[Note, we should instead introduce a way to attach work to
fences, so that the freeing can be done in the current fence.]

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-15 20:25:48 -05:00
Ilia Mirkin 7caed50ff4 swr: disable blending for integer formats
The EXT_texture_integer test says that blending and alphatest should
all be disabled. st/mesa takes care of alphatest already.

Fixes the ext_texture_integer-fbo-blending piglit test.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-15 20:25:43 -05:00
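The rule in 7caed50ff4 reduces to masking the requested blend enable per render target. A minimal sketch; `enum rt_kind` and `effective_blend_enable` are hypothetical, and alpha test is left out because, per the commit message, st/mesa already handles it.

```c
#include <stdbool.h>

/* Hypothetical classification of a render target's format. */
enum rt_kind { RT_UNORM, RT_FLOAT, RT_SINT, RT_UINT };

/* Blending must be disabled for integer color buffers, whatever the
 * blend state requests. */
static bool effective_blend_enable(bool requested, enum rt_kind kind)
{
	bool is_integer = (kind == RT_SINT || kind == RT_UINT);
	return requested && !is_integer;
}
```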
Ilia Mirkin 2f19a974a5 swr: mark rgb9_e5 as unrenderable
The support in swr requires shaders to output the components as UINTs.
This is not how GL or Gallium work, and since this is not a
required-renderable format, just leave it out.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-15 20:25:35 -05:00
Ilia Mirkin 6fd398f48e swr: no support for shader stencil export
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-15 20:25:28 -05:00
Ilia Mirkin 96291478ea swr: mark both frag and vert textures read, don't forget about cbs
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-15 20:25:22 -05:00
Ilia Mirkin 8c0f76e961 swr: fix texture layout for compressed formats
Fixes the texsubimage piglit and lets the copyteximage one get further.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-15 20:25:15 -05:00
Ilia Mirkin 00efbbc38c swr: add archrast generated files to gitignore
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-15 20:25:08 -05:00
Ilia Mirkin b53a33feef swr: [rasterizer jitter] don't bother quantizing unused channels
In a BGR10X2 or BGR5X1 situation, there's no need to try to quantize the
X channel - the default will have the proper quantization required.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-15 20:24:50 -05:00
Ilia Mirkin 5dd0b8d3c6 swr: [rasterizer memory] fix store tile for 128-bit ymajor tiling
Noticed by inspection.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-15 20:24:41 -05:00
Ilia Mirkin 45d9cd36fe swr: [rasterizer memory] add support for R32_FLOAT_X8X24 formats
This is the format used for the primary surface of a
PIPE_FORMAT_Z32_FLOAT_S8X24_UINT resource.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-15 20:24:18 -05:00
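What reading the primary surface as R32_FLOAT_X8X24 amounts to, sketched under assumptions: a 64-bit PIPE_FORMAT_Z32_FLOAT_S8X24_UINT texel whose first four bytes hold the float depth and whose remaining bytes (stencil plus 24 unused bits) are ignored. The function name is hypothetical and this is not swr's tiling code.

```c
#include <stdint.h>
#include <string.h>

/* Read the depth channel of one 8-byte Z32_FLOAT_S8X24 texel as
 * R32_FLOAT_X8X24: take the 32-bit float, ignore S8X24. */
static float read_r32_float_x8x24(const uint8_t texel[8])
{
	float depth;
	memcpy(&depth, texel, sizeof(depth));   /* bytes 0..3: Z32_FLOAT */
	return depth;                           /* bytes 4..7: ignored */
}
```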
Marek Olšák 74e39de932 radeonsi: set IF_THRESHOLD to 3
Piglit regressions (radeonsi or LLVM bugs; they pass on softpipe):
- glsl-1.10/execution/variable-indexing/vs-output-array-vec3-index-wr
- glsl-1.10/execution/variable-indexing/vs-output-array-vec4-index-wr
- glsl-1.10/execution/variable-indexing/vs-temp-array-mat2-index-col-row-wr
- glsl-1.10/execution/variable-indexing/vs-temp-array-mat2-index-row-wr

Totals:
SGPRS: 1132185 -> 1168801 (3.23 %)
VGPRS: 907856 -> 906204 (-0.18 %)
Spilled SGPRs: 2011 -> 2425 (20.59 %)
Spilled VGPRs: 368 -> 96 (-73.91 %)
Scratch VGPRs: 1344 -> 1060 (-21.13 %) dwords per thread
Code Size: 35916164 -> 35705372 (-0.59 %) bytes
LDS: 767 -> 767 (0.00 %) blocks
Max Waves: 194010 -> 194921 (0.47 %)
Wait states: 0 -> 0 (0.00 %)

Before:
 VGPR SPILLING APPS   Shaders SpillVGPR ScratchVGPR
 alien_isolation         2938        38        40
 bioshock-infinite       1769       245       732
 dirt-showdown            548        85        72
 f1-2015                  776         0       320
 ue4_lightroom_inter..     74         0       180

After:
 VGPR SPILLING APPS   Shaders SpillVGPR ScratchVGPR
 alien_isolation         2938        38        40
 bioshock-infinite       1769         0       480
 dirt-showdown            548        58        40
 f1-2015                  776         0       320
 ue4_lightroom_inter..     74         0       180

Bioshock and DiRT benefit.

If I set IF_THRESHOLD=4, tesseract starts spilling VGPRs.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-15 20:23:40 +01:00
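The percentages in the 74e39de932 "Totals" list are relative changes, (after - before) / before x 100. For example, Spilled VGPRs 368 -> 96 gives -272 / 368 = -73.91 %. A small helper (hypothetical name) reproduces them, treating identical values such as "Wait states: 0 -> 0" as 0 % to avoid dividing zero by zero.

```c
/* Relative change in percent, as printed in the Totals list. */
static double pct_change(double before, double after)
{
	if (before == after)        /* covers 0 -> 0 without computing 0/0 */
		return 0.0;
	return (after - before) / before * 100.0;
}
```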
Marek Olšák 72217d4335 gallium: add PIPE_SHADER_CAP_LOWER_IF_THRESHOLD
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-15 20:23:40 +01:00
Marek Olšák 358079da2d radeonsi: set unsafe fpmath on FP instructions when allowed by R600_DEBUG
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-15 19:17:56 +01:00