Commit Graph

26205 Commits

Author SHA1 Message Date
Dave Airlie a1e8fcef47 gallivm: move first/last level jit texture members.
This lets us create an image structure with the same basic
types as the texture one.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-08-27 12:29:31 +10:00
Dave Airlie a8ef6b5755 llvmpipe: refactor jit type creation
This just cleans the code up so the texture/sampler type
creation can be reused.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-08-27 12:29:21 +10:00
Dave Airlie bba4d2f442 virgl: fix format conversion for recent gallium changes.
The virgl formats are fixed in time snapshots of the gallium ones,
we just need to provide a translation table between them when
we enter the hardware.

This fixes a regression since Eric renumbered the gallium table.

Fixes: c45c33a5a2 (gallium: Remove manual defining of PIPE_FORMAT enum values.)
Bugzilla: https://bugs.freedesktop.org/111454

v1 by Dave Airlie <airlied@redhat.com>
v2: virgl: Add a number of formats to the table that are used, e.g. for vertex
    attributes
v3: cover some more missing formats from a piglit run

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
2019-08-26 06:35:00 +00:00
Erico Nunes 4379dcc12d lima/ppir: enable vectorize optimization
pp has vector units and some operations can be optimized when bundled
together.
Benchmarking this with piglit shaders shows that the instruction count
can be greatly reduced on many examples with vectorize.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-08-25 18:29:12 +00:00
Erico Nunes 2a8a81d109 lima/ppir: lower selects to scalars
nir vec4 fcsel assumes that each component of the condition will be used
to select the same component from the options, but pp can't implement
that since it only has 1 component for the condition.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-08-25 18:29:12 +00:00
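To illustrate the select lowering described in 2a8a81d109, here is a minimal sketch of turning one vec4 fcsel into four scalar selects, each with a single-component condition. It is not the lima code; the function names are made up for illustration, and the scalar semantics (pick the first option when the condition is non-zero) are an assumption based on NIR's fcsel.

```python
def scalar_fcsel(cond, a, b):
    # Assumed scalar semantics: take `a` when the condition is non-zero.
    return a if cond != 0.0 else b


def lower_vec_fcsel(cond, a, b):
    # Hypothetical lowering: one scalar select per component, so each
    # select only ever needs a one-component condition.
    return [scalar_fcsel(c, x, y) for c, x, y in zip(cond, a, b)]


result = lower_vec_fcsel([1.0, 0.0, 0.0, 1.0],
                         [10.0, 11.0, 12.0, 13.0],
                         [20.0, 21.0, 22.0, 23.0])
```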
Erico Nunes 27e7603c34 lima: fix ppir spill stack allocation
The previous spill stack was fixed and too small, and caused instability
in programs requiring spilling for roughly more than one value.
This patch adds a dynamic calculation of the buffer size based on stack
utilization and switches it to a separate allocation at flush time that
will fit the shader that requires the largest buffer.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-08-25 20:08:59 +02:00
Jason Ekstrand f58e0405b6 intel/fs: Drop the gl_program from fs_visitor
It's not used by anything anymore now that so much lowering has been
moved into NIR.  Sadly, we still need one in brw_compile_gs() for
geometry shaders on Sandy Bridge.  Short of a lot of pointless work,
that one's probably not going away.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-25 01:02:52 -05:00
Qiang Yu 5ff41b9fc5 lima: move format handling to unified place
Create a unified table to handle pipe format to texture
and render target format lookup.

Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
2019-08-25 11:52:29 +08:00
Vasily Khoruzhick 681e99d11c lima/ppir: print register index and components number for spilled register
It can be useful for debugging purposes

Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-08-24 08:17:31 -07:00
Vasily Khoruzhick 28d4b456a5 lima/ppir: add control flow support
This commit adds support for nir_jump_instr, if and loop
nir_cf_nodes.

Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-08-24 08:17:31 -07:00
Vasily Khoruzhick 1cdf585613 lima/ppir: add better liveness analysis
Add better liveness analysis, modelled after the one in vc4.
It uses live ranges and is aware of multiple blocks, which is a
prerequisite for adding CF support.

Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-08-24 08:17:31 -07:00
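For readers unfamiliar with block-aware liveness (1cdf585613), the following is a sketch of the textbook backward dataflow fixpoint that computes live_in and live_out per block. It is not the ppir or vc4 implementation; the data layout (instructions as `(dest, srcs)` tuples, a `succs` map) is an assumption made for the example.

```python
def liveness(blocks, succs):
    # blocks: {name: [(dest_or_None, [srcs]), ...]}
    # succs:  {name: [successor block names]}
    use, define = {}, {}
    for name, instrs in blocks.items():
        used, defined = set(), set()
        for dest, srcs in instrs:
            # A read counts as a block-level use only if the value
            # was not already written earlier in the same block.
            used |= {s for s in srcs if s not in defined}
            if dest is not None:
                defined.add(dest)
        use[name], define[name] = used, defined

    live_in = {name: set() for name in blocks}
    live_out = {name: set() for name in blocks}
    changed = True
    while changed:  # iterate until nothing changes (fixpoint)
        changed = False
        for name in blocks:
            out = set()
            for s in succs.get(name, []):
                out |= live_in[s]
            new_in = use[name] | (out - define[name])
            if out != live_out[name] or new_in != live_in[name]:
                live_out[name], live_in[name] = out, new_in
                changed = True
    return live_in, live_out


blocks = {
    "entry": [("r0", ["u0"])],
    "loop": [("r1", ["r0"]), (None, ["r1"])],
    "exit": [(None, ["r0"])],
}
succs = {"entry": ["loop"], "loop": ["loop", "exit"], "exit": []}
live_in, live_out = liveness(blocks, succs)
```

In this example r0 stays live across the loop back edge, which a single-block analysis would not see.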
Vasily Khoruzhick d30a98c896 lima/ppir: validate shader outputs
Mali4x0 supports only gl_FragColor. gl_FragDepth is not supported.
Check that we don't get anything but gl_FragColor in shader outputs.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-08-24 08:17:25 -07:00
Vasily Khoruzhick 8dd195e865 lima/ppir: turn store_color into ALU node
We don't have a special OP to store color in PP, all we need to do is to
store gl_FragColor into reg0, thus it's just a mov and therefore ALU node.

Yet we still need to indicate that it's store_color op so regalloc ignores
its destination.

Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-08-23 18:19:47 -07:00
Vasily Khoruzhick 7f814d2b46 lima/ppir: create ppir block for each corresponding NIR block
Create ppir block for each corresponding NIR block and populate
its successors. It will be used later in liveness analysis and
in CF support

Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-08-23 18:19:47 -07:00
Vasily Khoruzhick 4e695489df lima/ppir: add dummy op
We can get the following from NIR:

(1) r1 = r2
(2) r2 = ssa1

Note that r2 is read before it's assigned, so there's no node for
it in comp->var_nodes. We need to create a dummy node in this case
whose sole purpose is to hold ppir_dest with reg in it.

Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-08-23 18:19:47 -07:00
Vasily Khoruzhick d11e1b7909 lima/ppir: add write after read deps for registers
For cases like:

(1) r1 = r2
(2) r2 = ssa1

We need to add (1) as a dependency of (2); otherwise the scheduler may
reorder them.

Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-08-23 18:19:47 -07:00
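The write-after-read case in d11e1b7909 can be sketched as follows. This is an illustrative model only, not the ppir code: instructions are `(dest, srcs)` tuples and the returned pairs `(later, earlier)` mean "later must be scheduled after earlier". All names are hypothetical.

```python
def war_deps(instrs):
    # readers[reg] = indices of instructions that read reg since its last write
    readers = {}
    deps = set()
    for i, (dest, srcs) in enumerate(instrs):
        for reg in srcs:
            readers.setdefault(reg, []).append(i)
        # Writing `dest` must not be moved above any earlier read of it.
        for j in readers.get(dest, []):
            if j != i:
                deps.add((i, j))
        readers[dest] = []
    return deps


# (1) r1 = r2
# (2) r2 = ssa1
deps = war_deps([("r1", ["r2"]), ("r2", ["ssa1"])])
```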
Vasily Khoruzhick cd8c569ced lima/ppir: fix ordering deps
There can be several root nodes, i.e.:

(1) r0 = r1
(2) r2 = r3
(3) branch if (ssa1)

We need to make (3) depend on both (1) and (2). The old code added a
dependency only for (2), and (1) was kept as a root node since there
is no branch/discard or store color between the two movs.

Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-08-23 18:19:47 -07:00
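A small sketch of the ordering fix in cd8c569ced: every node before the branch that nothing else depends on (a root) gets an edge from the branch. This is a simplified model under assumed data structures (nodes are indices, `deps` holds `(consumer, producer)` pairs), not the actual ppir code.

```python
def add_branch_ordering_deps(deps, branch):
    # A node is a root if no other node consumes it.
    consumed = {producer for _, producer in deps}
    roots = [n for n in range(branch) if n not in consumed]
    # Make the branch depend on every root that precedes it.
    return set(deps) | {(branch, r) for r in roots}


# (1) r0 = r1      -> node 0
# (2) r2 = r3      -> node 1
# (3) branch if (ssa1) -> node 2
ordered = add_branch_ordering_deps(set(), branch=2)
```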
Vasily Khoruzhick bf2872eeb2 lima/ppir: set write mask for texture loads if dest is reg
Destination for texture load can be a reg, so we need to
set write mask in this case

Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-08-23 18:19:47 -07:00
Vasily Khoruzhick fd129817f0 lima/ppir: add support for unconditional branches and condition negation
We need a 'negate' modifier for the branch condition to minimize branching.
The idea is to generate the following:

current_block: { ...; if (!statement) branch else_block; }
then_block: { ...; branch after_block; }
else_block: { ... }
after_block: { ... }

Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-08-23 18:19:46 -07:00
Vasily Khoruzhick e15af23b73 lima/ppir: clone ld_{uni,tex,var} into each block
ppir_lower_load() and ppir_lower_load_texture() assume that a node
is in the same block as its successors. Fix it by cloning each
ld_uni and ld_tex to every block.

It also reduces register pressure since values never cross block
boundaries and thus never appear in live_in or live_out of any block,
so do it for varyings as well.

Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-08-23 18:19:46 -07:00
Vasily Khoruzhick 172f2ad805 lima/ppir: refactor const lowering
Const nodes are now cloned for each user, i.e. const is guaranteed to have
exactly one successor, so we can use ppir_do_one_node_to_instr() and
drop insert_to_each_succ_instr()

Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-08-23 18:19:46 -07:00
Caio Marcelo de Oliveira Filho 63f0259aeb iris: Guard GEN9-only function in Iris state to avoid warning
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-23 13:25:27 -07:00
Kenneth Graunke 7ee7b0ecbc iris: Fix large timeout handling in rel2abs()
...by copying the implementation of anv_get_absolute_timeout().

Appears to fix a CTS test with 32-bit builds:
GTF-GL46.gtf32.GL3Tests.sync.sync_functionality_clientwaitsync_flush

Fixes: f459c56be6 ("iris: Add fence support using drm_syncobj")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2019-08-23 10:32:01 -07:00
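The large-timeout problem fixed in 7ee7b0ecbc is an overflow when a relative timeout is added to the current time. The sketch below shows one overflow-safe clamp; the body of anv_get_absolute_timeout() is not shown in this log, so the exact clamp (to INT64_MAX) and the zero-timeout shortcut are assumptions, and the function name here is hypothetical.

```python
INT64_MAX = (1 << 63) - 1


def rel_to_abs_timeout(now_ns, rel_ns):
    # Assumed behaviour: a zero timeout means "poll", so it stays zero.
    if rel_ns == 0:
        return 0
    # Clamp the relative part so now + rel cannot exceed INT64_MAX.
    max_rel = INT64_MAX - now_ns
    return now_ns + min(rel_ns, max_rel)


UINT64_MAX = (1 << 64) - 1
forever = rel_to_abs_timeout(1_000, UINT64_MAX)
```

Without the clamp, adding a "wait forever" value to the current time would wrap around in 64-bit arithmetic and yield a deadline in the past.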
Kenneth Graunke 9310ae6f68 iris: Set MOCS in all STATE_BASE_ADDRESS commands
Rafael Antognolli tracked down a performance gap between i965 and iris
in Synmark2's OglCSDof microbenchmark, noting that iris was performing
substantially more memory reads and writes, with substantially fewer
L3 hits.  He suggested that something might be wrong with MOCS, or L3
configs, at which point I came up with a theory...

It would appear that the STATE_BASE_ADDRESS command updates the MOCS
settings for various base addresses even if you don't specify the
"Modify Enable" bit for that address.  Until now, we had been setting
only the MOCS for bases we intended to change, leaving the others
"blank" which is MOCS table entry 0, which is uncached.

Most data access has a more specific MOCS (e.g. in SURFACE_STATE),
but scratch access uses the Stateless Data Port Access MOCS from
STATE_BASE_ADDRESS.  So this meant all scratch access was uncached.

Improves performance in Synmark2's OglCSDof by 2x, bringing iris
on par with the existing i965 driver.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-23 10:21:48 -07:00
Connor Abbott f59076f8a7 radeonsi/nir: Rewrite output scanning
Similarly to before, this didn't properly handle varying structs with
doubles in them.

This doesn't fix any tests, but was noticed while looking at the code.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-23 11:05:31 +02:00
Connor Abbott 9395277972 radeonsi/nir: Rewrite store intrinsic gathering
The old version wasn't as accurate as it could be, and didn't handle
double variables inside structs correctly. Walk the path to compute the
actual components affected.

In combination with the previous commit, this fixes
KHR-GL45.enhanced_layouts.varying_structure_locations.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-23 11:05:31 +02:00
Connor Abbott 87cca891c3 radeonsi/nir: Add const_index when loading GS inputs
This fixes loading GS inputs in structures or arrays.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-23 11:05:31 +02:00
Connor Abbott 82589d3ffd radeonsi/nir: Don't add const offset to indirect
This is already done in get_deref_offset() in the common code. We were
adding it twice accidentally.

Fixes KHR-GL45.enhanced_layouts.varying_array_locations.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-23 11:05:31 +02:00
Connor Abbott 97d592c855 radeonsi/nir: Don't recompute num_inputs and num_outputs
Don't repeat what mesa/st already does.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-23 11:05:31 +02:00
Samuel Pitoiset 1fd60db4a1 ac,radv,radeonsi: remove LLVM 7 support
Now that LLVM 9 will be released soon, we will only support
LLVM 8, 9 and master (10).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-23 08:12:34 +02:00
Kenneth Graunke 2d79925034 iris: Avoid unnecessary resolves on transfer maps
We were always resolving the buffer as if we were accessing it via
CPU maps, which don't understand any auxiliary surfaces.  But we often
copy to a temporary using BLORP, which understands compression just
fine.  So we can avoid the resolve, and accelerate the copy as well.

Fixes: 9d1334d2a0 ("iris: Use copy_region and staging resources to avoid transfer stalls")
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-08-22 18:31:17 -07:00
Kenneth Graunke 136629a1e3 iris: Drop copy format hacks from copy region based transfer path.
This doesn't work for compressed formats, as the source texture and
temporary texture would have different block sizes.  (Forcing the driver
to always take the GPU path would expose the bug.)  Instead, just use
the source format for the temporary, and let blorp_copy deal with
overrides.

The one case where we can't do this is ASTC, because isl won't let us
create a linear ASTC surface.  Fall back to the CPU paths there for now.

Fixes: 9d1334d2a0 ("iris: Use copy_region and staging resources to avoid transfer stalls")
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-08-22 18:31:17 -07:00
Kenneth Graunke 1cd13ccee7 iris: Update fast clear colors on Gen9 with direct immediate writes.
Gen11 stores the fast clear color in an "indirect clear buffer", as
a packed pixel value.  Gen9 hardware stores it as a float or integer
value, which is interpreted via the format.  We were trying to store
that in a buffer, for similarity with Icelake, and MI_COPY_MEM_MEM
it from there to the actual SURFACE_STATE bytes where it's stored.

This unfortunately doesn't work for blorp_copy(), which does bit-for-bit
copies, and overrides the format to a CCS-compatible UINT format.  This
causes the clear color to be interpreted in the overridden format.

Normally, we provide the clear color on the CPU, and blorp_blit.c:2611
converts it to a packed pixel value in the original format, then unpacks
it in the overridden format, so the clear color we use expands to the
bits we originally desired.

However, BLORP doesn't support this pack/unpack with an indirect clear
buffer, as it would need to do the math on the GPU.  On Gen11+, it isn't
necessary, as the hardware does the right thing.

This patch changes Gen9 to stop using an indirect clear buffer and
simply do PIPE_CONTROLs with post-sync write immediate operations
to store the new color over the surface states for regular drawing.
BLORP continues streaming out surface states, and handles fast clear
colors on the CPU.

Fixes: 53c484ba8a ("iris: blorp using resolve hooks")
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-08-22 18:31:14 -07:00
Kenneth Graunke 117a0368b0 iris: Fix broken aux.possible/sampler_usages bitmask handling
For renderable surfaces, we allocate SURFACE_STATEs for each bit in
res->aux.possible_usages.  Sampler views use res->aux.sampler_usages.

When pinning buffers, we call surf_state_offset_for_aux() to calculate
the offset to the desired surface state.  surf_state_offset_for_aux()
took an aux_modes parameter, which should be one of those two fields.
However...it was not using that parameter.  It always used the broader
res->aux.possible_usages field directly.

One of the callers, update_clear_value(), was passing incorrect masks
for this parameter.  It iterated through the bits in order, using
u_bit_scan(), which destructively modifies the mask.  So each time we
called it, the count of bits before our selected mode was 0, which would
cause us to always update the SURFACE_STATE for ISL_AUX_USAGE_NONE,
rather than updating each in turn.  This was hidden by the earlier bug
where surf_state_offset_for_aux() ignored the parameter.

Fixes: 7339660e80 ("iris: Add aux.sampler_usages.")
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-08-22 18:31:14 -07:00
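The mask bug described in 117a0368b0 can be reproduced in a few lines. This is a model, not the iris code: `u_bit_scan` here returns the updated mask instead of modifying it through a pointer, and `index_for_mode` is a hypothetical stand-in for the "count the bits before our selected mode" step.

```python
def u_bit_scan(mask):
    # Return (index of lowest set bit, mask with that bit cleared).
    bit = (mask & -mask).bit_length() - 1
    return bit, mask & (mask - 1)


def index_for_mode(modes, mode):
    # Number of set bits in `modes` below bit `mode`.
    return bin(modes & ((1 << mode) - 1)).count("1")


def buggy_indices(modes):
    out, remaining = [], modes
    while remaining:
        mode, remaining = u_bit_scan(remaining)
        # Bug: the consumed mask has no bits left below `mode`.
        out.append(index_for_mode(remaining, mode))
    return out


def fixed_indices(modes):
    out, remaining = [], modes
    while remaining:
        mode, remaining = u_bit_scan(remaining)
        # Fix: count against the original, unmodified mask.
        out.append(index_for_mode(modes, mode))
    return out


modes = 0b1011  # three possible usages: bits 0, 1 and 3
```

With the consumed mask every lookup lands on index 0, which matches the symptom of always updating the first surface state.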
Kenneth Graunke f6c44549ee iris: Replace devinfo->gen with GEN_GEN
This is genxml, we can compile out this code.

Fixes: 2660667284 ("iris/gen8: Re-emit the SURFACE_STATE if the clear color changed.")
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-08-22 18:31:14 -07:00
Alyssa Rosenzweig 2c5ba2ee6e panfrost: Implement gl_FragCoord correctly
Rather than passing through the transformed gl_Position, we can use the
hardware-level varying for this, which will correctly handle
gl_FragCoord.w

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-22 13:31:39 -07:00
Alyssa Rosenzweig eeebf5c2df panfrost: Remove vertex buffer offset from its size
The offset is added to the base address, so we need to subtract it from
the size to maintain the same end address and thus prevent a buffer
overflow:

   end_address = start_address + size

   start_address' = start_address + offset
   size' = size - offset

   end_address' = start_address' + size'
                = (start_address + offset) + (size - offset)
                = (start_address + size) + (offset - offset)
                = start_address + size
                = end_address

   QED.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-22 13:31:39 -07:00
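The derivation in eeebf5c2df can be checked numerically. The helper below is hypothetical and only mirrors the arithmetic from the commit message; the guard against an offset larger than the size is an added assumption.

```python
def apply_offset(start_address, size, offset):
    # Assumption for the sketch: an offset past the end is invalid.
    if offset > size:
        raise ValueError("offset exceeds buffer size")
    # Move the base forward and shrink the size by the same amount.
    return start_address + offset, size - offset


start, size, offset = 0x1000, 256, 64
new_start, new_size = apply_offset(start, size, offset)
```

Here the end address stays at 0x1100 both before and after the adjustment, whereas keeping the original size would place the end 64 bytes past the buffer.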
Alyssa Rosenzweig f06e8f7fe9 pan/decode: Validate MFBD tags
These tags need to match up with what's actually described by the MFBD,
so check this. Once this is checked, since the type and contents of the
FBD are obvious from printing above, there's no need to explicitly mark
off the framebuffer line.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-22 12:53:10 -07:00
Eric Engestrom 6db1dfe347 swr: use LLVM version string instead of re-computing it
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-08-22 16:08:09 +01:00
Eric Engestrom 7f5ef97a07 llvmpipe: use LLVM version string instead of re-computing it
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-08-22 16:08:09 +01:00
Tapani Pälli 728ebcdec2 iris/android: fix build and link with libmesa_intel_perf
Fixes: 0fd4359733 "iris/perf: implement routines to return counter info"
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-22 10:01:14 +03:00
Alyssa Rosenzweig 0ae72df013 panfrost: Fix PIPE_BUFFER spacing
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-21 10:44:45 -07:00
Alyssa Rosenzweig d4542f8cb5 panfrost: Implement depth range clipping
This should fix glDepthRangef issues. Eventually, something similar
should allow implementing the depth bounds test.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-21 10:44:45 -07:00
Alyssa Rosenzweig 5e268a01d2 panfrost: Don't bail on PIPE_BUFFER
We can handle some of it.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-21 10:43:02 -07:00
Alyssa Rosenzweig aa404120e1 panfrost: Pass stream_output_info by reference
It's a large structure, apparently.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-21 10:38:31 -07:00
Alyssa Rosenzweig 27b6264630 panfrost: Guard against NULL rasterizer explicitly
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-21 10:38:31 -07:00
Alyssa Rosenzweig 87afc2e2da panfrost: Fix missing ret assignment in DRM code
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-21 10:38:31 -07:00
Alyssa Rosenzweig c43fa6b320 panfrost: Hoist bo != NULL check before dereference
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-21 10:38:31 -07:00
Alyssa Rosenzweig a3c1ab2e9a panfrost: Hoist job != NULL check
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-21 10:38:31 -07:00
Alyssa Rosenzweig 9cee21f0c9 panfrost: Prevent potential integer overflow in instancing
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-21 10:38:31 -07:00