nir: add a load/store vectorization pass

This pass combines intersecting, adjacent and identical loads/stores into
potentially larger ones and will be used by ACO to greatly reduce the
number of memory operations.

v2: handle nir_deref_type_ptr_as_array
v3: assume explicitly laid out types for derefs
v4: create less deref casts
v4: fix shared boolean vectorization
v4: fix copy+paste error in resources_different
v4: fix extract_subvector() to pass
    nir_load_store_vectorize_test.ssbo_load_intersecting_32_32_64
v4: rebase
v5: subtract from deref/offset instead of scheduling offset calculations
v5: various non-functional changes/cleanups
v5: require less metadata and preserve more
v5: rebase
v6: cleanup and improve dependency handling
v6: emit less deref casts
v6: pass undef to components not set in the write_mask for new stores
v7: fix 8-bit extract_vector() with 64-bit input
v7: cleanup creation of store write data
v7: update align correctly for when the bit size of load/store increases
v7: rename extract_vector to extract_component and update comment
v8: prevent combining of row-major matrix column acceses
v9: rework process_block() to be able to vectorize more
v9: rework the callback function
v9: update alignment on all loads/stores, even if they're not vectorized
v9: remove entry::store_value, since it will not be updated if it's was
    from a vectorized load
v9: fix bug in subtract_deref(), causing artifacts in Dishonored 2
v9: handle nir_intrinsic_scoped_memory_barrier
v10: use nir_ssa_scalar
v10: handle non-32-bit offsets
v10: use signed offsets for comparison
v10: improve create_entry_key_from_offset()
v10: support load_shared/store_shared
v10: remove strip_deref_casts()
v10: don't ever pass NULL to memcmp
v10: remove recursion in gcd()
v10: fix outdated comment
v11: use the new nir_extract_bits()
v12: remove use of nir_src_as_const_value in resources_different
v13: make entry key hash function deterministic
v13: simplify mask_sign_extend()
v14: add comment in hash_entry_key() about hashing pointers

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v9)
This commit is contained in:
Rhys Perry 2019-03-19 20:55:30 +00:00
parent b3a3e4d1d2
commit ce9205c03b
3 changed files with 1313 additions and 0 deletions

View File

@ -190,6 +190,7 @@ files_libnir = files(
'nir_opt_if.c',
'nir_opt_intrinsics.c',
'nir_opt_large_constants.c',
'nir_opt_load_store_vectorize.c',
'nir_opt_loop_unroll.c',
'nir_opt_move.c',
'nir_opt_peephole_select.c',

View File

@ -4194,6 +4194,13 @@ bool nir_opt_vectorize(nir_shader *shader);
bool nir_opt_conditional_discard(nir_shader *shader);
typedef bool (*nir_should_vectorize_mem_func)(unsigned align, unsigned bit_size,
unsigned num_components, unsigned high_offset,
nir_intrinsic_instr *low, nir_intrinsic_instr *high);
bool nir_opt_load_store_vectorize(nir_shader *shader, nir_variable_mode modes,
nir_should_vectorize_mem_func callback);
void nir_sweep(nir_shader *shader);
void nir_remap_dual_slot_attributes(nir_shader *shader,

File diff suppressed because it is too large Load Diff