From 5c88488a64524db410f279ee716cd340cee091c5 Mon Sep 17 00:00:00 2001 From: Kenneth Graunke Date: Tue, 19 Jul 2022 01:21:06 -0700 Subject: [PATCH] intel/eu: Fix XeHP register region validation for hstride == 0 Recently, we started using <1;1,0> register regions for consecutive channels, rather than the <8;8,1> we've traditionally used, as the <1;1,0> encoding can be compacted on XeHP. Since then, one of the EU validator rules has been flagging tons of instructions as errors: mov(16) g114<1>F g112<1,1,0>UD { align1 1H I@2 compacted }; ERROR: Register Regioning patterns where register data bit locations are changed between source and destination are not supported except for broadcast of a scalar. Our code for this restriction checked three things: #1: vstride != width * hstride || #2: src_stride != dst_stride || #3: subreg != dst_subreg Destination regions are always linear (no replicated values, nor any overlapping components), as they only have hstride. Rule #1 is requiring that the source region be linear as well. Rules #2-3 are straightforward: the subregister must match (for the first channel to line up), and the source/destination strides must match (for any subsequent channels to line up). Unfortunately, rules #1-2 weren't working when horizontal stride was 0. In that case, regions are linear if width == 1, and the stride between consecutive channels is given by vertical stride instead. So we adjust our src_stride calculation from src_stride = hstride * type_size; to: src_stride = (hstride ? hstride : vstride) * type_size; and adjust rule #1 to allow hstride == 0 as long as width == 1. While here, we also update the text of the rule to match the latest documentation, which apparently clarifies that it's the location of the LSB of the channel which matters. Fixes: 3f50dde8b35 ("intel/eu: Teach EU validator about FP/DP pipeline regioning restrictions.") Reviewed-by: Francisco Jerez Part-of: --- src/intel/compiler/brw_eu_validate.c | 29 +++++++++++++++++++++------- 1 file changed, 22 insertions(+), 7 deletions(-) diff --git a/src/intel/compiler/brw_eu_validate.c b/src/intel/compiler/brw_eu_validate.c index 4443c81ddfc..9abfb5c651d 100644 --- a/src/intel/compiler/brw_eu_validate.c +++ b/src/intel/compiler/brw_eu_validate.c @@ -617,6 +617,19 @@ is_packed(unsigned vstride, unsigned width, unsigned hstride) return false; } +/** + * Returns whether a region is linear + * + * A region is linear if its elements do not overlap and are not replicated. + * Unlike a packed region, intervening space (i.e. strided values) is allowed. + */ +static bool +is_linear(unsigned vstride, unsigned width, unsigned hstride) +{ + return vstride == width * hstride || + (hstride == 0 && width == 1); +} + /** * Returns whether an instruction is an explicit or implicit conversion * to/from half-float. @@ -1887,7 +1900,7 @@ special_requirements_for_handling_double_precision_data_types( } #undef DO_SRC - const unsigned src_stride = hstride * type_size; + const unsigned src_stride = (hstride ? hstride : vstride) * type_size; const unsigned dst_stride = dst_hstride * dst_type_size; /* The PRMs say that for CHV, BXT: @@ -1965,9 +1978,10 @@ special_requirements_for_handling_double_precision_data_types( * integer DWord multiply [or in case where a floating point data type * is used as destination]: * - * 1. Register Regioning patterns where register data bit locations - * are changed between source and destination are not supported on - * Src0 and Src1 except for broadcast of a scalar. + * 1. Register Regioning patterns where register data bit location + * of the LSB of the channels are changed between source and + * destination are not supported on Src0 and Src1 except for + * broadcast of a scalar. * * 2. Explicit ARF registers except null and accumulator must not be * used." @@ -1977,12 +1991,13 @@ special_requirements_for_handling_double_precision_data_types( is_double_precision)) { ERROR_IF(!is_scalar_region && BRW_ADDRESS_REGISTER_INDIRECT_REGISTER != address_mode && - (vstride != width * hstride || + (!is_linear(vstride, width, hstride) || src_stride != dst_stride || subreg != dst_subreg), "Register Regioning patterns where register data bit " - "locations are changed between source and destination are not " - "supported except for broadcast of a scalar."); + "location of the LSB of the channels are changed between " + "source and destination are not supported except for " + "broadcast of a scalar."); ERROR_IF((file == BRW_ARCHITECTURE_REGISTER_FILE && reg != BRW_ARF_NULL && !(reg >= BRW_ARF_ACCUMULATOR && reg < BRW_ARF_FLAG)) ||