nir: Recognize open-coded extract_u8.

Two shaders that appear in Unigine benchmarks (Heaven and Valley) unpack
three bytes from an integer and convert each into a float:

   float((val >> 16u) & 0xffu)
   float((val >> 8u) & 0xffu)
   float((val >> 0u) & 0xffu)

Instead of shifting, masking, and type converting like this:

   shr(8)   g15<1>UD   g25<8,8,1>UD   0x00000010UD
   and(8)   g16<1>UD   g15<8,8,1>UD   0x000000ffUD
   mov(8)   g17<1>F    g16<8,8,1>UD
   shr(8)   g18<1>UD   g25<8,8,1>UD   0x00000008UD
   and(8)   g19<1>UD   g18<8,8,1>UD   0x000000ffUD
   mov(8)   g20<1>F    g19<8,8,1>UD
   and(8)   g21<1>UD   g25<8,8,1>UD   0x000000ffUD
   mov(8)   g22<1>F    g21<8,8,1>UD

i965 can simply extract a byte and convert to float in a single
instruction:

   mov(8)   g17<1>F    g25.2<32,8,4>UB
   mov(8)   g20<1>F    g25.1<32,8,4>UB
   mov(8)   g22<1>F    g25.0<32,8,4>UB

This patch implements the first step: recognizing byte extraction. A
later patch will optimize out the conversion to float.

instructions in affected programs: 28568 -> 27450 (-3.91%)
helped: 7

cycles in affected programs: 210076 -> 203144 (-3.30%)
helped: 7

This patch decreases the number of instructions in the two Unigine
programs by:

   #1721: 4520 -> 4374 instructions (-3.23%)
   #1706: 3752 -> 3582 instructions (-4.53%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-01-21 09:09:48 -08:00 · 2016-01-21 09:09:48 -08:00 · 76289fbfa8
parent 01e92e7010
commit 76289fbfa8
1 changed file with 7 additions and 0 deletions
--- a/src/compiler/nir/nir_opt_algebraic.py
+++ b/src/compiler/nir/nir_opt_algebraic.py
@@ -162,6 +162,7 @@ optimizations = [
   (('ishr', a, 0), a),
   (('ushr', 0, a), 0),
   (('ushr', a, 0), a),
+   (('iand', 0xff, ('ushr', a, 24)), ('ushr', a, 24)),
   # Exponential/logarithmic identities
   (('fexp2', ('flog2', a)), a), # 2^lg2(a) = a
   (('flog2', ('fexp2', a)), a), # lg2(2^a) = a
@@ -213,6 +214,12 @@ optimizations = [
   (('f2i', ('ftrunc', a)), ('f2i', a)),
   (('f2u', ('ftrunc', a)), ('f2u', a)),

+   # Byte extraction
+   (('ushr', a, 24), ('extract_u8', a, 3), '!options->lower_extract_byte'),
+   (('iand', 0xff, ('ushr', a, 16)), ('extract_u8', a, 2), '!options->lower_extract_byte'),
+   (('iand', 0xff, ('ushr', a,  8)), ('extract_u8', a, 1), '!options->lower_extract_byte'),
+   (('iand', 0xff, a), ('extract_u8', a, 0), '!options->lower_extract_byte'),
+
   # Subtracts
   (('fsub', a, ('fsub', 0.0, b)), ('fadd', a, b)),
   (('isub', a, ('isub', 0, b)), ('iadd', a, b)),
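
The added rules rely on simple bit identities for 32-bit unsigned values, which can be checked outside the compiler. The sketch below is a plain-Python model written for illustration only: `ushr`, `extract_u8`, and `check_byte_extraction_rules` are hypothetical helpers, not NIR's implementation, and `extract_u8` is assumed to return the zero-extended byte at the given index (0 = least significant).

```python
MASK32 = 0xFFFFFFFF  # model values as 32-bit unsigned integers


def ushr(a, n):
    """Logical right shift of a 32-bit unsigned value (illustrative model)."""
    return (a & MASK32) >> n


def extract_u8(a, byte_index):
    """Assumed semantics: zero-extended byte `byte_index` of a 32-bit value."""
    return ((a & MASK32) >> (8 * byte_index)) & 0xFF


def check_byte_extraction_rules(value):
    """Return True if every search pattern in the patch equals its replacement."""
    return (
        # After shifting right by 24 only 8 bits remain, so the mask is redundant.
        (ushr(value, 24) & 0xFF) == ushr(value, 24)
        # Byte 3 needs no mask at all.
        and ushr(value, 24) == extract_u8(value, 3)
        # Bytes 2 and 1 need both the shift and the mask.
        and (ushr(value, 16) & 0xFF) == extract_u8(value, 2)
        and (ushr(value, 8) & 0xFF) == extract_u8(value, 1)
        # Byte 0 needs only the mask.
        and (value & 0xFF) == extract_u8(value, 0)
    )


samples = [0x00000000, 0xFFFFFFFF, 0x12345678, 0x80FF017F]
all_rules_hold = all(check_byte_extraction_rules(v) for v in samples)
```

This also suggests why the first hunk drops the mask from `('iand', 0xff, ('ushr', a, 24))`: once the mask is gone, the top-byte case can match the plain `('ushr', a, 24)` pattern in the byte-extraction rules.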