KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Jason Ekstrand	d2d3e04501	intel/fs: Use SHADER_OPCODE_SEND for surface messages Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-29 18:43:55 +00:00
Jason Ekstrand	7f1cf046cd	intel/fs: Add a generic SEND opcode Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-29 18:43:55 +00:00
Kenneth Graunke	04c2f12ab2	i965: Drop mark_surface_used mechanism. The original idea was that the backend compiler could eliminate surfaces, so we would have it mark which ones are actually used, then shrink the binding table accordingly. Unfortunately, it's a pretty blunt mechanism - it can only prune things from the end, not the middle - since we decide the layout before we even start the backend compiler, and only limit the size. It also basically gives up if it sees indirect array access. Besides, we do the vast majority of our surface elimination in NIR anyway, not the backend - and I don't see that trend changing any time soon. Vulkan abandoned this plan a long time ago, and I don't use it in Iris, but it's still been kicking around in i965. I hacked shader-db to print the binding table size in bytes, and observed no changes with this patch. So, this code appears to do nothing useful. Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-13 09:35:32 -08:00
Matt Turner	8534742404	intel/compiler: Split 64-bit MOV-indirects if needed Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-01-09 16:42:40 -08:00
Francisco Jerez	230a8a541d	intel/fs: Remove FS_OPCODE_UNPACK_HALF_2x16_SPLIT opcodes. These are broken on a future platform, but it turns out we don't need to fix them, since they're just type-converting moves with strided source. Kill them. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-09 12:03:09 -08:00
Francisco Jerez	812ede088f	intel/fs: Implement quad swizzles on ICL+. Align16 is no longer a thing, so a new implementation is provided using Align1 instead. Not all possible swizzles can be represented as a single Align1 region, but some fast paths are provided for frequently used swizzles that can be represented efficiently in Align1 mode. Fixes ~90 subgroup quad swap Vulkan CTS tests. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-09 12:03:08 -08:00
Sagar Ghuge	0a7664fe8c	intel/compiler: Change src1 reg type to unsigned doubleword To have uniform behavior while disassembling send(c) instruction use register type of unsigned doubleword for src1 when message descriptor is immediate value. Bspec does not specifiy anything for src1 immediate default type. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>	2018-10-23 12:44:24 -07:00
Jason Ekstrand	3cbc02e469	intel: Use TXS for image_size when we have a typed surface Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:03 -05:00
Ian Romanick	d515c75463	intel/compiler: Implement untyped atomic float min, max, and compare-swap dataport messages v2: Split changes to the message type field to another patch. Suggested by Caio. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-22 20:31:32 -07:00
Francisco Jerez	48d6fc5eb6	intel/fs: Initialize mlen for gen7 varying pull constant load messages. This makes the message length available at the IR level, which should save some guesswork in a future commit. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-07-09 23:46:58 -07:00
Francisco Jerez	2bac890bf5	intel/eu: Use descriptor constructors for dataport read messages. v2: Use SET_BITS macro instead of left shift (Ken). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-07-09 23:46:57 -07:00
Francisco Jerez	27c211e30f	intel/eu: Use descriptor constructors for sampler messages. v2: Use SET_BITS macro instead of left shift (Ken). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-07-09 23:46:57 -07:00
Francisco Jerez	1c90ae5acc	intel/eu: Provide desc immediate argument up front to brw_send_indirect_message(). The current approach of returning a setup instruction where additional descriptor fields can be specified is still supported in order to keep things working, but it will be removed later in this series. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-07-09 23:46:57 -07:00
Jason Ekstrand	e208bc3bb7	intel/fs: Get rid of MOV_DISPATCH_TO_FLAGS We can just emit the MOV in the two places where we use this. Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Francisco Jerez	2d7d652d5c	intel/fs: Disable opt_sampler_eot() in 32-wide dispatch. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Jason Ekstrand	db6ca13efc	intel/fs: Emit LINE+MAC for LINTERP with unaligned coordinates On g4x through Sandy Bridge, src1 (the coordinates) of the PLN instruction is required to be an even register number. When it's odd (which can happen with SIMD32), we have to emit a LINE+MAC combination instead. Unfortunately, we can't just fall through to the gen4 case because the input registers are still set up for PLN which lays out the four src1 registers differently in SIMD16 than LINE. v2 (Jason Ekstrand): - Take advantage of both accumulators and emit LINE LINE MAC MAC (Based on a patch from Francisco Jerez) - Unify the gen4 and gen4x-6 cases using a loop v3 (Jason Ekstrand): - Don't unify gen4 with gen4x-6 as this turns out to be more fragile than first thought without reworking the gen4 barycentric coordinate layout. Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Francisco Jerez	73d60455e9	intel/fs: Rework INTERPOLATE_AT_PER_SLOT_OFFSET This reworks INTERPOLATE_AT_PER_SLOT_OFFSET to work more like an ALU operation and less like a send. This is less code over-all and, as a side-effect, it now properly handles execution groups and lowering so SIMD32 support just falls out. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Jason Ekstrand	74b477039d	intel/fs: Add the group to the flag subreg number on SNB and older We want consistent behavior in the meaning of the flag_subreg field between SNB and IVB+. v2 (Jason Ekstrand): - Add some extra commentary Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Francisco Jerez	ce370902d4	intel/fs: Fix FB write message control codegen for SIMD32. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Francisco Jerez	48241c780a	intel/fs: Fix codegen of FS_OPCODE_SET_SAMPLE_ID for SIMD32. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Francisco Jerez	5b6e91dd35	intel/fs: Remove program key argument from generator. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Jason Ekstrand	a14fb0184a	intel/fs: Set up FB write message headers in the visitor Doing instruction header setup in the generator is awful for a number of reasons. For one, we can't schedule the header setup at all. For another, it means lots of implied writes which the instruction scheduler and other passes can't properly read about. The second isn't a huge problem for FB writes since they always happen at the end. We made a similar change to sampler handling in `ff4726077d`. Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Jason Ekstrand	c0a1c248b8	intel/fs: Pull FB write implied headers from src[0] Now that we have the implied header in src[0] for tracking purposes, we may as well use it in the generator. This makes things a tiny bit more general. Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Jason Ekstrand	381fac2740	intel/eu: Add some brw_get_default_ helpers This is much cleaner than everything that wants a default value poking at the bits of p->current directly. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-06-04 14:03:03 -07:00
Plamena Manolova	939312702e	i965: Add ARB_fragment_shader_interlock support. Adds suppport for ARB_fragment_shader_interlock. We achieve the interlock and fragment ordering by issuing a memory fence via sendc. Signed-off-by: Plamena Manolova <plamena.manolova@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2018-06-01 16:36:39 +01:00
Francisco Jerez	4bd2047dee	intel/fs: Add explicit last_rt flag to fb writes orthogonal to eot. When using multiple RT write messages to the same RT such as for dual-source blending or all RT writes in SIMD32, we have to set the "Last Render Target Select" bit on all write messages that target the last RT but only set EOT on the last RT write in the shader. Special-casing for dual-source blend works today because that is the only case which requires multiple RT write messages per RT. When we start doing SIMD32, this will become much more common so we add a dedicated bit for it. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-05-29 15:44:50 -07:00
Francisco Jerez	d3cd6b7215	intel/fs: Replace the CINTERP opcode with a simple MOV The only reason it was it's own opcode was so that we could detect it and adjust the source register based on the payload setup. Now that we're using the ATTR file for FS inputs, there's no point in having a magic opcode for this. v2 (Jason Ekstrand): - Break the bit which removes the CINTERP opcode into its own patch Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-05-29 15:44:50 -07:00
Jason Ekstrand	71a86d1fc6	intel/fs: Use groups for SIMD16 LINTERP on gen11+ This is better than compression control because it naturally extends to SIMD32. v2: - Push/pop instruction state around adjusted codegen (Ken) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-05-29 15:44:50 -07:00
Jason Ekstrand	a1a850cd34	intel/fs: Assert that the gen4-6 plane restrictions are followed The fall-back does not work correctly in SIMD16 mode and the register allocator should ensure that we never hit this case anyway. Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-05-29 15:44:50 -07:00
Ian Romanick	d84b2ed1d7	intel/compiler: Silence unused parameter warnings in generate_foo methods Since all of the fs_generator::generate_foo methods take a fs_inst * as the first parameter, just remove the name to quiet the compiler. src/intel/compiler/brw_fs_generator.cpp: In member function ‘void fs_generator::generate_barrier(fs_inst, brw_reg)’: src/intel/compiler/brw_fs_generator.cpp:743:41: warning: unused parameter ‘inst’ [-Wunused-parameter] fs_generator::generate_barrier(fs_inst inst, struct brw_reg src) ^~~~ src/intel/compiler/brw_fs_generator.cpp: In member function ‘void fs_generator::generate_discard_jump(fs_inst)’: src/intel/compiler/brw_fs_generator.cpp:1326:46: warning: unused parameter ‘inst’ [-Wunused-parameter] fs_generator::generate_discard_jump(fs_inst inst) ^~~~ src/intel/compiler/brw_fs_generator.cpp: In member function ‘void fs_generator::generate_pack_half_2x16_split(fs_inst, brw_reg, brw_reg, brw_reg)’: src/intel/compiler/brw_fs_generator.cpp:1675:54: warning: unused parameter ‘inst’ [-Wunused-parameter] fs_generator::generate_pack_half_2x16_split(fs_inst inst, ^~~~ src/intel/compiler/brw_fs_generator.cpp: In member function ‘void fs_generator::generate_shader_time_add(fs_inst, brw_reg, brw_reg, brw_reg)’: src/intel/compiler/brw_fs_generator.cpp:1743:49: warning: unused parameter ‘inst’ [-Wunused-parameter] fs_generator::generate_shader_time_add(fs_inst inst, ^~~~ src/intel/compiler/brw_vec4_generator.cpp: In function ‘void generate_set_simd4x2_header_gen9(brw_codegen, brw::vec4_instruction, brw_reg)’: src/intel/compiler/brw_vec4_generator.cpp:1412:52: warning: unused parameter ‘inst’ [-Wunused-parameter] vec4_instruction inst, ^~~~ src/intel/compiler/brw_vec4_generator.cpp: In function ‘void generate_mov_indirect(brw_codegen, brw::vec4_instruction, brw_reg, brw_reg, brw_reg, brw_reg)’: src/intel/compiler/brw_vec4_generator.cpp:1430:41: warning: unused parameter ‘inst’ [-Wunused-parameter] vec4_instruction inst, ^~~~ src/intel/compiler/brw_vec4_generator.cpp:1432:63: warning: unused parameter ‘length’ [-Wunused-parameter] struct brw_reg indirect, struct brw_reg length) ^~~~~~ Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-04-24 14:31:21 -04:00
Kenneth Graunke	70de61594d	i965/fs: Add infrastructure for generating CSEL instructions. v2 (idr): Don't allow CSEL with a non-float src2. v3 (idr): Add CSEL to fs_inst::flags_written. Suggested by Matt. v4 (idr): Only set BRW_ALIGN_16 on Gen < 10 (suggested by Matt). Don't reset the access mode afterwards (suggested by Samuel and Matt). Add support for CSEL not modifying the flags to more places (requested by Matt). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [v3] Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-03-08 15:26:26 -08:00
Jason Ekstrand	8b4a5e641b	intel/fs: Add support for subgroup quad operations NIR has code to lower these away for us but we can do significantly better in many cases with register regioning and SIMD4x2. Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2018-03-07 12:13:47 -08:00
Jason Ekstrand	b0858c1cc6	intel/fs: Add a couple of simple helper opcodes Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2018-03-07 12:13:47 -08:00
Jason Ekstrand	90c9f29518	i965/fs: Add support for nir_intrinsic_shuffle Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2018-03-07 12:13:47 -08:00
Kenneth Graunke	9fa95359df	intel: Drop program size pointer from vec4/fs assembly getters. These days, we're just passing a pointer to a prog_data field, which we already have access to. We can just use it directly. (In the past, it was a pointer to a separate value.) Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2018-03-02 14:20:22 -08:00
Francisco Jerez	e7c9adca57	intel/eu: Plumb header present bit to codegen helpers for HDC messages. This makes sure that the header-present bit of the message descriptor is in sync with the IR instruction fields, which gives the optimizer more control to avoid the overhead of setting up a message header when it's possible to do so. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-03-02 11:28:56 -08:00
Francisco Jerez	cc0fc8b8ac	intel/ir: Allow representing additional flag subregisters in the IR. This allows representing conditional mods and predicates on f1.0-f1.1 at the IR level by adding an extra bit to the flag_subreg backend_instruction field. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-03-02 11:28:56 -08:00
Jason Ekstrand	ff4726077d	intel/fs: Set up sampler message headers in the visitor on gen7+ This gives the scheduler visibility into the headers which should improve scheduling. More importantly, however, it lets the scheduler know that the header gets written. As-is, the scheduler thinks that a texture instruction only reads it's payload and is unaware that it may write to the first register so it may reorder it with respect to a read from that register. This is causing issues in a couple of Dota 2 vertex shaders. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104923 Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2018-03-01 15:11:01 -08:00
Matt Turner	89fe5190a2	intel/compiler: Lower flrp32 on Gen11+ The LRP instruction is no more. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-02-28 11:15:47 -08:00
Matt Turner	2134ea3800	intel/compiler/fs: Implement ddy without using align16 for Gen11+ Align16 is no more. We previously generated an align16 ADD instruction to calculate DDY: add(16) g25<1>F -g23<4>.xyxyF g23<4>.zwzwF { align16 1H }; Without align16, we now implement it as: add(4) g25<1>F -g23<0,2,1>F g23.2<0,2,1>F { align1 1N }; add(4) g25.4<1>F -g23.4<0,2,1>F g23.6<0,2,1>F { align1 1N }; add(4) g26<1>F -g24<0,2,1>F g24.2<0,2,1>F { align1 1N }; add(4) g26.4<1>F -g24.4<0,2,1>F g24.6<0,2,1>F { align1 1N }; where only the first two instructions are needed in SIMD8 mode. Note: an earlier version of the patch implemented this in two instructions in SIMD16: add(8) g25<2>F -g23<4,2,0>F g23.2<4,2,0>F { align1 1N }; add(8) g25.1<2>F -g23.1<4,2,0>F g23.3<4,2,0>F { align1 1N }; but I realized that the channel enable bits will not be correct. If we knew we were under uniform control flow, we could emit only those two instructions however. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-02-28 11:15:47 -08:00
Matt Turner	62cfd4c656	intel/compiler/fs: Simplify ddx/ddy code generation The brw_reg() constructor just obfuscates things here, in my opinion. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-02-28 11:15:47 -08:00
Matt Turner	bed0267ff6	intel/compiler/fs: Pass fs_inst to generate_ddx/ddy instead of opcode In a future patch, generate_ddy will want to inspect inst->exec_size. Change generate_ddx as well for consistency. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-02-28 11:15:47 -08:00
Matt Turner	432674ce93	intel/compiler/fs: Implement FS_OPCODE_LINTERP with MADs on Gen11+ The PLN instruction is no more. Its functionality is now implemented using two MAD instructions with the new native-float type. Instead of pln(16) r20.0<1>:F r10.4<0;1,0>:F r4.0<8;8,1>:F we now have mad(8) acc0<1>:NF r10.7<0;1,0>:F r4.0<8;8,1>:F r10.4<0;1,0>:F mad(8) r20.0<1>:F acc0<8;8,1>:NF r5.0<8;8,1>:F r10.5<0;1,0>:F mad(8) acc0<1>:NF r10.7<0;1,0>:F r6.0<8;8,1>:F r10.4<0;1,0>:F mad(8) r21.0<1>:F acc0<8;8,1>:NF r7.0<8;8,1>:F r10.5<0;1,0>:F ... and in the case of SIMD8 only the first pair of MAD instructions is used. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-02-28 11:15:47 -08:00
Matt Turner	b5d8781e19	intel/compiler/fs: Return multiple_instructions_emitted from generate_linterp If multiple instructions are emitted, special handling of things like conditional mod and NoDDClr/NoDDChk need to be performed. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-02-28 11:15:47 -08:00
Matt Turner	b1afdf9fc1	intel/compiler/fs: Fix application of cmod and saturate to LINE/MAC pair This isn't technically broken, but the next patch will make this function report whether it generated multiple instructions, and that information will be used to disable the application of conditional mod by the generic code. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-02-28 11:15:47 -08:00
Kenneth Graunke	60f15477da	i965: Drop render_target_start from binding table struct. We have to start render targets at binding table index 0 in order to use headerless FB write messages, and in fact already assume this in a bunch of places in the code. Let's finish that off, and not bother storing 0 in a struct to pretend to add it in a few places. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2018-01-22 10:03:52 -08:00
Matt Turner	c0ef14f5b1	Revert "Revert "i965/fs: Use align1 mode on ternary instructions on Gen10+"" This reverts commit `2d04572038`. Acked-by: Scott D Phillips <scott.d.phillips@intel.com>	2018-01-11 10:11:59 -08:00
Kenneth Graunke	a1afef8de0	i965: Combine {VS,FS}_OPCODE_GET_BUFFER_SIZE opcodes. These are the same, we don't need a separate opcode enum per backend. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-30 20:30:34 -08:00
Anuj Phogat	2d04572038	Revert "i965/fs: Use align1 mode on ternary instructions on Gen10+" This reverts commit `9cd60fce9c`. Above commit caused 2000+ piglit tests to assert fail. Disabling the align1 mode on gen10 for now to avoid failures. Cc: Matt Turner <mattst88@gmail.com> Cc: Rafael Antognolli <rafael.antognolli@intel.com> Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>	2017-12-22 16:40:40 -08:00
Jose Maria Casanova Crespo	c57a3f200d	i965/fs: Add byte scattered read message and fs support v2: Fix alignment style (Topi Pohjolainen) (Jason Ekstrand) - Enable bit_size parameter to scattered messages to enable different bitsizes byte/word/dword. - Remove use of brw_send_indirect_scattered_message in favor of brw_send_indirect_surface_message. - Move scattered messages to surface messages namespace. - Assert align1 for scattered messages and assume Gen8+. - Inline brw_set_dp_byte_scattered_read. v3: (Jason Ekstrand) - Use renamed brw_byte_scattered_data_element_from_bit_size method - Assert scattered read for Gen8+ and Haswell. - Use conditional expresion at components_read. - Include comment about params for scattered opcodes. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-06 08:57:18 +01:00

1 2

73 Commits