KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Francisco Jerez	acf98ff933	intel/fs: Teach instruction scheduler about GRF bank conflict cycles. This should allow the post-RA scheduler to do a slightly better job at hiding latency in the presence of instructions incurring bank conflicts. The main purpose of this patch is not to improve performance though, but to get conflict cycles to show up in shader-db statistics in order to make sure that regressions in the bank conflict mitigation pass don't go unnoticed. Acked-by: Matt Turner <mattst88@gmail.com>	2017-12-07 15:56:49 -08:00
Francisco Jerez	af2c320190	intel/fs: Implement GRF bank conflict mitigation pass. Unnecessary GRF bank conflicts increase the issue time of ternary instructions (the overwhelmingly most common of which is MAD) by roughly 50%, leading to reduced ALU throughput. This pass attempts to minimize the number of bank conflicts by rearranging the layout of the GRF space post-register allocation. It's in general not possible to eliminate all of them without introducing extra copies, which are typically more expensive than the bank conflict itself. In a shader-db run on SKL this helps roughly 46k shaders: total conflicts in shared programs: 1008981 -> 600461 (-40.49%) conflicts in affected programs: 816222 -> 407702 (-50.05%) helped: 46234 HURT: 72 The running time of shader-db itself on SKL seems to be increased by roughly 2.52%±1.13% with n=20 due to the additional work done by the compiler back-end. On earlier generations the pass is somewhat less effective in relative terms because the hardware incurs a bank conflict anytime the last two sources of the instruction are duplicate (e.g. while trying to square a value using MAD), which is impossible to avoid without introducing copies. E.g. for a shader-db run on SNB: total conflicts in shared programs: 944636 -> 623185 (-34.03%) conflicts in affected programs: 853258 -> 531807 (-37.67%) helped: 31052 HURT: 19 And on BDW: total conflicts in shared programs: 1418393 -> 987539 (-30.38%) conflicts in affected programs: 1179787 -> 748933 (-36.52%) helped: 47592 HURT: 70 On SKL GT4e this improves performance of GpuTest Volplosion by 3.64% ±0.33% with n=16. NOTE: This patch intentionally disregards some i965 coding conventions for the sake of reviewability. This is addressed by the next squash patch which introduces an amount of (for the most part boring) boilerplate that might distract reviewers from the non-trivial algorithmic details of the pass. 
The following patch is squashed in: SQUASH: intel/fs/bank_conflicts: Roll back to the nineties. Acked-by: Matt Turner <mattst88@gmail.com>	2017-12-07 15:56:06 -08:00
Jose Maria Casanova Crespo	a1e257a5bf	i965/fs: Use untyped_surface_read for 16-bit load_ssbo SSBO loads were using byte_scattered read messages because they allow reading 16-bit components. byte_scattered messages can only operate on one component at a time, so we needed to emit as many messages as components. But for 16-bit vec2 and vec4, whose sizes are multiples of 32 bits, we can use the untyped_surface_read message to read pairs of 16-bit components using only one message. Once each pair is read, it is unshuffled to return the proper 16-bit components. The vec3 case is treated as vec4, but the 4th component is ignored. 16-bit scalars are read using one byte_scattered_read message. v2: Removed use of stride = 2 on sources (Jason Ekstrand) Rework optimization using unshuffle 16 reads (Chema Casanova) v3: Use W and D types instead of HF and F in shuffle to avoid rounding errors (Jason Ekstrand) Use untyped_surface_read for 16-bit vec3. (Jason Ekstrand) v4: Use subscript instead of changing type and stride (Jason Ekstrand) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-06 08:57:18 +01:00
Jose Maria Casanova Crespo	ce2e572c4c	i965/fs: Optimize 16-bit SSBO stores by packing two into a 32-bit reg Currently, we use byte-scattered write messages for storing 16-bit values into an SSBO. This is because untyped surface messages have a fixed 32-bit size. This patch optimizes these 16-bit writes by combining 2 values (e.g., two consecutive components aligned to 32 bits) into a 32-bit register, packing the two 16-bit words. 16-bit single-component values will continue to use byte-scattered write messages. The same will happen when the first consecutive component is not 32-bit aligned. This optimization potentially reduces the number of SEND messages used for storing 16-bit values by a factor of 2 or 4, which cuts down execution time significantly because byte-scattered writes are an expensive operation, as they only write one component per message. v2: Removed use of stride = 2 on sources (Jason Ekstrand) Rework optimization using shuffle 16 write and enable writes of 16bit vec4 with only one message of 32-bits. (Chema Casanova) v3: - Fix coding style (Eduardo Lima) - Reorganize code to avoid duplication. (Jason Ekstrand) - Include new comments to explain the length calculations to fix alignment issues of components. (Jason Ekstrand) - Fix issues with writemask yz with 16-bit writes. (Jason Ekstrand) v4: (Jason Ekstrand) - Reorganize 64-bit ssbo-writes to avoid using slots_per_component. - Comment about why shuffle is needed when using byte_scattered_write. Signed-off-by: Eduardo Lima <elima@igalia.com> Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-06 08:57:18 +01:00
Jason Ekstrand	3282309f74	i965/fs: Enables 16-bit load_ubo with sampler load_ubo uses 32-bit loads because uniform surfaces have a 32-bit surface format defined. So when reading 16-bit components with the sampler, we need to unshuffle two 16-bit components from each 32-bit component. Using the sampler avoids the use of the byte_scattered_read message, which needs one message for each component and is supposed to be slower. v2: (Jason Ekstrand) - Simplify component selection and unshuffling for different bitsizes - Remove SKL optimization of reading only two 32-bit components when reading 16-bit types. Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>	2017-12-06 08:57:18 +01:00
Jose Maria Casanova Crespo	3db31c0b06	i965/fs: Helpers for un/shuffle 16-bit pairs in 32-bit components These helpers are used to load/store 16-bit types from/to 32-bit components. The functions shuffle_32bit_load_result_to_16bit_data and shuffle_16bit_data_for_32bit_write are implemented in a similar way to the analogous functions for handling 64-bit types. v1: Explain need of temporary in shuffle operations. (Jason Ekstrand) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-06 08:57:18 +01:00
Jose Maria Casanova Crespo	fa4a9d63bb	i965/fs: Use byte scattered read for 16-bit load_ssbo Used to enable 16-bit reads at do_untyped_vector_read, which is used by the following intrinsics: * nir_intrinsic_load_shared * nir_intrinsic_load_ssbo v2: Removed use of stride = 2 on 16-bit sources (Jason Ekstrand) v3: - Add bitsize to scattered read operation (Jason Ekstrand) - Remove implementation of 16-bit UBO read from this patch. - Avoid assertion at opt_algebraic caused by ADD of two IMM with offset with BRW_REGISTER_TYPE_UD type found on matrix tests. (Jose Maria Casanova) v4: (Jason Ekstrand) - Put the 16-bit case at the beginning of the if ladder. - Use type_sz(dest.type) * 8 as bit_size parameter for scattered read. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-06 08:57:18 +01:00
Jose Maria Casanova Crespo	c57a3f200d	i965/fs: Add byte scattered read message and fs support v2: Fix alignment style (Topi Pohjolainen) (Jason Ekstrand) - Enable bit_size parameter to scattered messages to enable different bitsizes byte/word/dword. - Remove use of brw_send_indirect_scattered_message in favor of brw_send_indirect_surface_message. - Move scattered messages to surface messages namespace. - Assert align1 for scattered messages and assume Gen8+. - Inline brw_set_dp_byte_scattered_read. v3: (Jason Ekstrand) - Use renamed brw_byte_scattered_data_element_from_bit_size method - Assert scattered read for Gen8+ and Haswell. - Use conditional expression at components_read. - Include comment about params for scattered opcodes. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-06 08:57:18 +01:00
Alejandro Piñeiro	a4031bdfa9	i965/fs: Predicate byte scattered writes if needed While for Untyped Surface messages the bits of the execution mask are ANDed with the corresponding bits of the Pixel/Sample Mask, that is not the case for byte scattered writes. That masking is needed to prevent SSBO stores from writing on helper invocations. So when that can matter, we load the sample mask and predicate the send message. Note: the need for this patch was tested with a custom test. Right now the 16-bit storage CTS tests don't need this path in order to get a full pass. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-06 08:57:18 +01:00
Alejandro Piñeiro	96f1926aab	i965/fs: Use byte_scattered_write on 16-bit store_ssbo We need to rely on byte scattered writes as untyped writes are 32-bit sized. We could try to keep using 32-bit messages when we have two or four 16-bit elements, but for simplicity's sake, we use the same message for any component number. We revisit this approach in the following patches. v2: Removed use of stride = 2 on 16-bit sources (Jason Ekstrand) v3: (Jason Ekstrand) - Include bit_size in the scattered write message and remove the namespace specific to scattered messages. - Move comment to proper place. - Squashed with i965/fs: Adjust type_size/type_slots on store_ssbo. (Jose Maria Casanova) - Take into account that get_nir_src now returns WORD types for 16-bit sources instead of DWORD. v4: (Jason Ekstrand) - Rename length variable to num_components. - Include assertions before emit_untyped_write. - Remove type_slot in favor of num_slot and first_slot. Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-06 08:57:18 +01:00
Jose Maria Casanova Crespo	f1a9936ee1	i965/fs: Add byte scattered write message and fs support v2: (Jason Ekstrand) - Enable bit_size parameter to scattered messages to enable different bitsizes byte/word/dword. - Remove use of brw_send_indirect_scattered_message in favor of brw_send_indirect_surface_message. - Move scattered messages to surface messages namespace. - Assert align1 for scattered messages and assume Gen8+. - Inline brw_set_dp_byte_scattered_write. v3: - Remove leftover newline (Topi Pohjolainen) - Rename brw_data_size to brw_scattered_data_element and use defines instead of an enum (Jason Ekstrand) - Assert scattered write for Gen8+ and Haswell (Jason Ekstrand) Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-06 08:57:18 +01:00
Alejandro Piñeiro	d038deaa40	i965/fs: Add remove_extra_rounding_modes optimization Although from the SPIR-V point of view rounding modes are attached to the operation/destination, on i965 the rounding mode is state, so we don't need to explicitly set it if the one we want is already set. Taking into account that the default mode is RTE, one possible optimization would be to optimize out the first RTE set for each block. For that to work, we would need to take into account block interrelationships. At this point, it is not worth complicating the optimization for such a small gain. v2: Use a single SHADER_OPCODE_RND_MODE opcode taking an immediate with the rounding mode (Curro) v3: Reset optimization for every block. (Jason Ekstrand) Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-06 08:57:18 +01:00
Alejandro Piñeiro	82fa4d45e7	i965/fs: Enable rounding mode on f2f16 ops By default we don't set the rounding mode. We only set round-to-nearest-even or round-to-zero mode if explicitly requested by NIR. v2: Use a single SHADER_OPCODE_RND_MODE opcode taking an immediate with the rounding mode (Curro) v3: Use new helper brw_rnd_mode_from_nir_op (Jason Ekstrand) Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-06 08:57:18 +01:00
Alejandro Piñeiro	d6cd14f213	i965/fs: Define new shader opcode to set rounding modes Although it is possible to emit them directly as AND/OR on brw_fs_nir, having a specific opcode makes it easier to remove duplicate settings later. v2: (Curro) - Set thread control to 'switch' when using the control register - Use a single SHADER_OPCODE_RND_MODE opcode taking an immediate with the rounding mode. - Avoid magic numbers setting rounding mode field at control register. v3: (Curro) - Remove redundant and add missing whitespace lines. - Match printing instruction to IR opcode "rnd_mode" v4: (Topi Pohjolainen) - Fix code style. Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com> Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-06 08:57:18 +01:00
Jose Maria Casanova Crespo	ac8d4734f6	i965: Add support for control register Control register cr0 in i965 can be used to change the rounding modes in 32-bit to 16-bit floating-point conversions. From Intel Skylake PRM, vol 07, section "Register and Register Regions", subsection "Control Register" (page 754): "Subregister cr0.0:ud contains normal operation control fields such as the floating-point mode ... " The floating-point rounding mode is changed at bits 5:4 of cr0.0: "Rounding Mode. This field specifies the FPU rounding mode. It is initialized by Thread Dispatch. 00b = Round to Nearest or Even (RTNE) 01b = Round Up, toward +inf (RU) 10b = Round Down, toward -inf (RD) 11b = Round Toward Zero (RTZ)" Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-06 08:57:18 +01:00
Alejandro Piñeiro	5d5ee507fb	i965/fs: Handle 32-bit to 16-bit conversions Conversions to 16-bit need alignment between the 16-bit and 32-bit types, so the conversion operations unpack 16-bit types with stride=2 and then apply a MOV with the conversion. v2 (Jason Ekstrand): - Avoid the general use of stride=2 for 16-bit register types. v3 (Topi Pohjolainen) - Code style fix (Jason Ekstrand) - nir_op_f2f16 was renamed to nir_op_f2f16_undef because conversion to f16 with undefined rounding is explicit Signed-off-by: Eduardo Lima <elima@igalia.com> Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com> Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-06 08:57:18 +01:00
Alejandro Piñeiro	a05b6f25bf	i965/fs: Remove BRW_REGISTER_TYPE_HF assert at get_exec_type Note that we don't remove the assert at i965/vec4. At this point half float support is only for the scalar backend. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-06 08:57:18 +01:00
Jose Maria Casanova Crespo	75a88d8567	i965: Support for 16-bit base types in helper functions v2: Fixed calculation of scalar size for 16-bit types. (Jason Ekstrand) v3: Fix coding style (Topi Pohjolainen) Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Signed-off-by: Eduardo Lima <elima@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-06 08:57:18 +01:00
Alejandro Piñeiro	2d28ca7000	i965/vec4: Handle 16-bit types at type_size_xvec4 These types have vec4 sizes similar to those of their 32-bit counterparts. The vec4 backend doesn't support 16-bit types and probably never will, but this method is called by the scalar backend at fs_visitor::nir_setup_outputs(), so we still need to provide valid vec4 sizes for 16-bit types. In the future, something different should be implemented to avoid this dependency. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-06 08:57:18 +01:00
Rafael Antognolli	2919adffe9	intel/compiler: Implement WaClearTDRRegBeforeEOTForNonPS. The bspec describes: "WA: Clear tdr register before send EOT in all non-PS shader kernels mov(8) tdr0:ud 0x0:ud {NoMask}" Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-12-01 11:27:27 -08:00
Iago Toral Quiroga	8620f7ebbc	i965/vec4: use a temp register to compute offsets for pull loads 64-bit pull loads are implemented by emitting 2 separate 32-bit pull load messages, where the second message loads from an offset at +16B. That addition of 16B to the original offset should not alter the original offset register used as source for the pull load instruction though, since the compiler might use that same offset register in other instructions (for example, for other pull loads in the shader code that take that same offset as reference). If the pull load is 32-bit then we only need to emit one message and we don't need to do offset calculations, but in that case the optimizer should be able to drop the redundant MOV. Fixes the following test on Haswell: KHR-GL45.gpu_shader_fp64.fp64.max_uniform_components Reviewed-by: Matt Turner <mattst88@gmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103007	2017-11-30 07:57:53 +01:00
Iago Toral Quiroga	f1873956db	i965/vec4: fix splitting of interleaved attributes When we split an instruction that reads a uniform value (vstride 0) we need to respect the vstride on the second half of the instruction (that is, the second half should read the same region as the first). We were doing this already, but we didn't account for stages that have interleaved input attributes which also have a vstride of 0 and need the same treatment. Fixes the following on Haswell: KHR-GL45.enhanced_layouts.varying_locations KHR-GL45.enhanced_layouts.varying_array_locations KHR-GL45.enhanced_layouts.varying_structure_locations Reviewed-by: Matt Turner <mattst88@gmail.com> Acked-by: Andres Gomez <agomez@igalia.com>	2017-11-24 09:24:06 +01:00
Matt Turner	beaea7abfa	i965/fs: Check ADD/MAD with immediates in satprop unit test The gen had to be changed from 4 to 6 so that we could test MAD, which is new on Gen6. mad_imm_float_neg_mov_sat tests the case fixed by the previous commit. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2017-11-21 10:13:07 -08:00
Matt Turner	a05af1f7b8	i965/fs: Handle negating immediates on MADs when propagating saturates MADs don't take immediate sources, but we allow them in the IR since it simplifies a lot of things. I neglected to consider that case. Fixes: `4009a9ead4` ("i965/fs: Allow saturate propagation to propagate negations into MADs.") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103616 Reported-and-Tested-by: Ruslan Kabatsayev <b7.10110111@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2017-11-21 10:13:07 -08:00
Tapani Pälli	6236ffeb83	intel: fix disasm_info memory leaks Fixes: `4f82b17287` ("i965: Rewrite disassembly annotation code") Cc: Matt Turner <mattst88@gmail.com> Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-11-21 08:36:43 +02:00
Jason Ekstrand	1eab327ba7	i965: Stop including brw_cfg.h in brw_disasm_info.h The brw_disasm_info header is included by certain tools in order to get shader assembly from binaries so it's a semi-external header. Including brw_cfg.h also pulls in brw_shader.h so you end up getting quite a bit of our back-end compiler internals. Instead, make the couple of forward declarations we need and make the header more stand-alone. This fixes the meson build. Reviewed-by: Matt Turner <mattst88@gmail.com> Fixes: `4f82b17287`	2017-11-17 21:51:16 -08:00
Andres Gomez	1866f7aee5	i965: Correct disasm_info usage in eu_validate test Fixes: `4f82b17287` ("i965: Rewrite disassembly annotation code") Cc: Matt Turner <mattst88@gmail.com> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-11-18 03:07:06 +02:00
Matt Turner	821ec473a8	i965: Rename intel_asm_annotation -> brw_disasm_info It was the only file named intel_* in the compiler. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-11-17 12:14:38 -08:00
Matt Turner	4f82b17287	i965: Rewrite disassembly annotation code The old code used an array to store each "instruction group" (the new, better name than the old overloaded "annotation"), and required a memmove() to shift elements over in the array when we needed to split a group so that we could add an error message. This was confusing and difficult to get right, not the least of which was because the array has a tail sentinel not included in .ann_count. Instead use a linked list, a data structure made for efficient insertion. Acked-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-11-17 12:14:38 -08:00
Matt Turner	f80e97346b	i965: Simplify annotation_insert_error() Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-11-17 12:14:38 -08:00
Matt Turner	f4276ef7ef	i965: Move common code out of #ifdef I'm going to change the call in a later patch and with the difference in indentation level it wasn't immediately obvious that the calls were identical. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-11-17 12:14:38 -08:00
Kenneth Graunke	e48cc01be9	intel: Drop mtypes.h include from brw_compiler.h. This isn't necessary and causes trouble for a project I'm working on.	2017-11-15 09:37:32 -08:00
Kenneth Graunke	ff964916dc	i965: Use nir_lower_atomics_to_ssbos and delete ABO compiler code. We use the same hardware mechanism for both atomic counters and SSBO atomics, so there's really no benefit to maintaining separate code to handle each case. Instead, we can just use Rob's shiny new NIR pass to convert atomic_uints to SSBOs, and delete piles of code. The ssbo_start section of the binding table becomes a combined ABO and SSBO section, with ABOs first, then SSBOs. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-11-15 09:37:32 -08:00
Matt Turner	a31d038208	Revert "intel/fs: Use a pure vertical stride for large register strides" This reverts commit `e8c9e65185`. With the actual bug fixed (by commit `6ac2d16901`), this is not necessary. I'm doubtful of its correctness in any case.	2017-11-14 11:24:08 -08:00
Matt Turner	6ac2d16901	i965/fs: Fix extract_i8/u8 to a 64-bit destination The MOV instruction can extract bytes to words/double words, and words/double words to quadwords, but not bytes to quadwords. For unsigned byte to quadword, we can read them as words and AND off the high byte and extract to quadword in one instruction. For signed bytes, we need to first sign extend to word and then sign extend that word to a quadword. Fixes the following test on CHV, BXT, and GLK: KHR-GL46.shader_ballot_tests.ShaderBallotBitmasks Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103628 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-11-14 10:56:18 -08:00
Matt Turner	cfcfa0b9cd	i965/fs: Split all 32->64-bit MOVs on CHV, BXT, GLK Fixes the following tests on CHV, BXT, and GLK: KHR-GL46.shader_ballot_tests.ShaderBallotFunctionBallot dEQP-VK.spirv_assembly.instruction.compute.uconvert.uint32_to_int64 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103115	2017-11-14 10:56:18 -08:00
Jason Ekstrand	951a5dc4cc	intel/nir: Use the correct indirect lowering masks in link_shaders Previously, if we were linking a vec4 VS with a SIMD8/16 FS, we wouldn't lower indirects on the fragment shader which is wrong. Instead of using a single indirect mask, take advantage of our new little helper. Reviewed-by: Timothy Arceri <tarceri at itsqueeze.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-08 20:10:04 -08:00
Jason Ekstrand	3e63cf893f	intel/nir: Break the linking code into a helper in brw_nir.c Reviewed-by: Timothy Arceri <tarceri at itsqueeze.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-08 14:09:51 -08:00
Jason Ekstrand	7364f080f9	intel/nir: Add a helper for getting the NoIndirect mask Reviewed-by: Timothy Arceri <tarceri at itsqueeze.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-08 14:09:49 -08:00
Jason Ekstrand	d002950e54	intel/fs/nir: Return Q types from brw_reg_type_for_bit_size Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2017-11-07 10:41:24 -08:00
Jason Ekstrand	dee58ecd2e	intel/fs/nir: Use Q immediates for load_const on gen8+ Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2017-11-07 10:41:24 -08:00
Jason Ekstrand	9bb34892bf	intel/fs/nir: Setup immediates based on type in i2b and f2b Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2017-11-07 10:41:24 -08:00
Jason Ekstrand	1cb210f4bc	intel/reg: Add helpers for 64-bit integer immediates Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2017-11-07 10:41:24 -08:00
Jason Ekstrand	ab9220edd6	nir,intel/compiler: Use a fixed subgroup size The GL_ARB_shader_ballot spec says that gl_SubGroupSizeARB is declared as a uniform. This means that it cannot change across an invocation such as a draw call or a compute dispatch. For compute shaders, we're ok because we only ever use one dispatch size. For fragment, however, the hardware dynamically chooses between SIMD8 and SIMD16 which violates the spec. Instead, let's just pick a subgroup size based on the shader stage. The fixed size we choose for compute shaders is a bit higher than strictly needed but there's no real harm in that. The advantage is that, if they do anything interesting with the value, NIR will see it as an immediate and can optimize better. Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	a026458020	nir/lower_subgroups: Lower ballot intrinsics to the specified bit size Ballot intrinsics return a bitfield of subgroups. In GLSL and some SPIR-V extensions, they return a uint64_t. In SPV_KHR_shader_ballot, they return a uvec4. Also, some back-ends would rather pass around 32-bit values because it's easier than messing with 64-bit all the time. To solve this mess, we make nir_lower_subgroups take a new parameter called ballot_bit_size and it lowers whichever thing it gets in from the source language (uint64_t or uvec4) to a scalar with the specified number of bits. This replaces a chunk of the old lowering code. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	28da82f978	nir: Add a new subgroups lowering pass This commit pulls nir_lower_read_invocations_to_scalar along with most of the guts of nir_opt_intrinsics (which mostly does subgroup lowering) into a new nir_lower_subgroups pass. There are various other bits of subgroup lowering that we're going to want to do, so it makes a bit more sense to keep it all together in one pass. We also move it in i965 to happen after nir_lower_system_values because we want to handle the subgroup mask system value intrinsics here. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	1ca3a94427	intel/fs: Don't use automatic exec size inference The automatic exec size inference can accidentally mess things up if we're not careful. For instance, if we have add(4) g38.2<4>D g38.1<8,2,4>D g38.2<8,2,4>D then the destination register will end up having a width of 2 with a horizontal stride of 4 and a vertical stride of 8. The EU emit code sees the width of 2 and decides that we really wanted an exec size of 2 which doesn't do what we wanted. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	dc4cf11dfc	intel/fs: Explicitly set EXECUTE_1 where needed Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	ab378734f5	intel/eu: Explicitly set EXECUTE_1 where needed Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	8280560705	intel/eu: Make automatic exec sizes a configurable option We have had a feature in codegen for some time that tries to automatically infer the execution size of an instruction from the width of its destination. For things such as fixed function GS, clipper, and SF programs, this is very useful because they tend to have lots of hand-rolled register setup and trying to specify the exec size all the time would be prohibitive. For things that come from a higher-level IR, however, it's easier to just set the right size all the time and the automatic exec sizes can, in fact, cause problems. This commit makes it optional while enabling it by default. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	7a82ad54bb	intel/fs: Rework zero-length URB write handling Originally we tried to handle this case based on slots_valid. However, there are a number of ways that this can go wrong. For one, we throw away any trailing slots which either aren't written or are set to VARYING_SLOT_PAD. Second, even if PSIZ is a valid slot, we may not actually write anything there. Between the lot of these, it was possible to end up in a case where we tried to do a regular URB write but ended up with a length of 1 which is invalid. This commit moves it to the end and makes it based on a new boolean flag urb_written. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-07 10:37:52 -08:00
Jason Ekstrand	6132992cdb	intel/compiler/fs: Set up subgroup invocation as a system value Subgroup invocation is computed using a vector immediate and some dispatch-aware arithmetic. Unfortunately, due to the vector arithmetic, and the fact that it's frequently read 16-wide, it's not something that can easily be CSEd by the back-end compiler. There are a few different possible approaches to this problem: 1) Emit the code to calculate the subgroup invocation on-the-fly and trust NIR to do the CSE. This is what we were doing. 2) Add a back-end instruction for the subgroup ID. This has the advantage of helping the back-end compiler with CSE but has the downside of very poor scheduling for the calculation because it has to be emitted in the back-end. 3) Emit the calculation at the top of the program and re-use the result. This gets rid of the CSE problem but comes at the cost of an extra live register. This commit switches us from 1) to 3). We choose to store the subgroup invocation values as a W type to reduce the impact of the extra live register. Trusting NIR and using 1) was fine but we're soon going to want to use the subgroup invocation value for other things in the back-end compiler and this makes it much easier to do without having to worry about CSE problems. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	295605c930	intel/cs: Push subgroup ID instead of base thread ID We're going to want subgroup ID for SPIR-V subgroups eventually anyway. We really only want to push one and calculate the other from it. It makes a bit more sense to push the subgroup ID because it's simpler to calculate and because it's a real API thing. The only advantage to pushing the base thread ID is to avoid a single SHL in the shader. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	6411defdcd	intel/cs: Re-run final NIR optimizations for each SIMD size With the advent of SPIR-V subgroup operations, compute shaders will have to be slightly different depending on the SIMD size at which they execute. In order to allow us to do dispatch-width specific things in NIR, we re-run the final NIR stages for each SIMD width. One side-effect of this change is that we start rallocing fs_visitors, which means we need DECLARE_RALLOC_CXX_OPERATORS. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	4e79a77cdc	intel/compiler: Move the destructor from vec4_visitor to backend_shader Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	16ada419d7	i965/fs: Get rid of the early return in brw_compile_cs Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	80ddfab2f5	intel/cs: Rework the way thread local ID is handled Previously, brw_nir_lower_intrinsics added the param and then emitted a load_uniform intrinsic to load it directly. This commit switches things over to use a specific NIR intrinsic for the thread id. The one thing I don't like about this approach is that we have to copy thread_local_id over to the new visitor in import_uniforms. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	25f7453c9e	intel/fs: Mark 64-bit values as being contiguous This isn't often a problem but, when we're in a compute shader, we must push the thread local ID, so we decrement the amount of available push space by 1; it is then no longer even, and 64-bit data can, in theory, span it. By marking those uniforms contiguous, we ensure that they never get split in half between push and pull constants. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-07 10:37:52 -08:00
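The splitting hazard can be modeled with slot arithmetic. This sketch assumes each 64-bit uniform occupies two consecutive 32-bit param slots that start on an even slot, and that the first `push_slots` slots are pushed while the rest are pulled. Both the model and the function name are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical check: a 64-bit uniform occupies 32-bit slots first_slot and
 * first_slot + 1. If the push/pull boundary falls between those two slots,
 * half of the value would be pushed and the other half pulled. */
static bool splits_64bit_value(unsigned first_slot, unsigned push_slots)
{
    return first_slot < push_slots && first_slot + 1 >= push_slots;
}
```

With an even push limit and even-aligned 64-bit values, the check can never fire. Reserving one slot for the thread local ID makes the limit odd, and the value that straddles the new boundary gets split.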
Jason Ekstrand	c4c8cba705	intel/cs: Ignore runtime_check_aads_emit for CS It's only set on gen4-5 which clearly don't support compute shaders. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	d4de813d86	intel/cs: Stop setting dispatch_grf_start_reg Nothing ever reads it for compute shaders because it's always 1. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	b1a9cdede4	intel/cs: Drop max_dispatch_width checks from compile_cs The only things that adjust fs_visitor::max_dispatch_width are render target writes which don't happen in compute shaders so they're pointless. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	1077981eb5	intel/fs: Remove min_dispatch_width from fs_visitor It's 8 for everything except compute shaders. For compute shaders, there's no need to duplicate the computation and it's just a possible source of error. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	b299ded02e	intel/fs: use pull constant locations to check for first compile of a shader Before, we bailed out of assign_constant_locations based on the minimum dispatch size. The more direct thing to do is simply to check whether or not we have constant locations and bail if we do. For nir_setup_uniforms, it's completely safe to do it multiple times because we just copy a value from the NIR shader. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	103081c9a9	intel/fs: Retype dest to match value in read[First]Invocation This is what we really wanted all along. Always retyping to D works because that's what get_nir_src() always gives us, at least for 32-bit types. The SPIR-V variants of these operations accept arbitrary types and we need this if we're going to handle 64 or 16-bit values. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	ebaee9da4a	intel/fs: Uniformize the index in readInvocation The index is any value provided by the shader and this can be called in non-uniform control flow so we can't just take component 0. Found by inspection. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
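Component 0 is the wrong choice in non-uniform control flow because channel 0 may be disabled and hold a stale value. The CPU emulation below reads the index from the first enabled channel of the execution mask instead. It only illustrates the idea. The function is hypothetical and is not the back-end's emit_uniformize().

```c
#include <stdint.h>

/* Hypothetical CPU emulation of uniformizing a per-channel value: take the
 * value from the first channel enabled in exec_mask (width <= 32) rather
 * than assuming channel 0 is live. Returns 0 if no channel is enabled. */
static uint32_t uniformize(const uint32_t *values, uint32_t exec_mask,
                           unsigned width)
{
    for (unsigned c = 0; c < width; c++)
        if (exec_mask & (1u << c))
            return values[c];
    return 0;
}
```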
Jason Ekstrand	b67230de63	intel/fs: Protect opt_algebraic from OOB BROADCAST indices Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	aa4ff4b98c	i965/fs/nir: Don't stomp 64-bit values to D in get_nir_src Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	ec8c6649f1	i965/fs/nir: Minor refactor of store_output Stop retyping the output of shuffle_64bit_data_for_32bit_write. It's always BRW_REGISTER_TYPE_D which is perfectly fine for writing out. Also, when we change get_nir_src to return something with a 64-bit type for 64-bit values, the retyping will not be at all what we want. Also, retyping the output based on src.type before we whack it back to 32 bits is a problem because the output is always 32 bits. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	030d2b5016	i965/fs: Return a fs_reg from shuffle_64bit_data_for_32bit_write All callers of this function allocate a fs_reg expressly to pass into it. It's much easier if we just let the helper allocate the register. While we're here, we switch it to doing the MOVs with an integer type so that we don't accidentally canonicalize floats on half of a double. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	6197a6b7ac	i965/fs/nir: Simplify 64-bit store_output The swizzles weren't doing any good because swiz is just XYZW. Also, we were emitting an extra set of MOVs because shuffle_64bit_data_for_32bit_write already does a MOV for us. Finally, the temporary was only ever used inside the inner loop so there's no need for it to actually be an array. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	18fde36ced	intel/fs: Use the original destination region for int MUL lowering Some hardware (CHV, BXT) have special restrictions on register regions when doing integer multiplication. We want to respect those when we lower to DxW multiplication. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-07 10:37:52 -08:00
Jason Ekstrand	d54f8ec744	intel/fs: Fix integer multiplication lowering for src/dst hazards Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-07 10:37:52 -08:00
Jason Ekstrand	fd1bcccc2d	intel/fs: Fix MOV_INDIRECT for 64-bit values on little-core The same workaround we need for 64-bit values on little core also takes care of the Ivy Bridge problem and does so a bit more efficiently so we can drop that code while we're here. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-07 10:37:52 -08:00
Jason Ekstrand	6041a31e77	intel/eu: Fix broadcast instruction for 64-bit values on little-core We're not using broadcast for any 64-bit types right now since we mostly use it for emit_uniformize on 32-bit buffer indices. However, SPIR-V subgroups are going to need it for 64-bit so let's make it work. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	10e4feed39	intel/eu/reg: Add a subscript() helper This is similar to the identically named fs_reg helper. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-07 10:37:52 -08:00
Jason Ekstrand	068beb41d8	intel/eu: Just modify the offset in brw_broadcast This means we have to drop const from a variable but it also means that 100% of the code which deals with the offset limit is in one place. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	e3bcc86133	intel/compiler: Add some restrictions to MOV_INDIRECT and BROADCAST These restrictions effectively already existed due to the way we use indirect sources but weren't being directly enforced. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	1b8ef49f48	intel/fs: Use a pair of 1-wide MOVs instead of SEL for any/all For some reason, the any/all predicates don't work properly with SIMD32. In particular, it appears that a SEL with a QtrCtrl of 2H doesn't read the correct subset of the flag register and you end up getting garbage in the second half. Work around this by using a pair of 1-wide MOVs and scattering the result. This fixes the any/all instructions for SIMD32. Reviewed-by: Matt Turner <mattst88@gmail.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-07 10:37:52 -08:00
Jason Ekstrand	1f41663007	intel/fs: Use an explicit D type for vote any/all/eq intrinsics The any/all intrinsics return a boolean value so D or UD is the correct type. Unfortunately, get_nir_dest has the annoying behavior of returning a float type by default. This causes format conversion which gives us -1.0f or 0.0f in the register. If the consumer of the result does an integer comparison to zero, it will give you the right boolean value but if we do something more clever based on the 0/~0 assumption for booleans, this will give the wrong value. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-07 10:37:52 -08:00
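The conversion problem can be reproduced on the CPU. Converting the integer ~0 (all bits set, the "true" value) to float yields -1.0f, whose bit pattern is 0xBF800000 rather than 0xFFFFFFFF. This is a plain C analogy of the register conversion and not driver code. The helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float as a 32-bit unsigned integer. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof(u));
    return u;
}

/* Analogy for the bug: a 0/~0 integer boolean stored through an F-typed
 * destination is converted int -> float, then read back as raw bits. */
static uint32_t bool_bits_through_float(int32_t b)
{
    return float_bits((float)b);
}
```

A comparison against zero still works, since only the all-zero pattern maps to 0.0f. Code that expects "true" to be all ones does not.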
Jason Ekstrand	6c00240bc6	intel/fs: Don't stomp f0.1 in SIMD16 ballot In fragment shaders f0.1 is used for discards so doing ballot after a discard can potentially cause the discard to not happen. However, we don't support SIMD32 fragment shaders yet so this isn't a problem. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-07 10:37:52 -08:00
Jason Ekstrand	def013a863	intel/fs: Use ANY/ALL32 predicates in SIMD32 We have ANY/ALL32 predicates and, for the most part, they work just fine. (See the next commit for more details.) Also, due to the way that flag registers are handled in hardware, instruction splitting is able to split the CMP correctly. Specifically, the hardware looks at the execution group and knows to shift its flag usage up correctly so a 2H instruction will write to f0.1 instead of f0.0. Reviewed-by: Matt Turner <mattst88@gmail.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-07 10:37:52 -08:00
Jason Ekstrand	0d905597fe	intel/fs: Be more explicit about our placement of [un]zip Before, we were careful to place the zip after the last of the split instructions but did unzip on-demand. This changes things so that the unzips go before all of the split instructions and the zip comes explicitly after all the split instructions. As a side-effect of this change, we now emit the split instructions from highest SIMD group to lowest instead of low to high. We could have kept the old behavior, but it shouldn't matter and this made the code easier. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-07 10:37:52 -08:00
Jason Ekstrand	fcd4adb9d0	intel/fs: Pass builders instead of blocks into emit_[un]zip This makes it far more explicit where we're inserting the instructions rather than the magic "before and after" stuff that the emit_[un]zip helpers did based on block and inst. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-07 10:37:52 -08:00
Jason Ekstrand	e8c9e65185	intel/fs: Use a pure vertical stride for large register strides Register strides higher than 4 are uncommon but they can happen. For instance, if you have a 64-bit extract_u8 operation, we turn that into UB -> UQ MOV with a source stride of 8. Our previous calculation would try to generate a stride of <32;8,8>:ub which is invalid because the maximum horizontal stride is 4. To solve this problem, we instead use a stride of <8;1,0>. As noted in the comment, this does not work as a destination but that's ok as very few things actually generate that stride. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-07 10:37:52 -08:00
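A register region <VertStride;Width,HorzStride> lays elements out in rows of Width elements, HorzStride apart, with consecutive rows VertStride apart (strides in elements). The sketch below assumes that standard address formula and the encodable horizontal strides 0, 1, 2 and 4. The helper names are hypothetical. It shows that <8;1,0> on a byte type places element i at byte 8*i, which is the "stride 8" access the message describes.

```c
#include <stdbool.h>

/* Byte offset of element i in a <vstride;width,hstride> source region,
 * with strides measured in elements of type_size bytes. */
static unsigned region_byte_offset(unsigned i, unsigned vstride,
                                   unsigned width, unsigned hstride,
                                   unsigned type_size)
{
    return ((i / width) * vstride + (i % width) * hstride) * type_size;
}

/* Horizontal strides larger than 4 cannot be encoded. */
static bool hstride_encodable(unsigned hstride)
{
    return hstride == 0 || hstride == 1 || hstride == 2 || hstride == 4;
}
```

With a width of 1, every row holds a single element, so the horizontal stride is irrelevant and can be 0. The vertical stride alone carries the spacing.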
Jason Ekstrand	172e8e42c4	intel/fs: Don't allocate a param array for zero push constants Thanks to the ralloc invariant of "any pointer returned from ralloc can be used as a context", calling ralloc_size with a size of zero will cause it to allocate at least a header. If we don't have any push constants, then NULL is perfectly acceptable (and even preferred). Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2017-11-02 09:55:21 -07:00
Jason Ekstrand	7b4387519c	intel/fs: Alloc pull constants off mem_ctx It doesn't actually matter since the only user of pull constants, i965, ralloc_steal()s it back to NULL, but it's more consistent and probably fixes memory leaks in some error cases. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-02 09:55:21 -07:00
Jordan Justen	f082d7f64f	intel/compiler: Add functions to get prog_data and prog_key sizes for a stage v2: * Return unsigned instead of size_t. (Ken) Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-10-31 23:36:54 -07:00
Jordan Justen	05b1193361	intel/compiler: Add union types for prog_data and prog_key stages Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-10-31 23:36:54 -07:00
Jordan Justen	3dcbc5cdaa	intel/compiler: Remove final_program_size from brw_compile_* The caller can now use brw_stage_prog_data::program_size which is set by the brw_compile_* functions. Cc: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-10-31 23:36:54 -07:00
Carl Worth	540636045f	intel/compiler: add new field for storing program size This will be used by the on disk shader cache. v2: * Set in brw_compile_* rather than brw_codegen_*. (Jason) Signed-off-by: Timothy Arceri <timothy.arceri@collabora.com> [jordan.l.justen@intel.com: Only add to brw_stage_prog_data] Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-10-31 23:36:54 -07:00
Eric Engestrom	ceaad79f85	i965: remove unused variable Fixes: `2c873060d3` "i965: Delete unused brw_vs_prog_data::nr_attributes field." Cc: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com> Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>	2017-10-30 16:32:05 +00:00
Ian Romanick	6403efbe74	glsl: Remove ir_binop_greater and ir_binop_lequal expressions NIR does not have these instructions. TGSI and Mesa IR both implement them using < and >=, respectively. Removing them deletes a bunch of code and means I don't have to add code to the SPIR-V generator for them. v2: Rebase on 2+ years of change... and fix a major bug added in the rebase. Sizes (text, data, bss, dec, hex): 32-bit i965_dri.so before: 8255291, 268856, 294072, 8818219, 868e2b; after: 8254235, 268856, 294072, 8817163, 868a0b. 64-bit i965_dri.so before: 7815339, 345592, 420592, 8581523, 82f193; after: 7813995, 345560, 420592, 8580147, 82ec33. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-10-30 09:27:09 -07:00
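Dropping the two opcodes works because a > b is the same predicate as b < a, and a <= b is the same as b >= a. With IEEE floats this holds even for NaN operands, where both forms are false. Here is a small C illustration (function names are illustrative only):

```c
#include <stdbool.h>

/* a > b rewritten with only "<" by swapping the operands. */
static bool greater_via_less(float a, float b)
{
    return b < a;
}

/* a <= b rewritten with only ">=" by swapping the operands. */
static bool lequal_via_gequal(float a, float b)
{
    return b >= a;
}
```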
Tapani Pälli	446c5726ec	i965: fix blorp stage_prog_data->param leak Patch uses mem_ctx for allocation to ensure param array gets freed later. ==6164== 48 bytes in 1 blocks are definitely lost in loss record 61 of 193 ==6164== at 0x4C2EB6B: malloc (vg_replace_malloc.c:299) ==6164== by 0x12E31C6C: ralloc_size (ralloc.c:121) ==6164== by 0x130189F1: fs_visitor::assign_constant_locations() (brw_fs.cpp:2095) ==6164== by 0x13022D32: fs_visitor::optimize() (brw_fs.cpp:5715) ==6164== by 0x13024D5A: fs_visitor::run_fs(bool, bool) (brw_fs.cpp:6229) ==6164== by 0x1302549A: brw_compile_fs (brw_fs.cpp:6570) ==6164== by 0x130C4B07: blorp_compile_fs (blorp.c:194) ==6164== by 0x130D384B: blorp_params_get_clear_kernel (blorp_clear.c:79) ==6164== by 0x130D3C56: blorp_fast_clear (blorp_clear.c:332) ==6164== by 0x12EFA439: do_single_blorp_clear (brw_blorp.c:1261) ==6164== by 0x12EFC4AF: brw_blorp_clear_color (brw_blorp.c:1326) ==6164== by 0x12EFF72B: brw_clear (brw_clear.c:297) Fixes: `8d90e28839` ("intel/compiler: Allocate pull_param in assign_constant_locations") Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable@lists.freedesktop.org	2017-10-30 08:19:37 +02:00
Kenneth Graunke	d1b392d060	i965: Delete brw_wm_prog_key::drawable_height. This has been unused since we switched to nir_lower_wpos_ytransform. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-10-29 20:52:02 -07:00
Topi Pohjolainen	97e01adfd5	intel/compiler/gen9: Pixel shader header only workaround Fixes intermittent GPU hangs on Broxton with an Intel internal test case. There are plenty of similar fragment shaders in piglit that do not use any varyings or uniforms. According to the documentation, special timing is needed between pipeline stages. Apparently we just don't hit that with piglit. Even with the failing test case one doesn't always get the hang. Moreover, according to the error states the hang happens significantly later than the execution of the problematic shader. There are multiple render cycles (primitive submissions) in between. I've also seen error states where the ACTHD points outside the batch, almost as if the hardware writes somewhere that gets used later on. That would also explain why piglit doesn't suffer from this: most tests kick off one render cycle and any corruption is left unseen. v2 (Ken): Instead of enabling push constants, enable one of the inputs (PSIZ). v3 (Ken, Jason): Use LAYER instead, to keep Vulkan's emit_3dstate_sbe() happy. Cc: "17.3 17.2" <mesa-stable@lists.freedesktop.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-10-28 10:07:29 +03:00
Kenneth Graunke	2c873060d3	i965: Delete unused brw_vs_prog_data::nr_attributes field. Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-10-27 02:53:38 -07:00
Kevin Rogovin	75d10e4c84	intel/compiler: brw_validate_instructions to take const void* instead of void* The disassembler is not (and should not be) modifying the data. Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-10-26 10:43:48 -07:00
Jason Ekstrand	d24311b7b5	intel/compiler: Call nir_lower_system_values in brw_preprocess_nir Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-10-25 16:14:09 -07:00
Jason Ekstrand	562b8d458c	intel/eu: Use EXECUTE_1 for JMPI The PRM says "The execution size must be 1." In `73137997e2`, the execution size was set to 1 when it should have been BRW_EXECUTE_1 (which maps to 0). Later, in `dc2d3a7f5c`, JMPI was used for line AA on gen6 and earlier and we started manually stomping the execution size to BRW_EXECUTE_1 in the generator. This commit fixes the original bug and makes brw_JMPI just do the right thing. Reviewed-by: Matt Turner <mattst88@gmail.com> Fixes: `73137997e2`	2017-10-25 16:14:09 -07:00
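BRW_EXECUTE_1 maps to 0 because the execution-size field stores log2 of the channel count (1 -> 0, 2 -> 1, 8 -> 3, 16 -> 4, 32 -> 5). Under that encoding, writing a literal 1 requests two channels. The sketch below is a hypothetical encoder, not the brw_inst accessor.

```c
/* Hypothetical encoder for the execution-size field: the hardware stores
 * log2 of the number of channels. exec_size is assumed to be a power of
 * two between 1 and 32. */
static unsigned encode_exec_size(unsigned exec_size)
{
    unsigned log2 = 0;
    while ((1u << log2) < exec_size)
        log2++;
    return log2;
}
```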
Alejandro Piñeiro	4723933b8e	i965/fs: Add brw_reg_type_from_bit_size utility method Returns the brw_type for a given ssa.bit_size and a reference type. So if bit_size is 64 and the reference type is BRW_REGISTER_TYPE_F, it returns BRW_REGISTER_TYPE_DF. The same applies if bit_size is 32 and the reference type is BRW_REGISTER_TYPE_HF: it returns BRW_REGISTER_TYPE_F. v2 (Jason Ekstrand): - Use better unreachable() messages - Add Q types Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-10-25 16:14:09 -07:00
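The helper keeps the family of the reference type (float, signed or unsigned integer) and picks the member with the requested bit size. The sketch uses a hypothetical stand-in enum rather than the real BRW_REGISTER_TYPE_* values. The signed and unsigned families, beyond the F/DF/HF and Q cases mentioned in the message, are an assumption made for illustration.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for register types; not the driver's enum. */
enum reg_type { TYPE_HF, TYPE_F, TYPE_DF, TYPE_W, TYPE_D, TYPE_Q,
                TYPE_UW, TYPE_UD, TYPE_UQ };

/* Return the type in the same family as `ref` with the given bit size.
 * Aborts on unsupported combinations (the driver uses unreachable()). */
static enum reg_type reg_type_from_bit_size(unsigned bit_size,
                                            enum reg_type ref)
{
    switch (ref) {
    case TYPE_HF: case TYPE_F: case TYPE_DF:
        switch (bit_size) {
        case 16: return TYPE_HF;
        case 32: return TYPE_F;
        case 64: return TYPE_DF;
        }
        break;
    case TYPE_W: case TYPE_D: case TYPE_Q:
        switch (bit_size) {
        case 16: return TYPE_W;
        case 32: return TYPE_D;
        case 64: return TYPE_Q;
        }
        break;
    case TYPE_UW: case TYPE_UD: case TYPE_UQ:
        switch (bit_size) {
        case 16: return TYPE_UW;
        case 32: return TYPE_UD;
        case 64: return TYPE_UQ;
        }
        break;
    }
    abort(); /* unsupported bit size */
}
```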
Jason Ekstrand	99778e7f9f	i965/fs/nir: Use the nir_src_bit_size helper Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-10-25 16:14:09 -07:00
Jason Ekstrand	fa6e74e33e	intel/fs: Handle flag read/write aliasing in needs_src_copy In order to implement the ballot intrinsic, we do a MOV from flag register to some GRF. If that GRF is used in a SEL, cmod propagation helpfully changes it into a MOV from the flag register with a cmod. This is perfectly valid but when lower_simd_width comes along, it simply splits into two instructions which both have conditional modifiers. This is a problem since we're reading the flag register. This commit makes us check whether or not flags_written() overlaps with the flag values that we are reading via the instruction source and, if we have any interference, will force us to emit a copy of the source. Reviewed-by: Matt Turner <mattst88@gmail.com> Cc: mesa-stable@lists.freedesktop.org	2017-10-25 16:14:09 -07:00
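The overlap test can be pictured with bitmasks. In this hypothetical model, each flag channel is one bit across the four 16-bit flag subregisters f0.0, f0.1, f1.0 and f1.1 (indices 0 to 3). An instruction's channel group shifts its bits up, which is also why a 2H half lands on f0.1. The model and names are illustrative and do not reproduce the driver's flags_written()/flags_read() helpers. It assumes each split half of a MOV from the flag register reads the full 32-bit value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: flag bits touched by an instruction using flag
 * subregister `subreg` (16 channels each), channel group `group`, and
 * `exec_size` channels. */
static uint64_t flag_channel_mask(unsigned subreg, unsigned group,
                                  unsigned exec_size)
{
    unsigned start = subreg * 16 + group;
    uint64_t bits = exec_size >= 64 ? ~0ull : ((1ull << exec_size) - 1);
    return bits << start;
}

/* A copy of the source is needed if any flag bit written by the split
 * instructions is also read through the source. */
static bool needs_src_copy_for_flags(uint64_t written, uint64_t read)
{
    return (written & read) != 0;
}
```

The first SIMD16 half writes f0.0 through its conditional modifier, and the second half still needs to read those bits. The overlap therefore forces a copy.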
Jordan Justen	b35e8c3b86	intel/nir: Zero local index const struct for valgrind & nir_serialize Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-10-25 12:36:21 -07:00
Rob Clark	2207af032b	meson: extract out variable for nir_algebraic.py Also needed in freedreno/ir3. Signed-off-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2017-10-24 15:33:40 -04:00
Eric Anholt	e91c3540fc	i965: Fix memmem compiler warnings. gcc is throwing this warning in my meson build: ../src/intel/compiler/brw_eu_validate.c:50:11: warning: argument 1 null where non-null expected [-Wnonnull] `return memmem(haystack.str, haystack.len, needle.str, needle.len) != NULL;` The first check for CONTAINS has a NULL error_msg.str and 0 len. The glibc implementation will exit without looking at any haystack bytes if haystack.len < needle.len, so this was safe, but silence the warning anyway by guarding against implementation variability. Fixes: `122ef3799d` ("i965: Only insert error message if not already present") Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-10-24 10:51:18 -07:00
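One way to add such a guard is to check the lengths before calling memmem(), so a NULL, zero-length haystack is never searched for a non-empty needle. This is a sketch and not necessarily the exact patch. The `string` struct mirrors the `.str`/`.len` fields in the warning. memmem() is a GNU extension, hence `_GNU_SOURCE`.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct string { const char *str; size_t len; };

/* Return true if needle occurs in haystack. The length check short-circuits
 * before memmem() sees an empty (possibly NULL) haystack with a longer
 * needle, instead of relying on the libc implementation to bail out. */
static bool contains(struct string haystack, struct string needle)
{
    return haystack.len >= needle.len &&
           memmem(haystack.str, haystack.len, needle.str, needle.len) != NULL;
}
```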
Matt Turner	9cd60fce9c	i965/fs: Use align1 mode on ternary instructions on Gen10+ Align1 mode offers some nice features over align16, like access to more data types and the ability to use a 16-bit immediate. This patch does not start using any new features. It just emits ternary instructions in align1 mode. Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-10-20 15:00:17 -07:00
Matt Turner	8c16c9c677	i965: Add align1 ternary instruction emission support Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-10-20 15:00:17 -07:00
Matt Turner	f11fa5ac6c	i965: Add align1 ternary instruction disassembler support Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-10-20 15:00:17 -07:00
Matt Turner	6c7fc9b73a	i965: Add align1 ternary instruction-word support Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-10-20 15:00:17 -07:00
Matt Turner	3b2c868848	i965: Add align1 ternary instruction support to conversion functions Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-10-20 15:00:17 -07:00
Matt Turner	281e8b8f27	i965: Add align1 ternary instruction field encodings Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-10-20 15:00:17 -07:00
Matt Turner	5f6ee55e68	i965: Add functions to abstract access to 3src register types Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-10-20 15:00:17 -07:00
Matt Turner	e15dac319b	i965: Rename brw_inst's functions that access the 3src register type Put hw_ in the name so that it's clear these are the hardware encodings. Similar to commit `9fb8323328` ("i965: Rename brw_inst's functions that access the register type") Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-10-20 15:00:16 -07:00
Matt Turner	e7f3b82e03	i965: Rename brw_inst 3src functions in preparation for align1 Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-10-20 15:00:16 -07:00
Matt Turner	ba50b538af	i965: Print subreg in units of type-size on ternary instructions The instruction word contains SubRegNum[4:2] so it's in units of dwords (hence the * 4 to get it in terms of bytes). Before this patch, the subreg would have been wrong for DF arguments. Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-10-20 15:00:16 -07:00
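Here is a worked version of that unit conversion. The field holds the offset in dwords, so multiplying by 4 gives bytes, and dividing by the operand's type size gives the subregister number a disassembler should print. A DF operand 8 bytes into the register has a field value of 2, but its correct subregister number is 1. The helper name is hypothetical.

```c
/* Hypothetical decode of the 3-src SubRegNum[4:2] field: dwords -> bytes
 * -> units of the operand's type size. */
static unsigned subreg_in_type_units(unsigned subreg_dw_field,
                                     unsigned type_size_bytes)
{
    unsigned byte_offset = subreg_dw_field * 4;
    return byte_offset / type_size_bytes;
}
```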
Matt Turner	3f14150e9a	i965: Add functions for brw_reg_type <-> hw 3src type Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-10-20 15:00:16 -07:00
Matt Turner	4c857d1f3b	i965: Move brw_reg_type_is_floating_point to brw_reg_type.h I'm going to call this from brw_inst.h, and I don't want to have to include all of brw_reg.h. Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-10-20 15:00:16 -07:00
Jason Ekstrand	59fb59ad54	nir: Get rid of nir_shader::stage It's redundant with nir_shader::info::stage. Acked-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2017-10-20 12:49:17 -07:00
Samuel Iglesias Gonsálvez	9e515cf381	i965/vec4: remove setting default LOD in the backend It is already done in NIR. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-10-20 08:29:53 +02:00
Samuel Iglesias Gonsálvez	c6d7d09bd0	i965/fs: remove setting default LOD in the backend It is already done in NIR. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-10-20 08:29:53 +02:00
Kenneth Graunke	68f69ebdcc	i965: Use is_scheduling_barrier instead of schedule_node::is_barrier. Commit `a73116ecc6` tried to make add_barrier_deps() walk to the next barrier, and stop. To accomplish that, it added an is_barrier flag. Unfortunately, this only works half of the time. The issue is that add_barrier_deps() walks both backward (to the previous barrier), and forward (to the next barrier). It also sets is_barrier. Assuming that we're processing instructions in forward order, this means that is_barrier will be set for previous instructions, but not future ones. So we'll never see it, and walk further than we need to. dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.23 now compiles its shaders in 3.6 seconds instead of 3.3 minutes. Reviewed-by: Matt Turner <mattst88@gmail.com> Tested-by: Pallavi G <pallavi.g@intel.com>	2017-10-19 10:19:20 -07:00
Kenneth Graunke	3d112a7cd4	i965: Move fs_inst::has_side_effects()'s eot check to the parent class. This eliminates a layer of wrapping, and makes a backend_instruction sufficient. The downside is that it exposes 'eot' to the vec4 backend, which it doesn't need, but can basically happily ignore. Reviewed-by: Matt Turner <mattst88@gmail.com> Tested-by: Pallavi G <pallavi.g@intel.com>	2017-10-19 10:19:20 -07:00
Jason Ekstrand	79d403417c	intel/cs: Make thread_local_id a regular builtin param This is a lot more natural than special casing it all over the place. We still have to do a bit of special-casing in assign_constant_locations but it's not special-cased quite as bad as it was before. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-10-12 22:39:31 -07:00
Jason Ekstrand	8d90e28839	intel/compiler: Allocate pull_param in assign_constant_locations Now that everything is nicely ralloc'd, we can allocate the pull_param array in assign_constant_locations instead of higher up. We can also re-allocate the param array so that it's exactly the needed size. This should save us some memory because we're not allocating the total needed param space for both push and pull. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-10-12 22:39:31 -07:00
Jason Ekstrand	29737eac98	intel: Allocate prog_data::[pull_]param deeper inside the compiler Now that we're always growing the param array as-needed, we can allocate the param array in common code and stop repeating the allocation everywhere. In order to keep things sane, we ralloc the [pull_]param array off of the compile context and then steal it back to a NULL context later. This doesn't get us all the way to where prog_data::[pull_]param is purely an out parameter of the back-end compiler but it gets us a lot closer. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-10-12 22:39:31 -07:00
Jason Ekstrand	4dfb8b3416	intel/vs: Grow the param array for clip planes Instead of requiring the caller of brw_compile_vs to figure it out, just grow the param array on-demand. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-10-12 22:39:30 -07:00
Jason Ekstrand	6bcc5c0c75	intel/cs: Grow prog_data::param on-demand for thread_local_id_index Instead of making the caller of brw_compile_cs add something to the param array for thread_local_id_index, just add it on-demand in brw_nir_intrinsics and grow the array. This is now safe to do because everyone is now using ralloc for prog_data::param. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-10-12 22:39:30 -07:00
Jason Ekstrand	b1d1b7222a	intel/compiler: Make brw_nir_lower_intrinsics compute-specific It's already only ever called from brw_compile_cs and only handles compute intrinsics. Let's just make it CS-specific. We can always make it handle other stages again later if we want. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-10-12 22:39:30 -07:00
Jason Ekstrand	2db9470d88	intel/compiler: Add a helper for growing the prog_data::param array Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-10-12 22:39:30 -07:00
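A helper for growing the param array can be as small as "resize by N and return a pointer to the new entries". The sketch below uses plain realloc() so that it is self-contained. The driver uses ralloc, and the struct and function names here are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for prog_data's param storage. */
struct param_array { uint32_t *param; unsigned nr_params; };

/* Grow the array by `count` entries and return a pointer to the first new
 * entry, or NULL if allocation fails (the array is left unchanged). */
static uint32_t *param_array_add(struct param_array *a, unsigned count)
{
    unsigned old = a->nr_params;
    uint32_t *p = realloc(a->param, (size_t)(old + count) * sizeof(*p));
    if (p == NULL)
        return NULL;
    a->param = p;
    a->nr_params = old + count;
    return p + old;
}
```

Callers fill in only the returned slots, which lets stages such as CS and VS append their own params on demand.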
Jason Ekstrand	4efd079aba	intel/compiler: Add a flag for pull constant support The Vulkan driver does not support pull constants. It simply limits things such that we can always push everything. Previously, we were determining whether or not to push things based on whether or not the prog_data::pull_param array is non-null. This is rather hackish and about to stop working. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-10-12 22:39:30 -07:00
Jason Ekstrand	cfc7ed75eb	i965: Store image_param in brw_context instead of prog_data This burns an extra 10k of memory or so in the case where you don't have any images. However, if you have several shaders which use images, this should be much less memory. It also gets rid of a part of prog_data that really has nothing to do with the compiler. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-10-12 22:39:30 -07:00
Jason Ekstrand	2975e4c56a	intel: Rewrite the world of push/pull params This moves us away from the array-of-pointers model and onto a model where each param is represented by a generic uint32_t handle. We reserve 2^16 of these handles for builtins that get generated somewhere inside the compiler and have well-defined meanings. Generic params have handles whose meanings are defined by the driver. The primary downside to this new approach is that it moves a little bit of the work that we would normally do at compile time to draw time. On my laptop this hurts OglBatch6 by no more than 1% and doesn't seem to have any measurable effect on OglBatch7. So, while this may come back to bite us, it doesn't look too bad. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-10-12 22:39:29 -07:00
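The handle model can be illustrated by splitting the 32-bit space into a builtin range and a driver-defined range. The message does not say which range is reserved. This sketch assumes the low 2^16 values are builtins and maps everything above them to a driver-owned value array. The lookup stands in for the draw-time work the message mentions. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: handles below 2^16 are compiler builtins. */
#define PARAM_BUILTIN_LIMIT (1u << 16)

static bool param_is_builtin(uint32_t handle)
{
    return handle < PARAM_BUILTIN_LIMIT;
}

/* Draw-time resolution: builtins would be interpreted by meaning (a
 * placeholder 0 here); generic handles index driver-owned storage. */
static uint32_t resolve_param(uint32_t handle, const uint32_t *driver_values)
{
    if (param_is_builtin(handle))
        return 0;
    return driver_values[handle - PARAM_BUILTIN_LIMIT];
}
```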
Kenneth Graunke	6f5abf3146	i965: Fix output register sizes when multiple variables share a slot. ARB_enhanced_layouts allows multiple output variables to share the same location - and these variables may not have the same sizes. For example, consider these output variables: // consume X/Y/Z components of 6 vectors layout(location = 0) out vec3 a[6]; // consumes W component of the first vector layout(location = 0, component = 3) out float b; Looking at the first declaration, we see that VARYING_SLOT_VAR0 needs 24 components worth of space (vec3 padded out to a vec4, 4 * 6 = 24). But looking at the second declaration, we would think that VARYING_SLOT_VAR0 needs only 4 components of space (a single float padded out to a vec4). nir_setup_outputs() only considered the space requirements of the first declaration it happened to see, so if 'float b' came first, it would underallocate the output register space, causing brw_fs_validator.cpp to assert fail about inst->dst.offset exceeding the register size. Fixes Piglit's tests/spec/arb_enhanced_layouts/execution/component-layout/ vs-to-fs-array-interleave-single-location.shader_test. Thanks to Tim Arceri for finding this bug and writing a test! Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2017-10-10 17:29:37 -07:00
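The fix amounts to sizing each shared slot for the largest variable that starts there, not for whichever declaration is seen first. In the sketch below, every element is padded to a vec4, so `vec3 a[6]` needs 24 components and `float b` needs 4. The struct and function are hypothetical, and 64-bit types are ignored.

```c
/* Hypothetical output description: starting location and array length
 * (1 for non-arrays). Each element is padded to a vec4. */
struct output_var { unsigned location; unsigned array_len; };

/* Components needed at `location`: the maximum over all variables that
 * share it, independent of declaration order. */
static unsigned components_needed(const struct output_var *vars, unsigned n,
                                  unsigned location)
{
    unsigned max = 0;
    for (unsigned i = 0; i < n; i++) {
        if (vars[i].location != location)
            continue;
        unsigned comps = vars[i].array_len * 4;
        if (comps > max)
            max = comps;
    }
    return max;
}
```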
Kenneth Graunke	03087686ff	i965: Don't try to decode types for non-existent src1. KHR-GL45.shader_ballot_tests.ShaderBallotBitmasks has a MOV that hits this validation path. MOVs don't have a src1 file, but calling brw_inst_src1_type() was tripping on src1.file being BRW_IMMEDIATE_VALUE and the hw_type being something invalid for immediates. To work around this, just pretend src1 is src0 if there isn't a src1. Fixes: `2572c2771d` (i965: Validate "Special Requirements for Handling Double Precision Data Types") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102680 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2017-10-10 15:11:35 -07:00
Iago Toral Quiroga	5ec21eb1a0	i965/tes: account for the fact that dvec3/4 inputs take two slots When computing the total size of the URB for tessellation evaluation inputs we were not accounting for this, and instead we were always assuming that each input would take a single vec4 slot, which could lead to computing a smaller read size than required. Specifically, this is a problem when the last input is a dvec3/4 such that its XY components are stored in the second half of a payload register (which can happen if the offset for the input in the URB is not 64-bit aligned because there are 32-bit inputs mixed in) and the ZW components in the first half of the next, as in this case we would fail to account for the extra slot required for the ZW components. Fixes (requires another fix in CTS currently in review): KHR-GL45.enhanced_layouts.varying_locations KHR-GL45.enhanced_layouts.varying_array_locations Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-10-10 08:59:54 +02:00
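A sketch of only the slot-counting part: a vec4 slot holds 16 bytes, so a dvec3 (24 bytes) or dvec4 (32 bytes) needs two. The struct is hypothetical, and the half-register straddling case from the message is not modeled here.

```c
#include <stdbool.h>

/* Hypothetical input descriptor. */
struct tes_input_sketch {
   unsigned components; /* 1..4 */
   bool is_64bit;       /* double / dvecN */
};

/* dvec3/dvec4 exceed 16 bytes and therefore take two vec4 slots. */
unsigned input_slots(const struct tes_input_sketch *in)
{
   return (in->is_64bit && in->components > 2) ? 2 : 1;
}

/* Total read size in vec4 slots, instead of assuming one slot per input. */
unsigned total_input_slots(const struct tes_input_sketch *inputs, unsigned n)
{
   unsigned total = 0;
   for (unsigned i = 0; i < n; i++)
      total += input_slots(&inputs[i]);
   return total;
}
```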
Jason Ekstrand	7463d50580	intel/compiler: Don't propagate cmod into integer multiplies No shader-db change on Sky Lake. Reviewed-by: Matt Turner <mattst88@gmail.com> Cc: mesa-stable@lists.freedesktop.org	2017-10-05 11:54:49 -07:00
Jason Ekstrand	b91ecee04a	intel/compiler: Don't cmod propagate into a saturated operation Shader-db results on Sky Lake: total instructions in shared programs: 12954445 -> 12955125 (0.01%) instructions in affected programs: 141862 -> 142542 (0.48%) helped: 0 HURT: 626 Reviewed-by: Matt Turner <mattst88@gmail.com> Cc: mesa-stable@lists.freedesktop.org	2017-10-05 11:54:49 -07:00
Matt Turner	2572c2771d	i965: Validate "Special Requirements for Handling Double Precision Data Types" I did not implement: CNL's restriction on 64-bit int + align16, because I don't think we'll ever use this combination regardless of hardware generation. The restriction on immediate DF -> F conversions, because there's no reason to ever generate that, and I don't even know how DF -> F conversions are supposed to work in Align16 since (1) the dst stride must be 1, but (2) the dst stride would have to be 2 for src and dst strides to be aligned.	2017-10-04 14:08:54 -07:00
Matt Turner	98298c7e3d	i965: Fix and enable forgotten validation test I seem to have forgotten I still had work to do.	2017-10-04 14:08:54 -07:00
Matt Turner	122ef3799d	i965: Only insert error message if not already present Some restrictions require something like strides to match between src and dest. For multi-source instructions, I'd rather encapsulate the logic for not inserting already present errors in ERROR_IF than open-coding it multiple places.	2017-10-04 14:08:54 -07:00
Matt Turner	5e76cf153c	i965: Avoid validation error when src1 is not present There can be no violation of the restriction that source offsets are aligned if there is only one source offset.	2017-10-04 14:08:54 -07:00
Matt Turner	cacc229ba0	i965: Remove validate_reg() Replaced by the assembly validator, and in fact gets in the way of writing tests for the assembly validator.	2017-10-04 14:08:54 -07:00
Matt Turner	678d88bcee	i965: Add and use STRIDE and WIDTH macros You'll notice there were bugs in some of the code being replaced. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-10-04 14:08:54 -07:00
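For reference, a sketch of how such macros decode the region fields, assuming the usual encoding where a stride field n means 0 for n == 0 and 2^(n-1) otherwise, and a width field n means 2^n. An open-coded `1 << stride` gets both 0 and 1 wrong (giving 1 and 2 instead of 0 and 1), which is the kind of slip a shared macro prevents.

```c
/* Stride field n: 0 -> 0, otherwise 2^(n-1) (so 1, 2, 4, ...). */
#define STRIDE(stride) ((stride) != 0 ? 1 << ((stride) - 1) : 0)

/* Width field n: 2^n (so 1, 2, 4, 8, 16). */
#define WIDTH(width) (1 << (width))
```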
Matt Turner	1fcdb1cbea	i965: Add GLK, CFL, CNL to test_eu_validate.c	2017-10-04 14:08:54 -07:00
Matt Turner	6db5ec7deb	i965: Fix support for disassembling 64-bit integer immediates The type suffixes were wrong, and the 16 was missing the 0 prefix. Fixes: `92f787ff86` ("i965: Add support for disassembling 64-bit integer immediates") Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-10-04 14:08:54 -07:00
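The zero-padding point in plain C: a width of `16` alone pads with spaces, while `016` pads with zeros. The sketch below assumes "Q"/"UQ" as the signed/unsigned 64-bit suffixes; `format_imm64()` is a hypothetical helper, not the disassembler's function.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Print a 64-bit integer immediate as 16 zero-padded hex digits plus a
 * type suffix. "%16" PRIx64 would pad with spaces instead of zeros. */
int format_imm64(char *buf, size_t len, uint64_t value, bool is_signed)
{
   return snprintf(buf, len, "0x%016" PRIx64 "%s", value,
                   is_signed ? "Q" : "UQ");
}
```

For example, `format_imm64(buf, sizeof buf, 0x2a, false)` produces `0x000000000000002aUQ`.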
Matt Turner	7e88f93469	i965/fs: Rewrite fsign64 to skip the float -> double conversion ... without the float -> double conversion. Low power parts have additional restrictions when it comes to operating on 64-bit types, and the instruction used to do the conversion violates one of them: specifically, the restriction that "Source and Destination horizontal stride must be aligned to the same qword". Previously we generated a float and then converted, but we can avoid the conversion by using the same extract-the-sign-bit + or-in-1.0 algorithm by directly operating on the high four bytes of each double-precision component in the result. In SIMD8 and SIMD16 this cuts one instruction from the implementation, and more importantly that instruction is the one which violated the regioning restriction. Along the way I removed some comments that I did not think helped, and some code about double comparisons which does not seem to be necessary today. This prevents validation failures caught by the new EU validation code added in later patches. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-10-04 14:08:54 -07:00
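A CPU-side sketch of the "extract the sign bit, OR in 1.0" idea applied to the high dword only. 0x3ff00000 is the high dword of 1.0 in IEEE-754 binary64, and the low dword of ±1.0 is zero, so nothing needs to be converted from float. The explicit zero check is an assumption made for this sketch; the message does not describe how the backend handles zero.

```c
#include <stdint.h>
#include <string.h>

double fsign64_sketch(double x)
{
   if (x == 0.0)               /* +0.0 and -0.0 both compare equal to 0 */
      return 0.0;

   uint64_t bits;
   memcpy(&bits, &x, sizeof bits);

   /* Work on the high dword only: keep the sign, force magnitude 1.0. */
   uint32_t hi = (uint32_t)(bits >> 32);
   hi = (hi & 0x80000000u) | 0x3ff00000u;

   uint64_t out = (uint64_t)hi << 32;   /* low dword stays zero */
   double result;
   memcpy(&result, &out, sizeof result);
   return result;
}
```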
Matt Turner	b541945c20	i965/fs: Unpack count argument to 64-bit shift ops on Atom 64-bit operations on Atom parts have additional restrictions over their big-core counterparts (validated by later patches). Specifically, the restriction that "Source and Destination horizontal stride must be aligned to the same qword" is violated by most shift operations since NIR uses a 32-bit value as the shift count argument, and this causes instructions like shl(8) g19<1>Q g5<4,4,1>Q g23<4,4,1>UD where src1 has a 32-bit stride, but the dest and src0 have a 64-bit stride. This caused ~4 pixels in the ARB_shader_ballot piglit test fs-readInvocation-uint.shader_test to be incorrect. Unfortunately no ARB_gpu_shader_int64 test hit this case because they operate on uniforms, and their scalar regions are an exception to the restriction. We work around this by effectively unpacking the shift count, so that we can read it with a 64-bit stride in the shift instruction. Unfortunately the unpack (a MOV with a dst stride of 2) is a partial write, and cannot be copy-propagated or CSE'd. Bugzilla: https://bugs.freedesktop.org/101984	2017-10-04 14:08:54 -07:00
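A memory-layout model of the unpack: a MOV with a destination stride of 2 dwords puts each 32-bit shift count in the low dword of its own qword. The 64-bit shift can then read the counts with a qword stride that matches its Q-typed dst and src0. The high dwords are not written, which is why this is a partial write. The function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical model: unpacked[] is viewed as n qwords (2n dwords). */
void unpack_shift_counts(const uint32_t *packed, uint32_t *unpacked,
                         unsigned n)
{
   for (unsigned i = 0; i < n; i++)
      unpacked[2 * i] = packed[i];   /* dst stride 2: every other dword */
}
```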
Matt Turner	2082c32950	i965/fs: Don't apply POW/FDIV workaround on Gen10+ The documentation says it applies only to Gens 8 and 9. Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-10-04 14:08:37 -07:00
Matt Turner	d407935327	i965: Fix src0 vs src1 typo A typo caused us to copy src0's reg file to src1 rather than reading src1's as intended. This caused us to fail to compact instructions like mov(8) g4<1>D 0D { align1 1Q }; because src1 was set to immediate rather than architecture file. Fixing this reenables compaction (after the precompact() pass changes the data types): mov(8) g4<1>UD 0x00000000UD { align1 1Q compacted }; Fixes: `1cb0a7941b` ("i965: Switch to using the logical register types") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-10-04 14:08:24 -07:00
Lionel Landwerlin	d3acc240d0	intel: compiler: vec4: add missing default 0 lod We set a similar default value for LOD in the fs backend for TXS/TXL. Without this we end up generating invalid MOV with a null src. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: "17.2 17.1" <mesa-stable@lists.freedesktop.org> Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-10-03 22:50:46 +01:00
Dylan Baker	7a5a986ddd	meson: convert gtest to an internal dependency In truth gtest is an external dependency that upstream expects you to "vendor" into your own tree. As such, it makes sense to treat it more like a dependency than an internal library, and collect its requirements together in a dependency object. v2: - include with -isystem instead of setting compiler args (Eric) Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2017-10-03 10:02:08 -07:00
Iago Toral Quiroga	5e584a9db7	i965: skip reading unused slots at the beginning of the URB for the FS We can start reading the URB at the first offset that contains varyings that are actually read in the URB. We still need to make sure that we read at least one varying to honor hardware requirements. This helps alleviate a problem introduced with `99df02ca26` for separate shader objects: without separate shader objects we assign locations sequentially; however, since that commit we have changed the method for SSO so that the VUE slot assigned depends on the number of builtin slots plus the location assigned to the varying. This fixed layout is intended to help SSO programs by avoiding on-the-fly recompiles when swapping out shaders; however, it also means that if a varying uses a large location number close to the maximum allowed by the SF/FS units (31), then the offset introduced by the number of builtin slots can push the location outside the range and trigger an assertion. This problem is affecting at least the following CTS tests for enhanced layouts: KHR-GL45.enhanced_layouts.varying_array_components KHR-GL45.enhanced_layouts.varying_array_locations KHR-GL45.enhanced_layouts.varying_components KHR-GL45.enhanced_layouts.varying_locations which use SSO and the location layout qualifier to select such location numbers explicitly. This change helps these tests because for SSO we always have to include things such as VARYING_SLOT_CLIP_DIST{0,1} even if the fragment shader is very unlikely to read them, so by doing this we free builtin slots from the fixed VUE layout and keep the tests from crashing in this scenario. 
Of course, this is not a proper fix; we'd still run into problems if someone tries to use an explicit max location and read gl_ViewportIndex, gl_LayerID or gl_CullDistance in the FS, but that would be a much less common bug and we can probably wait to see if anyone actually runs into that situation in a real-world scenario before making the decision that more aggressive changes are required to support this without reverting `99df02ca26`. v2: - Add a debug message when we skip clip distances (Ilia) - we also need to account for this when we compute the urb setup for the fragment shader stage, so add a compiler util to compute the first slot that we need to read from the URB instead of replicating the logic in both places. v3: - Make the util more generic so it can account for all unused slots at the beginning of the URB, that will make it more useful (Ken). - Drop the debug message, it was not what Ilia was asking for. Suggested-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-10-02 08:27:13 +02:00
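A hypothetical version of such a "first slot we need to read" utility. Assumptions: `slot_to_varying[i]` gives the varying stored in VUE slot i (negative means unused), and `inputs_read` is a bitmask of the varyings the FS reads. It falls back to slot 0 so that at least one slot is always read. The real util may add constraints (for example on alignment of the read offset) that are not covered here.

```c
#include <stdint.h>

int first_urb_slot_required(uint64_t inputs_read, const int *slot_to_varying,
                            int num_slots)
{
   for (int i = 0; i < num_slots; i++) {
      int varying = slot_to_varying[i];
      if (varying >= 0 && (inputs_read & (UINT64_C(1) << varying)))
         return i;   /* everything before this slot can be skipped */
   }
   return 0;         /* nothing read: still read from the start */
}
```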
Matt Turner	3cfd6ad01c	i965: Normalize types for FBL, FBH, etc Allows the instructions to be compacted. The documentation claims that some of these only accept UD types, even though the type doesn't change the operation performed. Just normalize the types to ensure we get instruction compaction. The only functional changes are for FBL and CBIT (always use UD types) and FBH (always use the same types). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-09-30 20:18:09 -07:00
Iago Toral Quiroga	47e527bd81	i965/fs: force pull model for 64-bit GS inputs Triggering the push model when 64-bit inputs are involved is not easy due to the constraints on the maximum number of registers that we allow for this mode; however, for GS with 'points' primitive type and just a couple of double varyings we can trigger this and it just doesn't work because the implementation is not 64-bit aware at all. For now, let's make sure that we don't attempt this model with 64-bit inputs and we always fall back to pull model for them. Also, don't enable the VUE handles in the thread payload on the fly when we find an input for which we need the pull model; this is not safe: if we need to resort to the pull model we need to account for that when we set up the thread payload so we compute the first non-payload register properly. If we didn't do that correctly and we enable it on-the-fly here then we will end up with VUE handles in the first non-payload register, which will probably lead to GPU hangs. Instead, always enable the VUE handles for the pull model so we can safely use them when needed. The GS is going to resort to pull model in almost every situation anyway, so this shouldn't make a significant difference and it makes things easier and safer. v2: Always enable the VUE handles for pull model, this is easier and safer and the GS is going to fallback to pull model almost always anyway (Ken) v3: Only clamp the URB read length if we are over the maximum reserved for push inputs as we were doing in the original code (Ken). v4: No need to clamp the urb read length if invocations > 1 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-09-29 08:18:25 +02:00
Dylan Baker	d1992255bb	meson: Add build Intel "anv" vulkan driver This allows building and installing the Intel "anv" Vulkan driver using meson and ninja. The driver has been tested against the CTS and seems to pass the same series of tests (they both segfault when the CTS tries to run wayland wsi tests). There is still a mess of TODO, XXX, and FIXME comments in here. Those are mostly for meson bugs I'm trying to fix, or for additional things to implement for other drivers/features. I have configured all intermediate libraries and optional tools to not build by default (meaning they will only be built if they're pulled in as a dependency of a target that will actually be installed); this allows us to avoid massive if chains, while ensuring that only the bits that need to be built are. v2: - enable anv, x11, and wayland by default - add configure option to disable valgrind v3: - fix typo in meson_options (Nicholas) v4: - Remove dead code (Eric) - Remove change to generator that was from v0 (Eric) - replace if chain with loop (Eric) - Fix typos (Eric) - define HAVE_DLOPEN for both libdl and builtin dl cases (Eric) v5: - rebase on util string buffer implementation Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net> (v4)	2017-09-27 09:12:19 -07:00
Dylan Baker	848da66222	intel: use a flag instead of setting PYTHONPATH Meson doesn't allow setting environment variables for custom targets, so we either need to not pass this as an environment variable or use a shell script to wrap the invocation. The chosen solution has the advantage of working for both autotools and meson. v2: - put rules back in top scope (Ken) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>	2017-09-27 09:07:28 -07:00
Kenneth Graunke	a553eb0fdf	i965: Support copy propagating of untyped atomic surface indexes. In the vec4 backend, SHADER_OPCODE_UNTYPED_ATOMIC's src[1] is the surface index. We want to copy propagate so we can use an immediate message descriptor, rather than an indirect send. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2017-09-26 15:35:14 -07:00
Kenneth Graunke	66342c997f	i965/vec4: Fix swizzles on atomic sources. Atomic operation sources are scalar values, but we were failing to select the .x component of the second operand. For example, atomicCounterCompSwapARB(counter, 5u, 10u) would generate mov(8) vgrf4.x:D, 5D mov(8) vgrf5.x:D, 10D mov(8) vgrf9.x:UD, vgrf4.xyzw:D mov(8) vgrf9.y:UD, vgrf5.xyzw:D which wrongly selects the .y component of vgrf5, so the actual 10u value would get dead code eliminated. The swizzle works for the other source, but both of them ought to be .xxxx. Fixes the compare and swap CTS tests in: KHR-GL45.shader_atomic_counter_ops_tests.ShaderAtomicCounterOpsExchangeTestCase Cc: "17.2 17.1 17.0 13.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2017-09-26 15:35:11 -07:00
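A simplified model of why `.xyzw` is wrong here. For each channel enabled in the writemask, an Align16 MOV writes `dst[c] = src[swizzle[c]]`. Writing `vgrf9.y` from `vgrf5.xyzw` therefore reads `vgrf5.y`, which was never written. With `.xxxx` it reads `vgrf5.x`, the 10u value. The function is a hypothetical illustration, not backend code.

```c
#include <stdint.h>

/* writemask bit 0 = x, bit 1 = y, ...; swizzle[c] selects a src channel. */
void mov_align16(uint32_t dst[4], unsigned writemask, const uint32_t src[4],
                 const unsigned swizzle[4])
{
   for (unsigned c = 0; c < 4; c++)
      if (writemask & (1u << c))
         dst[c] = src[swizzle[c]];
}
```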
Kenneth Graunke	a62fe34098	i965/vec4: Actually handle atomic op intrinsics. Embarrassingly, someone enabled the ARB_shader_atomic_counter_ops extension for Gen7+ but never added the intrinsics to the switch statement in the vec4 backend, so they just hit an unreachable() call and died. Fixes: `40dd45d0c6` (i965: Enable ARB_shader_atomic_counter_ops) Cc: "17.2 17.1 17.0 13.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2017-09-26 15:35:06 -07:00
Timothy Arceri	49e4248a93	i965/nir: export nir_optimize Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>	2017-09-26 22:37:02 +10:00
Kenneth Graunke	c9fbe772ba	i965: Handle unwritten PSIZ/VIEWPORT/LAYER outputs in vec4 shaders. This can occur if the shader is capturing some of the values from the VUE header for transform feedback, but the shader hasn't written all of them. Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>	2017-09-21 09:39:27 -07:00
Jason Ekstrand	d496780fb2	intel/eu/validate: Look up types on demand in execution_type() We are looking up the execution type prior to checking how many sources we have. This leads to looking for a type for src1 on MOV instructions which is bogus. On BDW+, the src1 register type overlaps with the 64-bit immediate and causes us problems. Reviewed-by: Matt Turner <mattst88@gmail.com> Cc: mesa-stable@lists.freedesktop.org	2017-09-12 15:01:00 -07:00
Matt Turner	dff75c7175	i965: Drop unnecessary conditional Clang doesn't realize that 0 and 1 are the only possibilities, and thinks lots of variables might be uninitialized. Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>	2017-08-29 15:20:57 -07:00
Topi Pohjolainen	5dd072380a	intel/compiler: Cast reg types explicitly Makes coverity happier. CID: 1416799 Fixes: `c1ac1a3d25` (i965: Add a brw_hw_type_to_reg_type() function) Reviewed-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-08-28 14:43:39 +03:00
Jason Ekstrand	95f533d922	anv,i965: Move CS shared lowering into anv Right now, OpenGL uses the GLSL lowering for shared variables and anv uses NIR to lower them. For a long time, we've done this weird thing where we do the NIR lowering unconditionally and then add the SLM sizes from the two together. This works because one of them will always be 0 but it's a bit sketchy. Let's just move the NIR-based lowering into anv_pipeline and get rid of the sketch. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2017-08-24 16:34:29 -07:00
Kenneth Graunke	4ffa9f3635	i965: Stop using wm_prog_data->binding_table.render_target_start. Render target surfaces always start at binding table index 0. This is required for us to use headerless FB writes, which we really want to do. So, we'll never change that. Given that, it's not necessary to look up a wm_prog_data field which we already know contains 0. We can drop the dependency in brw_renderbuffer_surfaces (Gen4-5)...which was already confusingly missing from gen6_renderbuffer_surfaces. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-08-23 11:55:17 -07:00
Kenneth Graunke	274afad4cd	i965: Add a brw_wm_prog_data::has_render_target_reads field. State upload code should use prog_data rather than poking at shader_info directly. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-08-23 11:55:17 -07:00
Matt Turner	d37d9f84ac	i965: Mark functions static Cuts 300 bytes of .text Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2017-08-21 14:45:44 -07:00
Matt Turner	f30902629c	i965/vec4: Use 'class' src_reg, rather than 'struct' src_reg Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2017-08-21 14:45:44 -07:00
Matt Turner	a77d5b28ac	i965/vec4: Return float from spill_cost_for_type() Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-08-21 14:45:44 -07:00
Matt Turner	a98b1a8922	i965: Optimize reading the destination type brw_hw_type_to_reg_type() needs to know only whether the file is BRW_IMMEDIATE_VALUE or not, which is not a valid file for the destination. gcc and clang will evaluate __builtin_strcmp() at compile time, so we can use it to pass a constant file for the destination. text data bss dec hex filename 7816214 346248 420496 8582958 82f72e i965_dri.so before 7816070 346248 420496 8582814 82f69e i965_dri.so after Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-08-21 14:05:23 -07:00
Matt Turner	91ef949054	i965: Mark brw_hw_type_to_reg_type() as a pure function text data bss dec hex filename 7816886 346248 420496 8583630 82f9ce i965_dri.so before 7816214 346248 420496 8582958 82f72e i965_dri.so after Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-08-21 14:05:23 -07:00
Matt Turner	e07fe89035	i965: Hide the register type hardware encodings So we stop mixing them with the logical enum. Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-08-21 14:05:23 -07:00
Matt Turner	4fab67a441	i965: Stop using hardware register types directly Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-08-21 14:05:23 -07:00
Matt Turner	c746f1c888	i965: Add brw_hw_reg_type_to_letters() and use it in brw_disasm.c Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-08-21 14:05:23 -07:00
Matt Turner	6a2471b501	i965: Move brw_reg_type_letters() as well And add "to_" to the name for consistency with the other functions in this file. Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-08-21 14:05:23 -07:00
Matt Turner	1cb0a7941b	i965: Switch to using the logical register types Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-08-21 14:05:23 -07:00
Matt Turner	cb2cd462b1	i965: Add functions to abstract access to register types Previously the brw_inst{,_set}_{dst,src0,src1}_reg_type() functions provided access to the hardware encodings for the register types. We often mixed these with the logical BRW_REGISTER_TYPE_* enums (which themselves used to be the hardware format!) with bad results. With that functionality now available with the hw_ versions (see previous commit), we now add functions that take the logical BRW_REGISTER_TYPE_* enums and convert into the hardware format and vice versa. To do the conversion we also have to provide the file. Note the asymmetry between the two functions: the new getter reads the file from the instruction word, and to ensure that is always set the setter writes both the file and the type. Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-08-21 14:05:23 -07:00
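A sketch of why the conversion needs the file. The same 3-bit hardware value means different things for register and immediate operands. The logical enum is hypothetical, and the encodings below follow the older (pre-Gen8) layout only as an illustration. Note that 4 and 5 decode to UB/B for registers but to UV/VF for immediates.

```c
#include <stdbool.h>

/* Hypothetical logical types, deliberately unrelated to any encoding. */
enum reg_type { TYPE_UD, TYPE_D, TYPE_UW, TYPE_W, TYPE_UB, TYPE_B,
                TYPE_F, TYPE_UV, TYPE_VF, TYPE_V, TYPE_INVALID };

/* Decoding depends on whether the operand is an immediate. */
enum reg_type hw_to_logical(bool is_imm, unsigned hw)
{
   static const enum reg_type reg_table[8] = {
      TYPE_UD, TYPE_D, TYPE_UW, TYPE_W, TYPE_UB, TYPE_B,
      TYPE_INVALID /* 6: unused in this sketch */, TYPE_F,
   };
   static const enum reg_type imm_table[8] = {
      TYPE_UD, TYPE_D, TYPE_UW, TYPE_W, TYPE_UV, TYPE_VF, TYPE_V, TYPE_F,
   };
   if (hw >= 8)
      return TYPE_INVALID;
   return is_imm ? imm_table[hw] : reg_table[hw];
}

/* Inverse lookup; returns -1 if the type has no encoding for this file. */
int logical_to_hw(bool is_imm, enum reg_type t)
{
   if (t == TYPE_INVALID)
      return -1;
   for (unsigned hw = 0; hw < 8; hw++)
      if (hw_to_logical(is_imm, hw) == t)
         return (int)hw;
   return -1;
}
```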
Matt Turner	9fb8323328	i965: Rename brw_inst's functions that access the register type Put hw_ in the name so that it's clear these are the hardware encodings. Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-08-21 14:05:23 -07:00
Matt Turner	3e379af492	i965: Index brw_hw_reg_type_to_size()'s table by logical type I'll be transitioning everything to use the logical types. Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-08-21 14:05:23 -07:00
Matt Turner	c1ac1a3d25	i965: Add a brw_hw_type_to_reg_type() function Will be used in later commits. Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-08-21 14:05:23 -07:00
Matt Turner	dbe7dd13dd	i965: Use a common table to translate logical to hardware types Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-08-21 14:05:23 -07:00
Matt Turner	bfcc9aa829	i965: Extract functions dealing with register types to separate file I'm going to encapsulate all of the logic dealing with register types in this file. Rename the parameters for the hardware encodings from type -> hw_type at the same time. Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-08-21 14:05:23 -07:00
Matt Turner	890f863da0	i965: Reverse file/type arguments to register type functions I think of the initial arguments as "state" and the last as the actual subject. Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-08-21 14:05:23 -07:00
Matt Turner	92f787ff86	i965: Add support for disassembling 64-bit integer immediates After the last patch converted things into enums, I helpfully got a compiler warning about these missing from the switch statement. Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-08-21 14:05:23 -07:00
Matt Turner	deae25ce37	i965: Use separate enums for register vs immediate types The hardware encodings often mean different things depending on whether the source is an immediate. Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-08-21 14:05:23 -07:00
Matt Turner	8815b9677f	i965: Reorder brw_reg_type enum values These vaguely corresponded to the hardware encodings, but that is purely historical at this point. Reorder them so we stop making things "almost work" when mixing enums. The ordering has been chosen so that no enum value is the same as a compatible hardware encoding. Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-08-21 14:05:23 -07:00
Matt Turner	ce6b8627d8	i965: Validate destination restrictions with vector immediates Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-08-21 14:05:23 -07:00
Matt Turner	1d79c828d8	i965: Don't let raw-move check be tricked by immediate vector types UB and B type encodings are the same as UV and VF. Noticed when writing the following patch. Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-08-21 14:05:23 -07:00
Matt Turner	48aa6ecb87	i965: Only change type of 0.0f to VF if destination stride == 1 The destination stride must be equivalent to a dword if VF is used. Also, since the only compaction table entries with "i:vf" have the destination as "r:f", specifically check that the destination is of type float. Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-08-21 14:05:23 -07:00
Matt Turner	56a676eed2	i965: Remove CONT/BREAK from instruction compaction test These cannot be compacted. A similar mistake was fixed in commit `90eaf01616` Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-08-21 14:05:23 -07:00
Matt Turner	3d661e6062	i965: Test instruction compaction on all supported Gens Note that there's no point in testing on G45, since its compaction is the same as Gen5. Same logic applies to Gen7 variants and low-power parts. Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-08-21 14:05:23 -07:00
Matt Turner	9ff7d9b853	i965: Silence signed/unsigned comparison warning Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-08-21 14:05:23 -07:00
Matt Turner	eac89911e5	i965: Move compaction "prepass" into brw_eu_compact.c Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-08-21 14:05:23 -07:00
Matt Turner	17641f6388	i965: Mark src inst pointer const in compaction code Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-08-21 14:05:23 -07:00
Iago Toral Quiroga	81615ad444	intel/compiler: properly size attribute wa_flags array for Vulkan Mesa will map user-defined vertex input attributes to slots starting at VERT_ATTRIB_GENERIC0 which gives us room for only 16 slots (up to GL_VERT_ATTRIB_MAX). This is sufficient for GL, where we expose exactly 16 vertex attributes for user-defined inputs, but in Vulkan we can expose up to 28 (which are also mapped from VERT_ATTRIB_GENERIC0 onwards) so we need to account for this when we choose the size of the array of attribute workaround flags that is used during the brw_vertex_workarounds NIR pass. This prevents out-of-bounds accesses in that array for NIR shaders that use more than 16 vertex input attributes. Fixes: dEQP-VK.pipeline.vertex_input.max_attributes.* Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-08-11 10:41:44 +02:00
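One way to express the sizing rule: make the array large enough for whichever API exposes more generic attributes. The limits (16 for GL, 28 for Vulkan) come from the message. The macro and struct names are hypothetical, and this simplified array is indexed by generic attribute number only.

```c
#include <stdint.h>

#define GL_GENERIC_ATTRIBS 16   /* from the message */
#define VK_GENERIC_ATTRIBS 28   /* from the message */
#define MAX_OF(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical key: one workaround byte per generic attribute, sized so a
 * Vulkan shader using all 28 attributes cannot index past the end. */
struct vs_wa_key_sketch {
   uint8_t attrib_wa_flags[MAX_OF(GL_GENERIC_ATTRIBS, VK_GENERIC_ATTRIBS)];
};
```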
Dave Airlie	271fa3a684	intel/vec4/gs: reset nr_pull_param if DUAL_INSTANCED compile failed. If dual object compile fails (as seems to happen with virgl a fair bit, and does piglit even have any tests for it?), we end up not restarting the pull params, so we call vec4_visitor::move_uniform_array_access_to_pull_constant a second time and it runs over the ends of the alloc. Fixes: tests/spec/glsl-1.50/execution/geometry/max-input-components.shader_test running inside virgl on ivybridge. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Cc: <mesa-stable@lists.freedesktop.org> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-08-03 16:54:08 +10:00
Matt Turner	858f554078	i965: Fix indentation	2017-08-02 16:49:32 -07:00
Kenneth Graunke	30d6bc470a	i965: Set lower_vote_trivial in vector_nir_options_gen6 too. There's a second struct for Gen6+. Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-07-21 18:09:01 -07:00
Matt Turner	069bf7c907	i965/fs: Match destination type to size for ballot No use in taking a 64-bit value when we know the high 32-bits are zero.	2017-07-20 16:56:50 -07:00
Matt Turner	1038d385a9	nir: Reduce destination size of ballot intrinsic when possible Some hardware, like i965, doesn't support group sizes greater than 32. In that case, we can reduce the destination size of the ballot intrinsic, which will simplify our code generation. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-07-20 16:56:49 -07:00
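A reference model of the observation behind this change. `ballotARB()` sets bit i when invocation i is active and its condition is true. With at most 32 invocations, bits 32..63 are always zero, so a 32-bit result carries the same information. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

uint64_t ballot64(const bool *cond, const bool *active, unsigned group_size)
{
   uint64_t mask = 0;
   for (unsigned i = 0; i < group_size && i < 64; i++)
      if (active[i] && cond[i])
         mask |= UINT64_C(1) << i;
   return mask;
}

/* Only valid for group_size <= 32: the upper half is known to be zero. */
uint32_t ballot32(const bool *cond, const bool *active, unsigned group_size)
{
   return (uint32_t)ballot64(cond, active, group_size);
}
```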
Matt Turner	782ef30451	i965/fs: Implement ARB_shader_ballot operations Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-07-20 16:56:49 -07:00
Matt Turner	8238930510	i965/fs: Do not move MOVs writing the flag outside of control flow The implementation of ballotARB() will start by zeroing the flags register. So, doing something like if (gl_SubGroupInvocationARB % 2u == 0u) { ... = ballotARB(true); [...] } else { ... = ballotARB(true); [...] } (like fs-ballot-if-else.shader_test does) would generate identical MOVs to the same destination (the flag register!), and we definitely do not want to pull that out of the control flow. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-07-20 16:56:49 -07:00
Francisco Jerez	f1b7c47913	i965/fs: Handle explicit flag sources in flags_read() The implementations of the ARB_shader_ballot intrinsics will explicitly read the flag as a source register. Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-07-20 16:56:49 -07:00
Matt Turner	43ef75b394	nir: Add system values from ARB_shader_ballot We already had a channel_num system value, which I'm renaming to subgroup_invocation to match the rest of the new system values. Note that while ballotARB(true) will return zeros in the high 32-bits on systems where gl_SubGroupSizeARB <= 32, the gl_SubGroup??MaskARB variables do not consider whether channels are enabled. See issue (1) of ARB_shader_ballot. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-07-20 16:56:49 -07:00
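The `gl_SubGroup{Eq,Ge,Gt,Le,Lt}MaskARB` values for one invocation, computed purely from the invocation index and without regard to which channels are enabled, as the message notes. The struct and function names are hypothetical. The arithmetic avoids shifting by 64 even for invocation 63.

```c
#include <stdint.h>

struct subgroup_masks { uint64_t eq, ge, gt, le, lt; };

struct subgroup_masks subgroup_masks_for(unsigned id) /* id in 0..63 */
{
   struct subgroup_masks m;
   m.eq = UINT64_C(1) << id;      /* bit == id */
   m.ge = ~UINT64_C(0) << id;     /* bits >= id */
   m.gt = m.ge & ~m.eq;           /* bits > id */
   m.lt = m.eq - 1;               /* bits < id */
   m.le = m.lt | m.eq;            /* bits <= id */
   return m;
}
```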
Matt Turner	ee9fa4ac18	i965/fs: Implement ARB_shader_group_vote operations Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-07-20 16:56:49 -07:00
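Reference semantics of the three ARB_shader_group_vote builtins (`anyInvocationARB`, `allInvocationsARB`, `allInvocationsEqualARB`) over the active channels, useful for checking the backend's results. This sketch says nothing about how the flag register is used in the generated code.

```c
#include <stdbool.h>

bool vote_any(const bool *value, const bool *active, unsigned n)
{
   for (unsigned i = 0; i < n; i++)
      if (active[i] && value[i])
         return true;
   return false;
}

bool vote_all(const bool *value, const bool *active, unsigned n)
{
   for (unsigned i = 0; i < n; i++)
      if (active[i] && !value[i])
         return false;
   return true;
}

/* All active invocations agree: either all true or all false. */
bool vote_eq(const bool *value, const bool *active, unsigned n)
{
   return vote_all(value, active, n) || !vote_any(value, active, n);
}
```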
Francisco Jerez	93dc736f4e	i965/fs: Handle explicit flag destinations in flags_written() The implementations of the ARB_shader_group_vote intrinsics will explicitly write the flag as the destination register. Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-07-20 16:56:49 -07:00
Matt Turner	30b72f4126	i965/vec4: Lower ARB_shader_group_vote intrinsics I don't expect anyone is going to care about using this in vec4 programs (vertex/tessellation/geometry on Gen6/7), and no one has come up with a good way to implement it, much less test it. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-07-20 16:56:49 -07:00
Matt Turner	d4c9d6a3b2	nir: Add pass to optimize intrinsics Specifically, constant fold intrinsics from ARB_shader_group_vote, but I suspect it'll be useful for other things in the future. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-07-20 16:56:49 -07:00
Kenneth Graunke	b2da123801	i965: Use pushed UBO data in the scalar backend. This actually takes advantage of the newly pushed UBO data, avoiding pull loads. Improves performance in GLBenchmark Manhattan 3.1 by: HSW: ~1%, BDW/SKL/KBL GT2: 3-4%, SKL GT4: 7-8%, APL: 4-5%. (thanks to Eero Tamminen for these numbers) shader-db results on Skylake, ignoring programs with spill/fill changes: total instructions in shared programs: 13963994 -> 13651893 (-2.24%) instructions in affected programs: 4250328 -> 3938227 (-7.34%) helped: 28527 HURT: 0 total cycles in shared programs: 179808608 -> 172535170 (-4.05%) cycles in affected programs: 79720410 -> 72446972 (-9.12%) helped: 26951 HURT: 1248 LOST: 46 GAINED: 21 Many "Deus Ex: Mankind Divided" shaders which already spilled end up spilling a lot more (about 240 programs hurt, 9 helped). The cycle estimator suggests this is still overall a win (-0.23% in cycle counts) presumably because we trade pull loads for fills. v2: Drop "PULL" environment variable left in for initial debugging (caught by Matt). Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-07-13 20:18:54 -07:00
Kenneth Graunke	c9ef27e77b	i965: Factor out push locations. With UBOs, the answer of "have we decided to push this uniform" gets a bit more complicated - for one, we have multiple surfaces. This patch refactors things so we can add the new code in a single place. Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-07-13 20:18:54 -07:00
Kenneth Graunke	4f586cd8f1	i965: Push UBO data, but don't use it just yet. This patch starts uploading UBO data via 3DSTATE_CONSTANT_* packets, and updates the compiler to know that there's extra payload data, so things continue working. However, it still issues pull loads for all data. I wanted to separate the two aspects for greater bisectability. v2: Update for new intel_bufferobj_buffer parameter. Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-07-13 20:18:30 -07:00
Kenneth Graunke	6d28c6e52c	i965: Select ranges of UBO data to be uploaded as push constants. This adds a NIR pass that decides which portions of UBOs we should upload as push constants, rather than pull constants. v2: Switch to uint16_t for the UBO block number, because we may have a lot of them in Vulkan (suggested by Jason). Add more comments about bitfield trickery (requested by Matt). v3: Skip vec4 stages for now...I haven't finished wiring up support in the vec4 backend, and so pushing the data but not using it will just be wasteful. Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-07-13 19:56:49 -07:00
Kenneth Graunke	8ec5a4e4a4	i965: Switch to absolute addressing for constant buffer 0. By default, 3DSTATE_CONSTANT_* Constant Buffer 0 is relative to dynamic state base address. This makes it unusable for pushing UBOs. I'd like to be able to use all four push buffers. There is a bit in the INSTPM register (or CS_DEBUG_MODE2 on Skylake) which controls whether buffer 0 is relative to dynamic state base address, or simply a normal pointer. Setting that gives us full flexibility. We can't currently write this on Haswell and earlier, and will need to update the kernel command parser, and then do the whole version checking song and dance. Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-07-13 19:56:49 -07:00
Lionel Landwerlin	226fae7849	intel/compiler: no need to check unsigned is >= 0 CID: 1338342 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2017-07-13 22:50:45 +01:00
Lionel Landwerlin	95c917668c	intel/compiler: don't check unsigned is >= 0 CID: 1224468 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2017-07-13 22:50:38 +01:00
Lionel Landwerlin	a25a533458	intel/compiler: remove check unsigned is >= 0 By definition, unsigned values are always >= 0. CID: 742212 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2017-07-13 22:50:29 +01:00
Anuj Phogat	0a56c5f3f1	intel/compiler: Don't use opt_sampler_eot() optimization on gen10+ This optimization has been removed on gen10+. Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-07-12 11:27:31 -07:00
Johnson Lin	165e704719	i965/i915: Add UYVY as the supported format Trigger the correct sampler options for it. Similar to YUYV Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2017-06-30 10:16:26 +01:00
Lionel Landwerlin	030abc6109	intel: compiler/i965: fix is_broxton checks In `5f2fe9302c` is_geminilake was introduced to differentiate broxton from geminilake. Unfortunately I failed to verify that is_broxton is used throughout the code base to mean Gen9lp. Fixes: `5f2fe9302c` ("intel: common: add flag to identify platforms by name") Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-06-20 23:26:42 +01:00
Anuj Phogat	f9e31a26d4	i965/cnl: Make URB {VS, GS, HS, DS} sizes non multiple of 3 v1: By Ben Widawsky <benjamin.widawsky@intel.com> v2: v1 had an assert only for VS. Add the restriction for GS, HS and DS as well and make sure the allocated sizes are not multiple of 3. v3: Move the entry_size checks into compiler code (Ken) Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-06-09 16:02:59 -07:00
Anuj Phogat	111881abac	i965/cnl: Handle gen10 in switch cases across the driver V2: Start using gen10 functions isl_gen10*(), gen10_blorp_exec() gen10_init_atoms() (Jason) Remove Vulkan changes. Do them later in a separate patch. Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-06-09 16:02:59 -07:00
Anuj Phogat	30e749c8f1	i965/cnl: Update few assertions Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-06-09 16:02:59 -07:00
Eric Engestrom	63a8a88ac4	tree-wide: remove trailing backslash Simple search for a backslash followed by two newlines. If one of the newlines were to be removed, this would cause issues, so let's just remove these trailing backslashes. Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2017-06-07 01:18:09 +01:00
Kenneth Graunke	9cd69022d5	i965: Change INTEL_DEBUG=vec4 to INTEL_SCALAR_VS for consistency. We moved to INTEL_SCALAR_* when we added more than a single stage, but never went back and converted the VS to work that way. Be consistent. Also update the documentation to actually mention these debug variables. Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2017-06-05 23:32:40 -07:00
Kenneth Graunke	fe14a9a501	i965: Drop duplicate shadow variable. We already initialized this at the top of the function. Trivial.	2017-06-01 14:28:12 -07:00
Kenneth Graunke	65f5f3c85c	i965: Move SOL PSIZ hacks from draw time to link time. We can just update the gl_transform_feedback_info fields at link time to make the VUE header fields have the right location and component. Then we don't need to handle them specially at draw time, which is expensive. Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2017-06-01 00:08:29 -07:00
Kenneth Graunke	1e3880544e	i965: Ignore INTEL_SCALAR_* debug variables on Gen10+. Scalar mode has been default since Broadwell, and vector mode is getting increasingly unmaintained. There are a few things that don't even fully work in vector mode on Skylake, but we've never cared because nobody uses it. There's no point in porting it forward to new platforms. So, just ignore the debug options to force it on. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-05-29 21:40:44 -07:00
Jason Ekstrand	18e18a1863	i965: Move clip program compilation to the compiler Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-05-26 07:58:01 -07:00
Jason Ekstrand	9fb8a8775b	i965: Move SF compilation to the compiler Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-05-26 07:58:01 -07:00
Jason Ekstrand	21ba2b4bef	intel/compiler: Make brw_disasm take const assembly Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-05-26 07:58:01 -07:00
Samuel Iglesias Gonsálvez	e69e5c7006	i965/vec4: load dvec3/4 uniforms first in the push constant buffer Reorder the uniforms to load first the dvec4-aligned variables in the push constant buffer and then push the vec4-aligned ones. It takes into account that the relocated uniforms should be aligned to their channel size. This fixes a bug where part of a dvec3/4 might be loaded in one GRF and the rest in the next GRF, so the region parameters used to read it could break the HW rules. v2: - Fix broken logic. - Add a comment to explain what should be needed to optimise the usage of the push constant buffer slots, as this patch does not pack the uniforms. v3: - Implemented the push constant buffer usage optimization. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Cc: "17.1" <mesa-stable@lists.freedesktop.org> Acked-by: Francisco Jerez <currojerez@riseup.net>	2017-05-18 06:49:54 +02:00
Samuel Iglesias Gonsálvez	8aa6ada838	i965/vec4: fix swizzle and writemask when loading a uniform with constant offset It was setting XYZW swizzle and writemask to all uniforms, no matter if they were a vector or scalar, so this can lead to problems when loading them to the push constant buffer. Moreover, 'shift' calculation was designed to calculate the offset in DWORDS, but it doesn't take into account DFs, so the calculated swizzle for the latter was wrong. The indirect case is not changed because MOV INDIRECT will write to all components. Added an assert to verify that these uniforms are aligned. v2: - Fix 'shift' calculation (Curro) - Set both swizzle and writemask. - Add assert(shift == 0) for the indirect case. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Cc: "17.1" <mesa-stable@lists.freedesktop.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-05-18 06:49:54 +02:00
Samuel Iglesias Gonsálvez	354f7f2cb9	i965/vec4/gs: restore the uniform values which were overwritten by failed vec4_gs_visitor execution We are going to add a packing feature to reduce the usage of the push constant buffer. One of the consequences is that 'nr_params' would be modified by vec4_visitor's run call, so we need to restore it if one of them failed before executing the fallback ones. The same thing happens to the uniform values that would be reordered afterwards. Fixes GL45-CTS.arrays_of_arrays_gl.InteractionFunctionCalls2 when the dvec4 alignment and packing patch is applied. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Cc: "17.1" <mesa-stable@lists.freedesktop.org> Acked-by: Francisco Jerez <currojerez@riseup.net>	2017-05-18 06:49:28 +02:00
Matt Turner	169e1e26ee	i965: Fix test_eu_validate.cpp Broken by commit `a7217e909c` ("i965: Pass pointer and end of assembly to brw_validate_instructions"). Reported-by: Aaron Watry <awatry@gmail.com>	2017-05-16 11:45:07 -07:00
Matt Turner	aae2626be8	i965: Add a weak no-op nir_print_instr() symbol intel_asm_annotation.c is part of libintel_compiler.la, which contains code for disassembling and validating shaders that we want to call in aubinator_error_decode. dump_assembly() calls nir_print_instr() to print annotations, and although dump_assembly() is not called by aubinator_error_decode (nor is any function in intel_asm_annotation.c) it causes undefined references to nir_print_instr(). To work around, provide a no-op weak symbol to resolve against.	2017-05-15 11:43:01 -07:00
Matt Turner	d98e82c772	i965: Allow brw_eu_validate to handle compact instructions This will allow the validator to run on shader programs we find in the GPU hang error state. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-05-15 11:42:56 -07:00
Matt Turner	a7217e909c	i965: Pass pointer and end of assembly to brw_validate_instructions This will allow us to more easily run brw_validate_instructions() on shader programs we find in GPU hang error states. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-05-15 11:42:47 -07:00
Jason Ekstrand	037ce253b1	i965/vec4: Delete the system value infrastructure The only thing still using it is INVOCATION_ID for geometry shaders. That's easily enough inlined into the nir_intrinsic_load_invocation_id handling code. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-09 15:08:07 -07:00
Jason Ekstrand	2e9916ea04	i965/vec4: Use NIR to do GS input remapping We're already doing this in the FS back-end. This just does the same thing in the vec4 back-end. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-09 15:08:07 -07:00
Jason Ekstrand	e31042ab40	i965/fs: Move remapping of gl_PointSize to the NIR level Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-09 15:08:06 -07:00
Jason Ekstrand	5b00c3cc05	i965/nir: Inline remap_inputs_with_vue_map Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-09 15:08:06 -07:00
Jason Ekstrand	0d5f89cdc3	i965/vec4: Use NIR remapping for VS attributes The NIR pass already handles remapping system values to attributes for us so we delete the system value code as part of the conversion. We also change nir_lower_vs_inputs to take an explicit inputs_read bitmask and pass in the inputs_read from prog_data instead of pulling it out of NIR. This is because the version in prog_data may get EDGEFLAG added to it on some old platforms. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-09 15:08:06 -07:00
Jason Ekstrand	80aa6e9d32	intel/compiler/vs: Move inputs_read handling to generic code Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-09 15:08:03 -07:00
Jason Ekstrand	d2fe804d18	i965/vec4: Set VERT_BIT_EDGEFLAG based on the VUE map We also add a nice little comment to make it more clear exactly what happens with the edge flag copy. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-09 15:07:47 -07:00
Jason Ekstrand	ca4d192802	i965/fs: Lower gl_VertexID and friends to inputs at the NIR level NIR calls these system values but they come in from the VF unit as vertex data. It's terribly convenient to just be able to treat them as such in the back-end. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-09 15:07:47 -07:00
Jason Ekstrand	24e6fba500	i965/vs: Set uses_vertexid and friends from brw_compile_vs Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-09 15:07:47 -07:00
Jason Ekstrand	5e832302dc	i965: Move multiply by 4 for VS ATTR setup into the scalar backend. The vec4 backend will want to count in units of vec4s, not scalar components. The simplest solution is to move the multiplication by 4 into the scalar backend. This also improves consistency with how we count varyings. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-09 15:07:47 -07:00
Jason Ekstrand	36764b6923	i965/nir: Inline remap_vs_attrs Now that we have nice block iterators, there's no good reason for this to be off on its own. While we're here, we convert to using the NIR const index getters/setters instead of whacking const_index values directly. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-09 15:07:47 -07:00
Jason Ekstrand	b86dba8a0e	nir: Embed the shader_info in the nir_shader again Commit `e1af20f18a` changed the shader_info from being embedded into being just a pointer. The idea was that sharing the shader_info between NIR and GLSL would be easier if it were a pointer pointing to the same shader_info struct. This, however, has caused a few problems: 1) There are many things which generate NIR without GLSL. This means we have to support both NIR shaders which come from GLSL and ones that don't and need to have an info elsewhere. 2) The solution to (1) raises all sorts of ownership issues which have to be resolved with ralloc_parent checks. 3) Ever since `00620782c9`, we've been using nir_gather_info to fill out the final shader_info. Thanks to cloning and the above ownership issues, the nir_shader::info may not point back to the gl_shader anymore and so we have to do a copy of the shader_info from NIR back to GLSL anyway. All of these issues go away if we just embed the shader_info in the nir_shader. There's a little downside of having to copy it back after calling nir_gather_info but, as explained above, we have to do that anyway. Acked-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-09 15:07:47 -07:00
Lionel Landwerlin	32f14332f5	intel: compiler: prevent integer overflow CID: 1399477, 1399478 (Integer handling issues) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-05-09 13:56:17 +01:00
Lionel Landwerlin	85182e490c	intel: compiler: remove duplicated code CID: 1399470: (Control flow issues) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-05-09 13:56:17 +01:00
Rafael Antognolli	8fa8abef4b	i965: Move enums to brw_compiler.h. These enums live inside struct brw_wm_prog_data, so it makes sense to keep them in the same header. It also allows using them without including brw_eu_defines.h. Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-03 16:55:58 -07:00
Samuel Iglesias Gonsálvez	f57e234fdd	i965/vec4: don't modify regioning parameters to the sources of DF align1 instructions The regioning parameters are now properly set by convert_to_hw_regs() and we don't need to fix them in the generator. That latter fix previously done in the generator was strictly speaking wrong for any non-identity regions. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Cc: "17.1" <mesa-stable@lists.freedesktop.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-05-03 15:32:39 +02:00
Samuel Iglesias Gonsálvez	aaeb1c99be	i965/vec4: fix register width for DF VGRF and UNIFORM On gen7, the swizzles used in DF align16 instructions work for an element size of 32 bits, so we can address only 2 consecutive DFs. As we assumed that in the rest of the code and prepare the instructions for it (scalarize_df()), we need to set the width to two again. However, for DF align1 instructions, a width of 2 is wrong as we are not reading the data we want. For example, a uniform would have a region of <0, 2, 1> so it would repeat the first 2 DFs, when we wanted to access the first 4. This patch sets the default width to 4 and then modifies the width of align16 instructions' DF sources when we translate the logical swizzle to the physical one. v2: - Remove conditional (Curro). Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Cc: "17.1" <mesa-stable@lists.freedesktop.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-05-03 15:32:39 +02:00
Samuel Iglesias Gonsálvez	7f728bce81	i965/vec4: fix vertical stride to avoid breaking region parameter rule From IVB PRM, vol4, part3, "General Restrictions on Regioning Parameters": "If ExecSize = Width and HorzStride ≠ 0, VertStride must be set to Width * HorzStride." In next patch, we are going to modify the region parameter for uniforms and vgrf. For uniforms that are the source of DF align1 instructions, they will have <0, 4, 1> regioning and the execsize for those instructions will be 4, so they will break the regioning rule. This will be the same for VGRF sources where we use the vstride == 0 exploit. As we know we are not going to cross the GRF boundary with that execsize and parameters (not even with the exploit), we just fix the vstride here. v2: - Move is_align1_df() (Curro) - Refactor exec_size == width calculation (Curro) Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Cc: "17.1" <mesa-stable@lists.freedesktop.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-05-03 15:32:39 +02:00
Francisco Jerez	58324389be	intel/fs: Take into account amount of data read in spilling cost heuristic. Until now the spilling cost calculation was neglecting the amount of data read from the register during the spilling cost calculation. This caused it to make suboptimal decisions in some cases leading to higher memory bandwidth usage than necessary. Improves Unigine Heaven performance by ~4% on BDW, reversing an unintended FPS regression from my previous commit `147e71242c` with n=12 and statistical significance 5%. In addition SynMark2 OglCSDof performance is improved by an additional ~5% on SKL, and a Kerbal Space Program apitrace around the Moho planet I can provide on request improves by ~20%. Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Plamena Manolova <plamena.manolova@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-04-24 11:01:40 -07:00
Francisco Jerez	ecc19e12dc	intel/fs: Use regs_written() in spilling cost heuristic for improved accuracy. This is what we use later on to compute the number of registers that will actually get spilled to memory, so it's more likely to match reality than the current open-coded approximation. Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Plamena Manolova <plamena.manolova@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-04-24 10:59:56 -07:00
Kenneth Graunke	6b10c37b9c	i965/vec4: Use reads_accumulator_implicitly(), not MACH checks. Curro pointed out that I should not just check for MACH, but use the reads_accumulator_implicitly() helper, which would also prevent the same bug with MAC and SADA2 (if we ever decide to use them). Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-24 10:53:49 -07:00
Timothy Arceri	7a7ee40c2d	nir/i965: add before ffma algebraic opts This shuffles constants down in the reverse of what the previous patch does and applies some simplifications that may be made possible from doing so. Shader-db results BDW: total instructions in shared programs: 12980814 -> 12977822 (-0.02%) instructions in affected programs: 281889 -> 278897 (-1.06%) helped: 1231 HURT: 128 total cycles in shared programs: 246562852 -> 246567288 (0.00%) cycles in affected programs: 11271524 -> 11275960 (0.04%) helped: 1630 HURT: 1378 V2: mark float opts as inexact Reviewed-by: Elie Tournier <elie.tournier@collabora.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-04-24 12:08:14 +10:00
Kenneth Graunke	2faf227ec2	i965/vec4: Avoid reswizzling MACH instructions in opt_register_coalesce(). opt_register_coalesce() was optimizing sequences such as: mul(8) acc0:D, attr18.xyyy:D, attr19.xyyy:D mach(8) vgrf5.xy:D, attr18.xyyy:D, attr19.xyyy:D mov(8) m4.zw:F, vgrf5.xxxy:F into: mul(8) acc0:D, attr18.xyyy:D, attr19.xyyy:D mach(8) m4.zw:D, attr18.xxxy:D, attr19.xxxy:D This doesn't work - if we're going to reswizzle MACH, we'd need to reswizzle the MUL as well. Here, the MUL fills the accumulator's .zw components with attr18.yy * attr19.yy. But the MACH instruction expects .z to contain attr18.x * attr19.x. Bogus results ensue. No change in shader-db on Haswell. Prevents regressions in Timothy's patches to use enhanced layouts for varying packing (which rearrange code just enough to trigger this pre-existing bug, but were fine themselves). Acked-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-04-22 00:01:16 -07:00
Matt Turner	2eeb1b0ad9	i965: Use correct VertStride on align16 instructions. In commit `c35fa7a`, we changed the "width" of DF source registers to 2, which is conceptually fine. Unfortunately a VertStride of 2 is not allowed by align16 instructions on IVB/BYT, and the regular VertStride of 4 works fine in any case. See generated_tests/spec/arb_gpu_shader_fp64/execution/built-in-functions/vs-round-double.shader_test for example: cmp.ge.f0(8) g18<1>DF g1<0>.xyxyDF -g8<2>DF { align16 1Q }; ERROR: In Align16 mode, only VertStride of 0 or 4 is allowed cmp.ge.f0(8) g19<1>DF g1<0>.xyxyDF -g9<2>DF { align16 2N }; ERROR: In Align16 mode, only VertStride of 0 or 4 is allowed v2: - Add spec quote (Curro). - Change the condition to only BRW_VERTICAL_STRIDE_2 (Curro) Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:09 -07:00
Samuel Iglesias Gonsálvez	d8441e2276	i965/vec4/dce: improve track of partial flag register writes This is required for correctness in presence of multiple 4-wide flag writes (e.g. 4-wide instructions with a conditional mod set) which update a different portion of the same 8-bit flag subregister. Right now we keep track of flag dataflow with 8-bit granularity and consider flag writes to have killed any previous definition of the same subregister even if the write was less than 8 channels wide, which can cause live flag register updates to be dead code-eliminated incorrectly. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:09 -07:00
Samuel Iglesias Gonsálvez	c1fc8fad47	i965/vec4: don't do horizontal stride on some register file types horiz_offset() shouldn't be doing anything for scalar registers, because all channels of any SIMD instructions will end up reading or writing the same component of the register, so shifting the register offset would be wrong. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [ Francisco Jerez: Re-implement in terms of is_uniform() for simplicity. Pass argument by const reference. Clarify commit message. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:09 -07:00
Matt Turner	21e8e3a848	i965/vec4: Fix exec size for MOVs {SET,PICK}_{HIGH,LOW}_32BIT. Otherwise for a pack_double_2x32_split opcode, we emit: vec1 64 ssa_135 = pack_double_2x32_split ssa_133, ssa_134 mov(8) g5<1>UD g5<4>.xUD { align16 1Q compacted }; mov(8) g7<2>UD g5<4,4,1>UD { align1 1Q }; ERROR: When the destination spans two registers, the source must span two registers (exceptions for scalar source and packed-word to packed-dword expansion) mov(8) g8<2>UD g5.4<4,4,1>UD { align1 2N }; ERROR: The offset from the two source registers must be the same mov(8) g5<1>UD g6<4>.xUD { align16 1Q compacted }; mov(8) g7.1<2>UD g5<4,4,1>UD { align1 1Q }; ERROR: When the destination spans two registers, the source must span two registers (exceptions for scalar source and packed-word to packed-dword expansion) mov(8) g8.1<2>UD g5.4<4,4,1>UD { align1 2N }; ERROR: The offset from the two source registers must be the same The intention was to emit mov(4)s for the instructions that have ERROR annotations. See tests/spec/arb_gpu_shader_fp64/execution/vs-isinf-dvec.shader_test for example. v2 (Samuel): - Instead of setting the exec size to a fixed value, don't double it (Curro). - Add PICK_{HIGH,LOW}_32BIT to the condition. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [ Francisco Jerez: Trivial rebase changes. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:09 -07:00
Samuel Iglesias Gonsálvez	f030aaf2fb	i965/vec4: use vec4_builder to emit instructions in setup_imm_df() Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [ Francisco Jerez: Drop useless vec4_visitor dependencies. Demote to static stand-alone function. Don't write unused components in the result. Use vec4_builder interface for register allocation. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:09 -07:00
Juan A. Suarez Romero	a907c91e93	i965/vec4: consider subregister offset in live variables Take into account offset values less than a full register (32 bytes) when getting the var from register. This is required when dealing with an operation that writes half of the register (like one d2x in IVB/BYT, which uses exec_size == 4). v2: - Take into account this offset < 32 in liveness analysis too (Curro) v3: - Change formula in var_from_reg() (Curro) - Remove useless changes (Curro) Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:08 -07:00
Francisco Jerez	92649a3e67	i965/vec4: fix assert to detect SIMD lowered DF instructions in IVB On IVB, DF instructions have lowered the SIMD width to 4 but the exec_size will be later doubled. Fix the assert to avoid crashing in this case. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [ Francisco Jerez: Simplify assert. Except for the 'inst->group % 4 == 0' part the assertion was redundant with the previous assertion. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:08 -07:00
Samuel Iglesias Gonsálvez	6e3265eae5	i965/vec4: split VEC4_OPCODE_FROM_DOUBLE into one opcode per destination's type This way we can set the destination type as double to all these new opcodes, avoiding any optimizer's confusion that was happening before. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [ Francisco Jerez: Drop no_spill workaround originally needed due to the bogus destination type of VEC4_OPCODE_FROM_DOUBLE. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:08 -07:00
Samuel Iglesias Gonsálvez	50a5217637	i965/vec4: split d2x conversion and data gathering from one opcode to two explicit ones When doing a 64-bit to a smaller data type size conversion, the destination should be aligned to 64-bits. Because of that, we need to gather the data after the actual conversion. Until now, these two operations were done by VEC4_OPCODE_FROM_DOUBLE but now we split them explicitly in two different instructions: VEC4_OPCODE_FROM_DOUBLE just does the conversion and VEC4_OPCODE_PICK_LOW_32BIT will gather the data. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:08 -07:00
Juan A. Suarez Romero	cfaf14a126	i965/vec4: fix VEC4_OPCODE_FROM_DOUBLE for IVB/BYT In the generator we must generate slightly different code for Ivybridge/Baytrail, because of the way the stride works in this hardware. v2: - Use stride and don't need to fix dst (Curro) Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:08 -07:00
Juan A. Suarez Romero	be445d3ea3	i965/vec4: keep original type when dealing with null registers Keep the original type when dealing with null registers. Especially because we do not want to introduce an implicit conversion between types that could affect the conditional flags. This affects especially when the original type is DF, and we are working on Ivybridge/Baytrail. v2 (Curro) - Fix typo. - Use retype() instead of applying the type directly. - Remove unneeded retype. Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:08 -07:00
Samuel Iglesias Gonsálvez	a21dc2b500	i965/vec4: split DF instructions and later double its execsize in IVB/BYT We need to split DF instructions in two on IVB/BYT as it needs an execsize 8 to process 4 DF values (one GRF in total). v2: - Rename helper and make it static inline function (Matt). - Fix indentation and add braces (Matt). v3: - Don't edit IR instruction when doubling exec_size (Curro) - Add comment into the code (Curro). - Manage ARF registers like the others (Curro) v4: - Add get_exec_type() function and use it to calculate the execution size. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [ Francisco Jerez: Fix bogus 'type != BAD_FILE' check. Take destination type as execution type where there is no valid source. Assert-fail if the deduced execution type is byte. Clarify comment in get_lowered_simd_width(). Move SIMD width workaround outside of 'if (...inst->size_written > REG_SIZE)' conditional block, since the problem should be independent of whether the amount of data written by the instruction is greater or lower than a GRF. Drop redundant is_ivb_df definition. Drop bogus inst->exec_size < 8 check. Simplify channel group assertion. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:08 -07:00
Samuel Iglesias Gonsálvez	a5399e8b1c	i965/fs: lower all non-force_writemask_all DF instructions to SIMD4 on IVB/BYT The hardware applies the same channel enable signals to both halves of the compressed instruction which will be just wrong under non-uniform control flow. Fix this by splitting those instructions to SIMD4. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:08 -07:00
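The SIMD4 lowering rule in the commit above can be sketched roughly as follows. This is a minimal illustration assuming a simplified instruction description; `struct inst_info`, `lowered_simd_width()` and `num_split_instructions()` are hypothetical names, not the back-end's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of an instruction (not Mesa's fs_inst). */
struct inst_info {
   unsigned exec_size;        /* SIMD width requested by the IR */
   bool has_df_operand;       /* any 64-bit float source or destination */
   bool force_writemask_all;  /* channel enables are ignored */
};

/* On IVB/BYT a compressed DF instruction applies the same channel enables to
 * both halves, which is wrong under non-uniform control flow, so such
 * instructions are split down to SIMD4. force_writemask_all instructions
 * ignore channel enables and are left alone. */
static unsigned lowered_simd_width(const struct inst_info *inst, bool is_ivb_or_byt)
{
   if (is_ivb_or_byt && inst->has_df_operand && !inst->force_writemask_all)
      return inst->exec_size < 4 ? inst->exec_size : 4;
   return inst->exec_size;
}

/* Number of instructions the original one is split into. */
static unsigned num_split_instructions(const struct inst_info *inst, bool is_ivb_or_byt)
{
   unsigned width = lowered_simd_width(inst, is_ivb_or_byt);
   assert(width > 0 && inst->exec_size % width == 0);
   return inst->exec_size / width;
}
```

For example, a SIMD16 DF instruction without force_writemask_all would become four SIMD4 instructions under this sketch.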
Francisco Jerez	ebfb703d44	i965/fs: Get 64-bit indirect moves working on IVB. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2017-04-14 14:56:08 -07:00
Matt Turner	630b84cdc8	i965: Use source region <1,2,0> when converting to DF. Doing so allows us to use a single MOV in VEC4_OPCODE_TO_DOUBLE instead of two. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2017-04-14 14:56:08 -07:00
Juan A. Suarez Romero	3198ce3f96	i965/fs: fix lower SIMD width for IVB/BYT's MOV_INDIRECT According to the IVB and HSW PRMs: "2.When the destination requires two registers and the sources are indirect, the sources must use 1x1 regioning mode." So for DF instructions the execution size is not limited by the number of address registers that are available, but by the EU decompression logic not handling VxH indirect addressing correctly. This patch limits the SIMD width to 4 in this case. v2: - Fix typo (Matt). - Fix condition (Curro) v3: - Add spec quote (Curro) Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:07 -07:00
Juan A. Suarez Romero	571cbd05eb	i965/fs: fix dst stride in IVB/BYT type conversions When converting a DF to a 32-bit type, we set the dst stride to 2 to fulfill alignment restrictions, because the upper Dword of every Qword will be written with an undefined value. But on IVB/BYT this is not necessary, as each DF conversion already writes two Dwords: first the real value, then a 0. That is, IVB/BYT already set stride = 2 implicitly, so we must set it to 1 explicitly to avoid ending up with stride = 4. v2: - Fix typo (Matt) v3: - Fix stride in the destination's brw_reg, don't modify IR (Curro) v4: - Remove 'is_dst' argument of brw_reg_from_fs_reg() (Curro) - Fix comment (Curro). - Relax hstride assert (Curro) Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [ Francisco Jerez: Minor spelling fixes. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:07 -07:00
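The stride arithmetic described in the commit above can be checked with a small sketch. The helper names are hypothetical and the "implicit factor" is modeled only as described in the commit message, not taken from the back-end.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: destination stride (in 32-bit elements) to program for a
 * DF -> 32-bit conversion. */
static unsigned df_to_32bit_dst_stride(bool is_ivb_or_byt)
{
   /* Elsewhere, stride 2 keeps each result in the low Dword of a Qword,
    * since the upper Dword is written with an undefined value. */
   if (!is_ivb_or_byt)
      return 2;
   /* IVB/BYT already write two Dwords per conversion (the value, then 0),
    * i.e. an implicit stride of 2; programming 2 as well would give 4. */
   return 1;
}

/* Stride actually observed in the destination register. */
static unsigned effective_dst_stride(bool is_ivb_or_byt)
{
   unsigned implicit = is_ivb_or_byt ? 2 : 1;
   return implicit * df_to_32bit_dst_stride(is_ivb_or_byt);
}
```

Either way the effective stride comes out as 2; only the programmed value differs.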
Samuel Iglesias Gonsálvez	af6fc3a8ea	i965/fs: rename lower_d2x to lower_conversions v2: - Change the name to lower_conversions. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:07 -07:00
Samuel Iglesias Gonsálvez	dee31311eb	Revert "i965/fs: Don't emit SEL instructions for type-converting MOVs." This reverts commit `7dccd38b40`. d2x pass fixes SEL instructions when there is a type conversion by doing a SEL without type conversion and then convert the result. This pass also takes into account the non-uniform control flow. Then, `7dccd38b40` is not needed anymore. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-04-14 14:56:07 -07:00
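The SEL rewrite the reverted commit relies on (a SEL without type conversion, then a conversion of the result) can be sketched on a toy IR. Everything here, including `struct inst`, the opcode and type enums and `lower_converting_sel()`, is hypothetical and only illustrates the idea; predication and non-uniform control flow are not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mini-IR; opcode and type names are illustrative only. */
enum opcode { OP_SEL, OP_MOV };
enum type { T_F, T_DF };

struct inst {
   enum opcode op;
   enum type dst_type;   /* type written to the destination */
   enum type exec_type;  /* type the sources are read as */
   int dst, src0, src1;  /* virtual register numbers, -1 if unused */
};

/* Split a type-converting SEL into a SEL in the source type (writing a
 * temporary) followed by a converting MOV into the real destination.
 * Writes up to two instructions to out[] and returns how many. */
static size_t lower_converting_sel(struct inst sel, int tmp_reg, struct inst out[2])
{
   if (sel.op != OP_SEL || sel.dst_type == sel.exec_type) {
      out[0] = sel;
      return 1;
   }
   out[0] = (struct inst){ OP_SEL, sel.exec_type, sel.exec_type, tmp_reg, sel.src0, sel.src1 };
   out[1] = (struct inst){ OP_MOV, sel.dst_type, sel.exec_type, sel.dst, tmp_reg, -1 };
   return 2;
}
```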
Samuel Iglesias Gonsálvez	aeecc82d05	i965/fs: generalize the legalization d2x pass Generalize it to lower any unsupported narrower conversion. v2 (Curro): - Add supports_type_conversion() - Reuse existing instruction instead of cloning it. - Generalize d2x to narrower and equal size conversions. v3 (Curro): - Make supports_type_conversion() const and improve it. - Use foreach_block_and_inst to process added instructions. - Simplify code. - Add assert and improve comments. - Remove redundant mov. - Remove useless comment. - Remove saturate == false assert and add support for saturation when fixing the conversion. - Add get_exec_type() function. v4 (Curro): - Use get_exec_type() function to get sources' type. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:07 -07:00
Matt Turner	94ffeb7fa2	i965: Use <0,2,1> region for scalar DF sources on IVB/BYT. On HSW+, scalar DF sources can be accessed using the normal <0,1,0> region, but on IVB and BYT DF regions must be programmed in terms of floats. A <0,2,1> region accomplishes this. v2: - Apply region <0,2,1> in brw_reg_from_fs_reg() (Curro). v3: - Added comment explaining the reason (Curro). Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:07 -07:00
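The region choice for scalar DF sources described in the commit above reduces to a small selection, sketched here with a hypothetical `struct region` counted in elements rather than the hardware encoding.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified region <vstride,width,hstride>, in elements. */
struct region { unsigned vstride, width, hstride; };

/* Hypothetical helper: region for a scalar (uniform) DF source. On HSW+ the
 * usual <0,1,0> works; on IVB/BYT DF regions are programmed in terms of
 * floats, so one double is described as two adjacent floats: <0,2,1>. */
static struct region scalar_df_region(bool is_ivb_or_byt)
{
   if (is_ivb_or_byt)
      return (struct region){ 0, 2, 1 };
   return (struct region){ 0, 1, 0 };
}
```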
Samuel Iglesias Gonsálvez	82d17615f4	i965/fs: clamp exec_size when an instruction has a scalar DF source Then the SIMD lowering pass will get rid of any compressed instructions with scalar source (whether force_writemask_all or not) and we avoid hitting the Gen7 region decompression bug. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Suggested-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:07 -07:00
Juan A. Suarez Romero	0f1316d4db	i965/fs: double regioning parameters and execsize for DF in IVB/BYT In IVB and BYT, both regioning parameters and execution sizes are measured as 32-bits element size. So when we have something like: mov(8) g2<1>DF g3<4,4,1>DF We are not actually moving 8 doubles (our intention), but 4 doubles. We need to double the parameters to cope with this issue. However, horizontal strides don't behave as they're supposed to on IVB for DF regions, they will cause each 32-bit half of DF sources to be strided individually, and doubling the value won't make any difference. v2: - Use devinfo directly (Matt). - Use Baytrail instead of Valleview (Matt). - Use IvyBridge instead of Ivy (Matt) - Double the exec_size in code emission (Curro) v3: - Change hstride doubling by an assert and fix commit log (Curro). - Substitute remaining compiler->devinfo by devinfo (Curro). v4: - Fix comment (Curro). Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:07 -07:00
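The doubling described in the commit above can be sketched on a simplified region description, counted in elements rather than the hardware's encoded values. `struct df_inst` and `double_df_regioning()` are hypothetical. Under this model a 4-channel DF operation reading <4,4,1> becomes an 8-channel operation reading <8,8,1> in 32-bit units, i.e. the same 32 bytes.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified region, counted in elements (not the hardware's encoding). */
struct region { unsigned vstride, width, hstride; };

/* Hypothetical, simplified DF instruction: execution size plus one source. */
struct df_inst { unsigned exec_size; struct region src; };

/* On IVB/BYT, execution size and regioning for DF are measured in 32-bit
 * elements, so both are doubled to cover the intended number of doubles.
 * Doubling hstride would not help (each 32-bit half is strided on its own),
 * so only a contiguous hstride of 1 is accepted. */
static struct df_inst double_df_regioning(struct df_inst inst, bool is_ivb_or_byt)
{
   if (!is_ivb_or_byt)
      return inst;
   assert(inst.src.hstride == 1);
   inst.exec_size *= 2;
   inst.src.vstride *= 2;
   inst.src.width *= 2;
   return inst;
}
```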
Juan A. Suarez Romero	79af256388	i965/fs: add helper to retrieve instruction execution type The execution data size is the biggest type size of any instruction operand. We will use it to know if the instruction deals with DF, because in Ivy we need to double the execution size and regioning parameters. v2: - Fix typo in commit log (Matt) - Use static inline function instead of fs_inst's method (Curro). - Define the result as a constant (Curro). - Fix indentation (Matt). - Add braces to nested control flow (Matt). v3 (Curro): - Add get_exec_type() and other auxiliary functions and use them to calculate its size. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [ Francisco Jerez: Fix bogus 'type != BAD_FILE' check. Fix deduced execution type for integer vector types. Take destination type as execution type where there is no valid source. Assert-fail if the deduced execution type is byte. Move into brw_ir_fs.h header for consistency with the VEC4 back-end. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:07 -07:00
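A rough sketch of the execution-type rule described in the commit above: widest valid source type, falling back to the destination type, and rejecting a byte result. The enum and function names are hypothetical simplifications, not Mesa's `get_exec_type()`, which also handles cases such as vector immediates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified register types (not Mesa's enum brw_reg_type). */
enum reg_type { TYPE_BAD = 0, TYPE_B, TYPE_W, TYPE_D, TYPE_F, TYPE_DF };

/* Size in bytes of one element of the given type (0 for TYPE_BAD). */
static unsigned type_size(enum reg_type t)
{
   switch (t) {
   case TYPE_B:  return 1;
   case TYPE_W:  return 2;
   case TYPE_D:
   case TYPE_F:  return 4;
   case TYPE_DF: return 8;
   default:      return 0;
   }
}

/* Widest valid source type; the destination type if no source is valid. */
static enum reg_type exec_type(enum reg_type dst, const enum reg_type *srcs, size_t num_srcs)
{
   enum reg_type best = TYPE_BAD;
   for (size_t i = 0; i < num_srcs; i++) {
      if (srcs[i] != TYPE_BAD && type_size(srcs[i]) > type_size(best))
         best = srcs[i];
   }
   if (best == TYPE_BAD)
      best = dst;
   assert(type_size(best) != 1); /* a byte execution type is rejected */
   return best;
}

/* The IVB/BYT DF workarounds key off this check. */
static int exec_is_df(enum reg_type dst, const enum reg_type *srcs, size_t num_srcs)
{
   return type_size(exec_type(dst, srcs, num_srcs)) == 8;
}
```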
Matt Turner	fd349d29e4	i965: Handle IVB DF differences in the validator. On IVB/BYT, region parameters and execution size for DF are in terms of 32-bit elements, so they are doubled. For evaluating the validity of an instruction, we halve them. v2 (Sam): - Add comments. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2017-04-14 14:56:07 -07:00
Iago Toral Quiroga	fbac8b1f94	i965/disasm: also print nibctrl in IVB for execsize=8 4-wide DF operations where NibCtrl applies require an execsize of 8 in IvyBridge/BayTrail. v2: - Refactor NibCtrl printing (Matt) Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:06 -07:00
Francisco Jerez	147e71242c	i965/fs: Take into account lower frequency of conditional blocks in spilling cost heuristic. The individual branches of an if/else/endif construct will be executed some unknown number of times between 0 and 1 relative to the parent block. Use some factor in between as weight while approximating the cost of spill/fill instructions within a conditional if-else branch. This favors spilling registers used within conditional branches which are likely to be executed less frequently than registers used at the top level. Improves the framerate of the SynMark2 OglCSDof benchmark by ~1.9x on my SKL GT4e. Should have a comparable effect on other platforms. No significant regressions. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-04-11 15:28:54 -07:00
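The weighting idea in the commit above can be sketched over a linearized instruction stream. The loop factor of 10 and the conditional factor of 0.5 are assumptions for illustration (the commit only says "some factor in between" 0 and 1), and the `enum op` mini-IR and `spill_cost()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical linearized IR: control flow plus uses of one register. */
enum op { OP_DO, OP_WHILE, OP_IF, OP_ELSE, OP_ENDIF, OP_USE };

/* Each use costs the current block scale. Loop bodies multiply the scale
 * (assumed factor 10); each enclosing if/else branch multiplies it by an
 * assumed 0.5, since a branch runs between 0 and 1 times per execution of
 * its parent block. Cheaper registers are better spill candidates. */
static double spill_cost(const enum op *ops, size_t n)
{
   double scale = 1.0, cost = 0.0;
   for (size_t i = 0; i < n; i++) {
      switch (ops[i]) {
      case OP_DO:    scale *= 10.0; break;
      case OP_WHILE: scale /= 10.0; break;
      case OP_IF:    scale *= 0.5;  break;
      case OP_ENDIF: scale /= 0.5;  break;
      case OP_ELSE:  break; /* same nesting depth as the then-branch */
      case OP_USE:   cost += scale; break;
      }
   }
   return cost;
}
```

With these factors, a register used once at top level and once in each branch of an if/else costs 2.0, while three top-level uses would cost 3.0, so the former is preferred for spilling.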
Jason Ekstrand	3503b2714b	i965/fs: Always provide a default LOD of 0 for TXS and TXL We already provide a default LOD for textureQueryLevels and texture() on non-fragment stages. However, there are more cases where one is needed such as textureSize(gsampler2DMS*) in SPIR-V. Instead of trying to list out all of the cases one at a time, just provide the default for all TXS and TXL operations. This fixes a shader validation error in the new Sascha deferredmultisampling demo which uses textureSize(gsampler2DMS). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100391 Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>	2017-04-04 18:33:35 -07:00
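The policy change in the commit above (always supply LOD 0 for TXS and TXL instead of listing individual cases) amounts to the following. The struct and function are hypothetical, and the LOD is simplified to an integer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified texture instruction. */
enum tex_op { TEX, TXL, TXS };

struct tex_instr { enum tex_op op; bool has_lod; int lod; };

/* Supply LOD 0 for every TXS/TXL that lacks one, rather than enumerating
 * cases such as textureSize(gsampler2DMS) one at a time. */
static struct tex_instr with_default_lod(struct tex_instr t)
{
   if ((t.op == TXS || t.op == TXL) && !t.has_lod) {
      t.has_lod = true;
      t.lod = 0;
   }
   return t;
}
```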
Jason Ekstrand	405ef7bb33	intel/vec4: Add some fall through comments Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-04-03 16:58:35 -07:00
Alejandro Piñeiro	2f8d6bd578	i965: expose BRW_OPCODE_[F32TO16/F16TO32] name on gen8+ Technically those hw operations are only available on gen7, as gen8+ support the conversion on the MOV. But when using the builder to implement nir operations (example: nir_op_fquantize2f16), the gen check is not needed. This check is done later, on the final emission at brw_F32TO16 (brw_eu_emit), choosing between the MOV or the specific operation accordingly. So in the middle, during optimization phases those hw operations can be around for gen8+ too. Without this patch, several (at least 95) vulkan-cts quantize tests crash when using INTEL_DEBUG=optimizer. For example: dEQP-VK.spirv_assembly.instruction.graphics.opquantize.too_small_vert v2: simplify the code using GEN_GE (Ilia Mirkin) v3: tweak brw_instruction_name instead of changing opcode_descs table, that is used for validation (Matt Turner) Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-03-29 17:34:15 +02:00
Matt Turner	7dccd38b40	i965/fs: Don't emit SEL instructions for type-converting MOVs. SEL can only convert between a few integer types, which we basically never do. Fixes fs/vs-double-uniform-array-direct-indirect-non-uniform-control-flow Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Acked-by: Francisco Jerez <currojerez@riseup.net>	2017-03-27 10:59:42 -07:00
Iago Toral Quiroga	ddb2bb3ed4	anv/pipeline: make FragCoord include sample positions when sample shading We need to know if sample shading has been requested during shader compilation since that affects the way fragment coordinates are computed. Notice that the semantics of fragment coordinates only depend on whether sample shading has been requested, not on whether more than one sample will actually be produced (that is, minSampleShading and rasterizationSamples do not affect this behavior). Because this setting affects the code we generate for the shader, we also need to include it in the WM prog key. Notice we don't need to alter the OpenGL code because it doesn't ever use this behavior, so the key's value is always false (the default). Fixes: dEQP-VK.glsl.builtin_var.fragcoord_msaa.* Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-03-24 08:11:53 +01:00
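A minimal sketch of the FragCoord behavior described in the commit above, assuming sample positions are given as offsets within the pixel in [0, 1). `struct vec2` and `frag_coord()` are hypothetical; the point is that only the compile-time sample-shading bit (which therefore belongs in the program key) changes the result.

```c
#include <assert.h>
#include <stdbool.h>

struct vec2 { float x, y; };

/* FragCoord.xy for pixel (px, py): the pixel center without sample
 * shading, the current sample's position when sample shading was
 * requested at compile time. */
static struct vec2 frag_coord(int px, int py, bool sample_shading, struct vec2 sample_pos)
{
   struct vec2 offset = sample_shading ? sample_pos : (struct vec2){ 0.5f, 0.5f };
   return (struct vec2){ (float)px + offset.x, (float)py + offset.y };
}
```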
Matt Turner	7499bc7fd7	i965: Replace OPT_V() with OPT(). We want to be able to check the progress of each pass and dump the NIR for debugging purposes if it changed. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-03-23 14:34:44 -07:00
Matt Turner	1be91bd9d8	i965/fs: Return progress from demote_sample_qualifiers(). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-03-23 14:34:44 -07:00
Matt Turner	fd3351246c	i965/fs: Return progress from move_interpolation_to_top(). And mark as static at the same time. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-03-23 14:34:44 -07:00
Emil Velikov	2438c0a236	intel/compiler: consistently use ifndef guards over pragma once Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Vedran Miletić <vedran@miletic.net> Acked-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>	2017-03-22 16:55:22 +00:00
Emil Velikov	3b277bae66	i965: make brw_setup_image_uniform_values static Used only internally. Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Vedran Miletić <vedran@miletic.net> Acked-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>	2017-03-22 16:55:21 +00:00
Jason Ekstrand	762a6333f2	nir: Rework conversion opcodes The NIR story on conversion opcodes is a mess. We've had way too many of them, naming is inconsistent, and which ones have explicit sizes was sort-of random. This commit re-organizes things and makes them all consistent: - All non-bool conversion opcodes now have the explicit size in the destination and are named <src_type>2<dst_type><size>. - Integer <-> integer conversion opcodes now only come in i2i and u2u forms (i2u and u2i have been removed) since the only difference between the different integer conversions is whether or not they sign-extend when up-converting. - Boolean conversion opcodes all have the explicit size on the bool and are named <src_type>2<dst_type>. Making things consistent also allows nir_type_conversion_op to be moved to nir_opcodes.c and auto-generated using mako. This will make adding int8, int16, and float16 versions much easier when the time comes. Reviewed-by: Eric Anholt <eric@anholt.net>	2017-03-14 07:36:40 -07:00
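The naming scheme in the commit above can be illustrated by a small name builder. This is a sketch of the convention as the commit message states it, not NIR's generated opcode table; `enum base` and `conversion_op_name()` are hypothetical, and mapping integer-to-integer conversions to the source's signedness follows from the commit's point that signedness only decides sign- versus zero-extension.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Base types, using the single-letter prefixes from the opcode names. */
enum base { BASE_FLOAT = 'f', BASE_INT = 'i', BASE_UINT = 'u', BASE_BOOL = 'b' };

/* <src_type>2<dst_type><size>; no size suffix when a bool is involved;
 * integer<->integer conversions only come as i2i or u2u, chosen by the
 * source signedness (sign- vs zero-extension when up-converting). */
static void conversion_op_name(char *buf, size_t len, enum base src, enum base dst, unsigned dst_bits)
{
   int src_int = (src == BASE_INT || src == BASE_UINT);
   int dst_int = (dst == BASE_INT || dst == BASE_UINT);
   if (src_int && dst_int)
      dst = src;
   if (src == BASE_BOOL || dst == BASE_BOOL)
      snprintf(buf, len, "%c2%c", (char)src, (char)dst);
   else
      snprintf(buf, len, "%c2%c%u", (char)src, (char)dst, dst_bits);
}
```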
Jason Ekstrand	7107b32155	i965/fs: Re-arrange conversion operations Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-03-14 07:36:40 -07:00
Jason Ekstrand	bab4610e9c	i965/vec4: Get rid of the type parameter from to/from_double Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-03-14 07:36:40 -07:00
Jason Ekstrand	b377be9213	i965/fs: Use num_components from the SSA def in image intrinsics Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2017-03-14 07:36:40 -07:00
Iago Toral Quiroga	e8eeb759b7	intel: fix compiler build compiler/brw_vec4_gs_visitor.cpp:744:39: error: ‘GEN7_MAX_GS_OUTPUT_VERTEX_SIZE_BYTES’ was not declared in this scope output_vertex_size_bytes <= GEN7_MAX_GS_OUTPUT_VERTEX_SIZE_BYTES); Fixes: `d0d4a5f43b` ("i965: split EU defines to brw_eu_defines.h") Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2017-03-13 13:09:24 +01:00
Emil Velikov	aa09c9552c	intel/compiler: whitespace cleanups Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-03-13 11:16:35 +00:00
Emil Velikov	bdc5036464	intel/compiler: link all tests against gtest, even test_eu_compact At the moment all the tests but test_eu_compact are actual C++ gtests. To simplify things, we can move the gtest.la to the common TEST_LIBS. As we're here, we can change the test extension [to .cpp] to avoid using the confusing dummy.cpp. Add a nice comment in the makefile for posterity. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-03-13 11:16:35 +00:00
Jason Ekstrand	700bebb958	i965: Move the back-end compiler to src/intel/compiler Mostly a dummy git mv with a couple of noticeable parts: - With the earlier header cleanups, nothing in src/intel depends on files from src/mesa/drivers/dri/i965/ - Both Autoconf and Android builds are addressed. Thanks to Mauro and Tapani for the fixups in the latter - brw_util.[ch] is not really compiler specific, so it's moved to i965. v2: - move brw_eu_defines.h instead of brw_defines.h - remove no-longer applicable includes - add missing vulkan/ prefix in the Android build (thanks Tapani) v3: - don't list brw_defines.h in src/intel/Makefile.sources (Jason) - rebase on top of the oa patches [Emil Velikov: commit message, various small fixes throughout] Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-03-13 11:16:34 +00:00

656 Commits