KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Matt Turner	1038d385a9	nir: Reduce destination size of ballot intrinsic when possible Some hardware, like i965, doesn't support group sizes greater than 32. In that case, we can reduce the destination size of the ballot intrinsic, which will simplify our code generation. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-07-20 16:56:49 -07:00
Matt Turner	782ef30451	i965/fs: Implement ARB_shader_ballot operations Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-07-20 16:56:49 -07:00
Matt Turner	8238930510	i965/fs: Do not move MOVs writing the flag outside of control flow The implementation of ballotARB() will start by zeroing the flags register. So, a doing something like if (gl_SubGroupInvocationARB % 2u == 0u) { ... = ballotARB(true); [...] } else { ... = ballotARB(true); [...] } (like fs-ballot-if-else.shader_test does) would generate identical MOVs to the same destination (the flag register!), and we definitely do not want to pull that out of the control flow. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-07-20 16:56:49 -07:00
Francisco Jerez	f1b7c47913	i965/fs: Handle explicit flag sources in flags_read() The implementations of the ARB_shader_ballot intrinsics will explicitly read the flag as a source register. Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-07-20 16:56:49 -07:00
Matt Turner	43ef75b394	nir: Add system values from ARB_shader_ballot We already had a channel_num system value, which I'm renaming to subgroup_invocation to match the rest of the new system values. Note that while ballotARB(true) will return zeros in the high 32-bits on systems where gl_SubGroupSizeARB <= 32, the gl_SubGroup??MaskARB variables do not consider whether channels are enabled. See issue (1) of ARB_shader_ballot. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-07-20 16:56:49 -07:00
Matt Turner	ee9fa4ac18	i965/fs: Implement ARB_shader_group_vote operations Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-07-20 16:56:49 -07:00
Francisco Jerez	93dc736f4e	i965/fs: Handle explicit flag destinations in flags_written() The implementations of the ARB_shader_group_vote intrinsics will explicitly write the flag as the destination register. Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-07-20 16:56:49 -07:00
Matt Turner	30b72f4126	i965/vec4: Lower ARB_shader_group_vote intrinsics I don't expect anyone is going to care about using this in vec4 programs (vertex/tessellation/geometry on Gen6/7), no one has come up with a good way to implement it much less test it. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-07-20 16:56:49 -07:00
Matt Turner	d4c9d6a3b2	nir: Add pass to optimize intrinsics Specifically, constant fold intrinsics from ARB_shader_group_vote, but I suspect it'll be useful for other things in the future. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-07-20 16:56:49 -07:00
Kenneth Graunke	b2da123801	i965: Use pushed UBO data in the scalar backend. This actually takes advantage of the newly pushed UBO data, avoiding pull loads. Improves performance in GLBenchmark Manhattan 3.1 by: HSW: ~1%, BDW/SKL/KBL GT2: 3-4%, SKL GT4: 7-8%, APL: 4-5%. (thanks to Eero Tamminen for these numbers) shader-db results on Skylake, ignoring programs with spill/fill changes: total instructions in shared programs: 13963994 -> 13651893 (-2.24%) instructions in affected programs: 4250328 -> 3938227 (-7.34%) helped: 28527 HURT: 0 total cycles in shared programs: 179808608 -> 172535170 (-4.05%) cycles in affected programs: 79720410 -> 72446972 (-9.12%) helped: 26951 HURT: 1248 LOST: 46 GAINED: 21 Many "Deus Ex: Mankind Divided" shaders which already spilled end up spill a lot more (about 240 programs hurt, 9 helped). The cycle estimator suggests this is still overall a win (-0.23% in cycle counts) presumably because we trade pull loads for fills. v2: Drop "PULL" environment variable left in for initial debugging (caught by Matt). Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-07-13 20:18:54 -07:00
Kenneth Graunke	c9ef27e77b	i965: Factor out push locations. With UBOs, the answer of "have we decided to push this uniform" gets a bit more complicated - for one, we have multiple surfaces. This patch refactors things so we can add the new code in a single place. Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-07-13 20:18:54 -07:00
Kenneth Graunke	4f586cd8f1	i965: Push UBO data, but don't use it just yet. This patch starts uploading UBO data via 3DSTATE_CONSTANT_* packets, and updates the compiler to know that there's extra payload data, so things continue working. However, it still issues pull loads for all data. I wanted to separate the two aspects for greater bisectability. v2: Update for new intel_bufferobj_buffer parameter. Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-07-13 20:18:30 -07:00
Kenneth Graunke	6d28c6e52c	i965: Select ranges of UBO data to be uploaded as push constants. This adds a NIR pass that decides which portions of UBOS we should upload as push constants, rather than pull constants. v2: Switch to uint16_t for the UBO block number, because we may have a lot of them in Vulkan (suggested by Jason). Add more comments about bitfield trickery (requested by Matt). v3: Skip vec4 stages for now...I haven't finished wiring up support in the vec4 backend, and so pushing the data but not using it will just be wasteful. Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-07-13 19:56:49 -07:00
Kenneth Graunke	8ec5a4e4a4	i965: Switch to absolute addressing for constant buffer 0. By default, 3DSTATE_CONSTANT_* Constant Buffer 0 is relative to dynamic state base address. This makes it unusable for pushing UBOs. I'd like to be able to use all four push buffers. There is a bit in the INSTPM register (or CS_DEBUG_MODE2 on Skylake) which controls whether buffer 0 is relative to dynamic state base address, or simply a normal pointer. Setting that gives us full flexibility. We can't currently write this on Haswell and earlier, and will need to update the kernel command parser, and then do the whole version checking song and dance. Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-07-13 19:56:49 -07:00
Lionel Landwerlin	226fae7849	intel/compiler: no need to check unsigned is >= 0 CID: 1338342 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2017-07-13 22:50:45 +01:00
Lionel Landwerlin	95c917668c	intel/compiler: don't check unsigned is >= 0 CID: 1224468 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2017-07-13 22:50:38 +01:00
Lionel Landwerlin	a25a533458	intel/compiler: remove check unsigned is >= 0 By definition unsigned are always >= 0. CID: 742212 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2017-07-13 22:50:29 +01:00
Anuj Phogat	0a56c5f3f1	intel/compiler: Don't use opt_sampler_eot() optimization on gen10+ This optimization has been removed on gen10+. Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-07-12 11:27:31 -07:00
Johnson Lin	165e704719	i965/i915: Add UYVY as the supported format Trigger the correct sampler options for it. Similar with YUYV Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2017-06-30 10:16:26 +01:00
Lionel Landwerlin	030abc6109	intel: compiler/i965: fix is_broxton checks In `5f2fe9302c` is_geminilake was introduced for the differenciate broxton from geminilake. Unfortunately I failed as verifying that is_broxton is throughout the code base to mean Gen9lp. Fixes: `5f2fe9302c` ("intel: common: add flag to identify platforms by name") Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-06-20 23:26:42 +01:00
Anuj Phogat	f9e31a26d4	i965/cnl: Make URB {VS, GS, HS, DS} sizes non multiple of 3 v1: By Ben Widawsky <benjamin.widawsky@intel.com> v2: v1 had an assert only for VS. Add the restriction for GS, HS and DS as well and make sure the allocated sizes are not multiple of 3. v3: Move the entry_size checks in to compiler code (Ken) Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-06-09 16:02:59 -07:00
Anuj Phogat	111881abac	i965/cnl: Handle gen10 in switch cases across the driver V2: Start using gen10 functions isl_gen10*(), gen10_blorp_exec() gen10_init_atoms() (Jason) Remove Vulkan changes. Do them later in a separate patch. Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-06-09 16:02:59 -07:00
Anuj Phogat	30e749c8f1	i965/cnl: Update few assertions Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-06-09 16:02:59 -07:00
Eric Engestrom	63a8a88ac4	tree-wide: remove trailing backslash Simple search for a backslash followed by two newlines. If one of the newlines were to be removed, this would cause issues, so let's just remove these trailing backslashes. Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2017-06-07 01:18:09 +01:00
Kenneth Graunke	9cd69022d5	i965: Change INTEL_DEBUG=vec4 to INTEL_SCALAR_VS for consistency. We moved to INTEL_SCALAR_* when we added more than a single stage, but never went back and converted the VS to work that way. Be consistent. Also update the documentation to actually mention these debug variables. Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2017-06-05 23:32:40 -07:00
Kenneth Graunke	fe14a9a501	i965: Drop duplicate shadow variable. We already initialized this at the top of the function. Trivial.	2017-06-01 14:28:12 -07:00
Kenneth Graunke	65f5f3c85c	i965: Move SOL PSIZ hacks from draw time to link time. We can just update the gl_transform_feedback_info fields at link time to make the VUE header fields have the right location and component. Then we don't need to handle them specially at draw time, which is expensive. Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2017-06-01 00:08:29 -07:00
Kenneth Graunke	1e3880544e	i965: Ignore INTEL_SCALAR_* debug variables on Gen10+. Scalar mode has been default since Broadwell, and vector mode is getting increasingly unmaintained. There are a few things that don't even fully work in vector mode on Skylake, but we've never cared because nobody uses it. There's no point in porting it forward to new platforms. So, just ignore the debug options to force it on. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-05-29 21:40:44 -07:00
Jason Ekstrand	18e18a1863	i965: Move clip program compilation to the compiler Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-05-26 07:58:01 -07:00
Jason Ekstrand	9fb8a8775b	i965: Move SF compilation to the compiler Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-05-26 07:58:01 -07:00
Jason Ekstrand	21ba2b4bef	intel/compiler: Make brw_disasm take const assembly Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-05-26 07:58:01 -07:00
Samuel Iglesias Gonsálvez	e69e5c7006	i965/vec4: load dvec3/4 uniforms first in the push constant buffer Reorder the uniforms to load first the dvec4-aligned variables in the push constant buffer and then push the vec4-aligned ones. It takes into account that the relocated uniforms should be aligned to their channel size. This fixes a bug were the dvec3/4 might be loaded one part on a GRF and the rest in next GRF, so the region parameters to read that could break the HW rules. v2: - Fix broken logic. - Add a comment to explain what should be needed to optimise the usage of the push constant buffer slots, as this patch does not pack the uniforms. v3: - Implemented the push constant buffer usage optimization. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Cc: "17.1" <mesa-stable@lists.freedesktop.org> Acked-by: Francisco Jerez <currojerez@riseup.net>	2017-05-18 06:49:54 +02:00
Samuel Iglesias Gonsálvez	8aa6ada838	i965/vec4: fix swizzle and writemask when loading an uniform with constant offset It was setting XYWZ swizzle and writemask to all uniforms, no matter if they were a vector or scalar, so this can lead to problems when loading them to the push constant buffer. Moreover, 'shift' calculation was designed to calculate the offset in DWORDS, but it doesn't take into account DFs, so the calculated swizzle for the later ones was wrong. The indirect case is not changed because MOV INDIRECT will write to all components. Added an assert to verify that these uniforms are aligned. v2: - Fix 'shift' calculation (Curro) - Set both swizzle and writemask. - Add assert(shift == 0) for the indirect case. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Cc: "17.1" <mesa-stable@lists.freedesktop.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-05-18 06:49:54 +02:00
Samuel Iglesias Gonsálvez	354f7f2cb9	i965/vec4/gs: restore the uniform values which was overwritten by failed vec4_gs_visitor execution We are going to add a packing feature to reduce the usage of the push constant buffer. One of the consequences is that 'nr_params' would be modified by vec4_visitor's run call, so we need to restore it if one of them failed before executing the fallback ones. Same thing happens to the uniforms values that would be reordered afterwards. Fixes GL45-CTS.arrays_of_arrays_gl.InteractionFunctionCalls2 when the dvec4 alignment and packing patch is applied. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Cc: "17.1" <mesa-stable@lists.freedesktop.org> Acked-by: Francisco Jerez <currojerez@riseup.net>	2017-05-18 06:49:28 +02:00
Matt Turner	169e1e26ee	i965: Fix test_eu_validate.cpp Broken by commit `a7217e909c` ("i965: Pass pointer and end of assembly to brw_validate_instructions"). Reported-by: Aaron Watry <awatry@gmail.com>	2017-05-16 11:45:07 -07:00
Matt Turner	aae2626be8	i965: Add a weak no-op nir_print_instr() symbol intel_asm_annotation.c is part of libintel_compiler.la, which contains code for disassembling and validating shaders that we want to call in aubinator_error_decode. dump_assembly() calls nir_print_instr() to print annotations, and although dump_assembly() is not called by aubinator_error_decode (nor is any function in intel_asm_annotation.c) it causes undefined references to nir_print_instr(). To work around, provide a no-op weak symbol to resolve against.	2017-05-15 11:43:01 -07:00
Matt Turner	d98e82c772	i965: Allow brw_eu_validate to handle compact instructions This will allow the validator to run on shader programs we find in the GPU hang error state. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-05-15 11:42:56 -07:00
Matt Turner	a7217e909c	i965: Pass pointer and end of assembly to brw_validate_instructions This will allow us to more easily run brw_validate_instructions() on shader programs we find in GPU hang error states. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-05-15 11:42:47 -07:00
Jason Ekstrand	037ce253b1	i965/vec4: Delete the system value infastructure The only thing still using it is INVOCATION_ID for geometry shaders. That's easily enough inlined into the nir_intrinsic_load_invocation_id handling code. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-09 15:08:07 -07:00
Jason Ekstrand	2e9916ea04	i965/vec4: Use NIR to do GS input remapping We're already doing this in the FS back-end. This just does the same thing in the vec4 back-end. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-09 15:08:07 -07:00
Jason Ekstrand	e31042ab40	i965/fs: Move remapping of gl_PointSize to the NIR level Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-09 15:08:06 -07:00
Jason Ekstrand	5b00c3cc05	i965/nir: Inline remap_inputs_with_vue_map Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-09 15:08:06 -07:00
Jason Ekstrand	0d5f89cdc3	i965/vec4: Use NIR remapping for VS attributes The NIR pass already handles remapping system values to attributes for us so we delete the system value code as part of the conversion. We also change nir_lower_vs_inputs to take an explicit inputs_read bitmask and pass in the inputs_read from prog_data instead from pulling it out of NIR. This is because the version in prog_data may get EDGEFLAG added to it on some old platforms. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-09 15:08:06 -07:00
Jason Ekstrand	80aa6e9d32	intel/compiler/vs: Move inputs_read handling to generic code Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-09 15:08:03 -07:00
Jason Ekstrand	d2fe804d18	i965/vec4: Set VERT_BIT_EDGEFLAG based on the VUE map We also add a nice little comment to make it more clear exactly what happens with the edge flag copy. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-09 15:07:47 -07:00
Jason Ekstrand	ca4d192802	i965/fs: Lower gl_VertexID and friends to inputs at the NIR level NIR calls these system values but they come in from the VF unit as vertex data. It's terribly convenient to just be able to treat them as such in the back-end. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-09 15:07:47 -07:00
Jason Ekstrand	24e6fba500	i965/vs: Set uses_vertexid and friends from brw_compile_vs Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-09 15:07:47 -07:00
Jason Ekstrand	5e832302dc	i965: Move multiply by 4 for VS ATTR setup into the scalar backend. The vec4 backend will want to count in units of vec4s, not scalar components. The simplest solution is to move the multiplication by 4 into the scalar backend. This also improves consistency with how we count varyings. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-09 15:07:47 -07:00
Jason Ekstrand	36764b6923	i965/nir: Inline remap_vs_attrs Now that we have nice block iterators, there's no good reason for this to be off on it's own. While we're here, we convert to using the NIR const index getters/setters instead of whacking const_index values directly. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-09 15:07:47 -07:00
Jason Ekstrand	b86dba8a0e	nir: Embed the shader_info in the nir_shader again Commit `e1af20f18a` changed the shader_info from being embedded into being just a pointer. The idea was that sharing the shader_info between NIR and GLSL would be easier if it were a pointer pointing to the same shader_info struct. This, however, has caused a few problems: 1) There are many things which generate NIR without GLSL. This means we have to support both NIR shaders which come from GLSL and ones that don't and need to have an info elsewhere. 2) The solution to (1) raises all sorts of ownership issues which have to be resolved with ralloc_parent checks. 3) Ever since `00620782c9`, we've been using nir_gather_info to fill out the final shader_info. Thanks to cloning and the above ownership issues, the nir_shader::info may not point back to the gl_shader anymore and so we have to do a copy of the shader_info from NIR back to GLSL anyway. All of these issues go away if we just embed the shader_info in the nir_shader. There's a little downside of having to copy it back after calling nir_gather_info but, as explained above, we have to do that anyway. Acked-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-09 15:07:47 -07:00
Lionel Landwerlin	32f14332f5	intel: compiler: prevent integer overflow CID: 1399477, 1399478 (Integer handling issues) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-05-09 13:56:17 +01:00
Lionel Landwerlin	85182e490c	intel: compiler: remove duplicated code CID: 1399470: (Control flow issues) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-05-09 13:56:17 +01:00
Rafael Antognolli	8fa8abef4b	i965: Move enums to brw_compiler.h. These enums live inside struct brw_wm_prog_data, so it makes sense to keep them in the same header. It also allows to use them without including brw_eu_defines.h. Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-03 16:55:58 -07:00
Samuel Iglesias Gonsálvez	f57e234fdd	i965/vec4: don't modify regioning parameters to the sources of DF align1 instructions The regioning parameters are now properly set by convert_to_hw_regs() and we don't need to fix them in the generator. That latter fix previously done in the generator was strictly speaking wrong for any non-identity regions. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Cc: "17.1" <mesa-stable@lists.freedesktop.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-05-03 15:32:39 +02:00
Samuel Iglesias Gonsálvez	aaeb1c99be	i965/vec4: fix register width for DF VGRF and UNIFORM On gen7, the swizzles used in DF align16 instructions works for element size of 32 bits, so we can address only 2 consecutive DFs. As we assumed that in the rest of the code and prepare the instructions for this (scalarize_df()), we need to set it to two again. However, for DF align1 instructions, a width of 2 is wrong as we are not reading the data we want. For example, an uniform would have a region of <0, 2, 1> so it would repeat the first 2 DFs, when we wanted to access to the first 4. This patch sets the default one to 4 and then modifies the width of align16 instruction's DF sources when we translate the logical swizzle to the physical one. v2: - Remove conditional (Curro). Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Cc: "17.1" <mesa-stable@lists.freedesktop.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-05-03 15:32:39 +02:00
Samuel Iglesias Gonsálvez	7f728bce81	i965/vec4: fix vertical stride to avoid breaking region parameter rule From IVB PRM, vol4, part3, "General Restrictions on Regioning Parameters": "If ExecSize = Width and HorzStride ≠ 0, VertStride must be set to Width * HorzStride." In next patch, we are going to modify the region parameter for uniforms and vgrf. For uniforms that are the source of DF align1 instructions, they will have <0, 4, 1> regioning and the execsize for those instructions will be 4, so they will break the regioning rule. This will be the same for VGRF sources where we use the vstride == 0 exploit. As we know we are not going to cross the GRF boundary with that execsize and parameters (not even with the exploit), we just fix the vstride here. v2: - Move is_align1_df() (Curro) - Refactor exec_size == width calculation (Curro) Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Cc: "17.1" <mesa-stable@lists.freedesktop.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-05-03 15:32:39 +02:00
Francisco Jerez	58324389be	intel/fs: Take into account amount of data read in spilling cost heuristic. Until now the spilling cost calculation was neglecting the amount of data read from the register during the spilling cost calculation. This caused it to make suboptimal decisions in some cases leading to higher memory bandwidth usage than necessary. Improves Unigine Heaven performance by ~4% on BDW, reversing an unintended FPS regression from my previous commit `147e71242c` with n=12 and statistical significance 5%. In addition SynMark2 OglCSDof performance is improved by an additional ~5% on SKL, and a Kerbal Space Program apitrace around the Moho planet I can provide on request improves by ~20%. Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Plamena Manolova <plamena.manolova@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-04-24 11:01:40 -07:00
Francisco Jerez	ecc19e12dc	intel/fs: Use regs_written() in spilling cost heuristic for improved accuracy. This is what we use later on to compute the number of registers that will actually get spilled to memory, so it's more likely to match reality than the current open-coded approximation. Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Plamena Manolova <plamena.manolova@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-04-24 10:59:56 -07:00
Kenneth Graunke	6b10c37b9c	i965/vec4: Use reads_accumulator_implicitly(), not MACH checks. Curro pointed out that I should not just check for MACH, but use the reads_accumulator_implicitly() helper, which would also prevent the same bug with MAC and SADA2 (if we ever decide to use them). Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-24 10:53:49 -07:00
Timothy Arceri	7a7ee40c2d	nir/i965: add before ffma algebraic opts This shuffles constants down in the reverse of what the previous patch does and applies some simpilifications that may be made possible from doing so. Shader-db results BDW: total instructions in shared programs: 12980814 -> 12977822 (-0.02%) instructions in affected programs: 281889 -> 278897 (-1.06%) helped: 1231 HURT: 128 total cycles in shared programs: 246562852 -> 246567288 (0.00%) cycles in affected programs: 11271524 -> 11275960 (0.04%) helped: 1630 HURT: 1378 V2: mark float opts as inexact Reviewed-by: Elie Tournier <elie.tournier@collabora.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-04-24 12:08:14 +10:00
Kenneth Graunke	2faf227ec2	i965/vec4: Avoid reswizzling MACH instructions in opt_register_coalesce(). opt_register_coalesce() was optimizing sequences such as: mul(8) acc0:D, attr18.xyyy:D, attr19.xyyy:D mach(8) vgrf5.xy:D, attr18.xyyy:D, attr19.xyyy:D mov(8) m4.zw:F, vgrf5.xxxy:F into: mul(8) acc0:D, attr18.xyyy:D, attr19.xyyy:D mach(8) m4.zw:D, attr18.xxxy:D, attr19.xxxy:D This doesn't work - if we're going to reswizzle MACH, we'd need to reswizzle the MUL as well. Here, the MUL fills the accumulator's .zw components with attr18.yy * attr19.yy. But the MACH instruction expects .z to contain attr18.x * attr19.x. Bogus results ensue. No change in shader-db on Haswell. Prevents regressions in Timothy's patches to use enhanced layouts for varying packing (which rearrange code just enough to trigger this pre-existing bug, but were fine themselves). Acked-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-04-22 00:01:16 -07:00
Matt Turner	2eeb1b0ad9	i965: Use correct VertStride on align16 instructions. In commit `c35fa7a`, we changed the "width" of DF source registers to 2, which is conceptually fine. Unfortunately a VertStride of 2 is not allowed by align16 instructions on IVB/BYT, and the regular VertStride of 4 works fine in any case. See generated_tests/spec/arb_gpu_shader_fp64/execution/built-in-functions/vs-round-double.shader_test for example: cmp.ge.f0(8) g18<1>DF g1<0>.xyxyDF -g8<2>DF { align16 1Q }; ERROR: In Align16 mode, only VertStride of 0 or 4 is allowed cmp.ge.f0(8) g19<1>DF g1<0>.xyxyDF -g9<2>DF { align16 2N }; ERROR: In Align16 mode, only VertStride of 0 or 4 is allowed v2: - Add spec quote (Curro). - Change the condition to only BRW_VERTICAL_STRIDE_2 (Curro) Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:09 -07:00
Samuel Iglesias Gonsálvez	d8441e2276	i965/vec4/dce: improve track of partial flag register writes This is required for correctness in presence of multiple 4-wide flag writes (e.g. 4-wide instructions with a conditional mod set) which update a different portion of the same 8-bit flag subregister. Right now we keep track of flag dataflow with 8-bit granularity and consider flag writes to have killed any previous definition of the same subregister even if the write was less than 8 channels wide, which can cause live flag register updates to be dead code-eliminated incorrectly. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:09 -07:00
Samuel Iglesias Gonsálvez	c1fc8fad47	i965/vec4: don't do horizontal stride on some register file types horiz_offset() shouldn't be doing anything for scalar registers, because all channels of any SIMD instructions will end up reading or writing the same component of the register, so shifting the register offset would be wrong. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [ Francisco Jerez: Re-implement in terms of is_uniform() for simplicity. Pass argument by const reference. Clarify commit message. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:09 -07:00
Matt Turner	21e8e3a848	i965/vec4: Fix exec size for MOVs {SET,PICK}_{HIGH,LOW}_32BIT. Otherwise for a pack_double_2x32_split opcode, we emit: vec1 64 ssa_135 = pack_double_2x32_split ssa_133, ssa_134 mov(8) g5<1>UD g5<4>.xUD { align16 1Q compacted }; mov(8) g7<2>UD g5<4,4,1>UD { align1 1Q }; ERROR: When the destination spans two registers, the source must span two registers (exceptions for scalar source and packed-word to packed-dword expansion) mov(8) g8<2>UD g5.4<4,4,1>UD { align1 2N }; ERROR: The offset from the two source registers must be the same mov(8) g5<1>UD g6<4>.xUD { align16 1Q compacted }; mov(8) g7.1<2>UD g5<4,4,1>UD { align1 1Q }; ERROR: When the destination spans two registers, the source must span two registers (exceptions for scalar source and packed-word to packed-dword expansion) mov(8) g8.1<2>UD g5.4<4,4,1>UD { align1 2N }; ERROR: The offset from the two source registers must be the same The intention was to emit mov(4)s for the instructions that have ERROR annotations. See tests/spec/arb_gpu_shader_fp64/execution/vs-isinf-dvec.shader_test for example. v2 (Samuel): - Instead of setting the exec size to a fixed value, don't double it (Curro). - Add PICK_{HIGH,LOW}_32BIT to the condition. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [ Francisco Jerez: Trivial rebase changes. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:09 -07:00
Samuel Iglesias Gonsálvez	f030aaf2fb	i965/vec4: use vec4_builder to emit instructions in setup_imm_df() Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [ Francisco Jerez: Drop useless vec4_visitor dependencies. Demote to static stand-alone function. Don't write unused components in the result. Use vec4_builder interface for register allocation. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:09 -07:00
Juan A. Suarez Romero	a907c91e93	i965/vec4: consider subregister offset in live variables Take into account offset values less than a full register (32 bytes) when getting the var from register. This is required when dealing with an operation that writes half of the register (like one d2x in IVB/BYT, which uses exec_size == 4). v2: - Take in account this offset < 32 in liveness analysis too (Curro) v3: - Change formula in var_from_reg() (Curro) - Remove useless changes (Curro) Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:08 -07:00
Francisco Jerez	92649a3e67	i965/vec4: fix assert to detect SIMD lowered DF instructions in IVB On IVB, DF instructions have lowered the SIMD width to 4 but the exec_size will be later doubled. Fix the assert to avoid crashing in this case. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [ Francisco Jerez: Simplify assert. Except for the 'inst->group % 4 == 0' part the assertion was redundant with the previous assertion. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:08 -07:00
Samuel Iglesias Gonsálvez	6e3265eae5	i965/vec4: split VEC4_OPCODE_FROM_DOUBLE into one opcode per destination's type This way we can set the destination type as double to all these new opcodes, avoiding any optimizer's confusion that was happening before. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [ Francisco Jerez: Drop no_spill workaround originally needed due to the bogus destination type of VEC4_OPCODE_FROM_DOUBLE. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:08 -07:00
Samuel Iglesias Gonsálvez	50a5217637	i965/vec4: split d2x conversion and data gathering from one opcode to two explicit ones When doing a 64-bit to a smaller data type size conversion, the destination should be aligned to 64-bits. Because of that, we need to gather the data after the actual conversion. Until now, these two operations were done by VEC4_OPCODE_FROM_DOUBLE but now we split them explicitely in two different instructions: VEC4_OPCODE_FROM_DOUBLE just do the conversion and VEC4_OPCODE_PICK_LOW_32BIT will gather the data. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:08 -07:00
Juan A. Suarez Romero	cfaf14a126	i965/vec4: fix VEC4_OPCODE_FROM_DOUBLE for IVB/BYT In the generator we must generate slightly different code for Ivybridge/Baytrail, because of the way the stride works in this hardware. v2: - Use stride and don't need to fix dst (Curro) Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:08 -07:00
Juan A. Suarez Romero	be445d3ea3	i965/vec4: keep original type when dealing with null registers Keep the original type when dealing with null registers. Especially because we do no want to introduce an implicit conversion between types that could affect the conditional flags. This affects especially when the original type is DF, and we are working on Ivybridge/Baytrail. v2 (Curro) - Fix typo. - Use retype() instead of applying the type directly. - Remove unneeded retype. Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:08 -07:00
Samuel Iglesias Gonsálvez	a21dc2b500	i965/vec4: split DF instructions and later double its execsize in IVB/BYT We need to split DF instructions in two on IVB/BYT as it needs an execsize 8 to process 4 DF values (one GRF in total). v2: - Rename helper and make it static inline function (Matt). - Fix indention and add braces (Matt). v3: - Don't edit IR instruction when doubling exec_size (Curro) - Add comment into the code (Curro). - Manage ARF registers like the others (Curro) v4: - Add get_exec_type() function and use it to calculate the execution size. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [ Francisco Jerez: Fix bogus 'type != BAD_FILE' check. Take destination type as execution type where there is no valid source. Assert-fail if the deduced execution type is byte. Clarify comment in get_lowered_simd_width(). Move SIMD width workaround outside of 'if (...inst->size_written > REG_SIZE)' conditional block, since the problem should be independent of whether the amount of data written by the instruction is greater or lower than a GRF. Drop redundant is_ivb_df definition. Drop bogus inst->exec_size < 8 check. Simplify channel group assertion. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:08 -07:00
Samuel Iglesias Gonsálvez	a5399e8b1c	i965/fs: lower all non-force_writemask_all DF instructions to SIMD4 on IVB/BYT The hardware applies the same channel enable signals to both halves of the compressed instruction which will be just wrong under non-uniform control flow. Fix this by splitting those instructions to SIMD4. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:08 -07:00
Francisco Jerez	ebfb703d44	i965/fs: Get 64-bit indirect moves working on IVB. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2017-04-14 14:56:08 -07:00
Matt Turner	630b84cdc8	i965: Use source region <1,2,0> when converting to DF. Doing so allows us to use a single MOV in VEC4_OPCODE_TO_DOUBLE instead of two. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2017-04-14 14:56:08 -07:00
Juan A. Suarez Romero	3198ce3f96	i965/fs: fix lower SIMD width for IVB/BYT's MOV_INDIRECT According to the IVB and HSW PRMs: "2.When the destination requires two registers and the sources are indirect, the sources must use 1x1 regioning mode." So for DF instructions the execution size is not limited by the number of address registers that are available, but by the EU decompression logic not handling VxH indirect addressing correctly. This patch limits the SIMD width to 4 in this case. v2: - Fix typo (Matt). - Fix condition (Curro) v3: - Add spec quote (Curro) Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:07 -07:00
Juan A. Suarez Romero	571cbd05eb	i965/fs: fix dst stride in IVB/BYT type conversions When converting a DF to 32-bit conversions, we set dst stride to 2, to fulfill alignment restrictions because the upper Dword of every Qword will be written with undefined value. But in IVB/BYT, this is not necessary, as each DF conversion already writes 2, the first one the real value, and the second one a 0. That is, IVB/BYT already set stride = 2 implicitly, so we must set it to 1 explicitly to avoid ending up with stride = 4. v2: - Fix typo (Matt) v3: - Fix stride in the destination's brw_reg, don't modity IR (Curro) v4: - Remove 'is_dst' argument of brw_reg_from_fs_reg() (Curro) - Fix comment (Curro). - Relax hstride assert (Curro) Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [ Francisco Jerez: Minor spelling fixes. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:07 -07:00
Samuel Iglesias Gonsálvez	af6fc3a8ea	i965/fs: rename lower_d2x to lower_conversions v2: - Change the name to lower_conversions. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:07 -07:00
Samuel Iglesias Gonsálvez	dee31311eb	Revert "i965/fs: Don't emit SEL instructions for type-converting MOVs." This reverts commit `7dccd38b40`. d2x pass fixes SEL instructions when there is a type conversion by doing a SEL without type conversion and then convert the result. This pass also takes into account the non-uniform control flow. Then, `7dccd38b40` is not needed anymore. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-04-14 14:56:07 -07:00
Samuel Iglesias Gonsálvez	aeecc82d05	i965/fs: generalize the legalization d2x pass Generalize it to lower any unsupported narrower conversion. v2 (Curro): - Add supports_type_conversion() - Reuse existing intruction instead of cloning it. - Generalize d2x to narrower and equal size conversions. v3 (Curro): - Make supports_type_conversion() const and improve it. - Use foreach_block_and_inst to process added instructions. - Simplify code. - Add assert and improve comments. - Remove redundant mov. - Remove useless comment. - Remove saturate == false assert and add support for saturation when fixing the conversion. - Add get_exec_type() function. v4 (Curro): - Use get_exec_type() function to get sources' type. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:07 -07:00
Matt Turner	94ffeb7fa2	i965: Use <0,2,1> region for scalar DF sources on IVB/BYT. On HSW+, scalar DF sources can be accessed using the normal <0,1,0> region, but on IVB and BYT DF regions must be programmed in terms of floats. A <0,2,1> region accomplishes this. v2: - Apply region <0,2,1> in brw_reg_from_fs_reg() (Curro). v3: - Added comment explaining the reason (Curro). Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:07 -07:00
Samuel Iglesias Gonsálvez	82d17615f4	i965/fs: clamp exec_size when an instruction has a scalar DF source Then the SIMD lowering pass will get rid of any compressed instructions with scalar source (whether force_writemask_all or not) and we avoid hitting the Gen7 region decompression bug. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Suggested-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:07 -07:00
Juan A. Suarez Romero	0f1316d4db	i965/fs: double regioning parameters and execsize for DF in IVB/BYT In IVB and BYT, both regioning parameters and execution sizes are measured as 32-bits element size. So when we have something like: mov(8) g2<1>DF g3<4,4,1>DF We are not actually moving 8 doubles (our intention), but 4 doubles. We need to double the parameters to cope with this issue. However, horizontal strides don't behave as they're supposed to on IVB for DF regions, they will cause each 32-bit half of DF sources to be strided individually, and doubling the value won't make any difference. v2: - Use devinfo directly (Matt). - Use Baytrail instead of Valleview (Matt). - Use IvyBridge instead of Ivy (Matt) - Double the exec_size in code emission (Curro) v3: - Change hstride doubling by an assert and fix commit log (Curro). - Substitute remaining compiler->devinfo by devinfo (Curro). v4: - Fix comment (Curro). Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:07 -07:00
Juan A. Suarez Romero	79af256388	i965/fs: add helper to retrieve instruction execution type The execution data size is the biggest type size of any instruction operand. We will use it to know if the instruction deals with DF, because in Ivy we need to double the execution size and regioning parameters. v2: - Fix typo in commit log (Matt) - Use static inline function instead of fs_inst's method (Curro). - Define the result as a constant (Curro). - Fix indentation (Matt). - Add braces to nested control flow (Matt). v3 (Curro): - Add get_exec_type() and other auxiliary functions and use them to calculate its size. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [ Francisco Jerez: Fix bogus 'type != BAD_FILE' check. Fix deduced execution type for integer vector types. Take destination type as execution type where there is no valid source. Assert-fail if the deduced execution type is byte. Move into brw_ir_fs.h header for consistency with the VEC4 back-end. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:07 -07:00
Matt Turner	fd349d29e4	i965: Handle IVB DF differences in the validator. On IVB/BYT, region parameters and execution size for DF are in terms of 32-bit elements, so they are doubled. For evaluating the validity of an instruction, we halve them. v2 (Sam): - Add comments. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2017-04-14 14:56:07 -07:00
Iago Toral Quiroga	fbac8b1f94	i965/disasm: also print nibctrl in IVB for execsize=8 4-wide DF operations where NibCtrl applies require and execsize of 8 in IvyBridge/BayTrail. v2: - Refactor NibCtrl printing (Matt) Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:06 -07:00
Francisco Jerez	147e71242c	i965/fs: Take into account lower frequency of conditional blocks in spilling cost heuristic. The individual branches of an if/else/endif construct will be executed some unknown number of times between 0 and 1 relative to the parent block. Use some factor in between as weight while approximating the cost of spill/fill instructions within a conditional if-else branch. This favors spilling registers used within conditional branches which are likely to be executed less frequently than registers used at the top level. Improves the framerate of the SynMark2 OglCSDof benchmark by ~1.9x on my SKL GT4e. Should have a comparable effect on other platforms. No significant regressions. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-04-11 15:28:54 -07:00
Jason Ekstrand	3503b2714b	i965/fs: Always provide a default LOD of 0 for TXS and TXL We already provide a default LOD for textureQueryLevels and texture() on non-fragment stages. However, there are more cases where one is needed such as textureSize(gsampler2DMS*) in SPIR-V. Instead of trying to list out all of the cases one at a time, just provide the default for all TXS and TXL operations. This fixes a shader validation error in the new Sascha deferredmultisampling demo which uses textureSize(gsampler2DMS). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100391 Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>	2017-04-04 18:33:35 -07:00
Jason Ekstrand	405ef7bb33	intel/vec4: Add some fall through comments Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-04-03 16:58:35 -07:00
Alejandro Piñeiro	2f8d6bd578	i965: expose BRW_OPCODE_[F32TO16/F16TO32] name on gen8+ Technically those hw operations are only available on gen7, as gen8+ support the conversion on the MOV. But, when using the builder to implement nir operations (example: nir_op_fquantize2f16), it is not needed to do the gen check. This check is done later, on the final emission at brw_F32TO16 (brw_eu_emit), choosing between the MOV or the specific operation accordingly. So in the middle, during optimization phases those hw operations can be around for gen8+ too. Without this patch, several (at least 95) vulkan-cts quantize tests crashes when using INTEL_DEBUG=optimizer. For example: dEQP-VK.spirv_assembly.instruction.graphics.opquantize.too_small_vert v2: simplify the code using GEN_GE (Ilia Mirkin) v3: tweak brw_instruction_name instead of changing opcode_descs table, that is used for validation (Matt Turner) Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-03-29 17:34:15 +02:00
Matt Turner	7dccd38b40	i965/fs: Don't emit SEL instructions for type-converting MOVs. SEL can only convert between a few integer types, which we basically never do. Fixes fs/vs-double-uniform-array-direct-indirect-non-uniform-control-flow Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Acked-by: Francisco Jerez <currojerez@riseup.net>	2017-03-27 10:59:42 -07:00
Iago Toral Quiroga	ddb2bb3ed4	anv/pipeline: make FragCoord include sample positions when sample shading We need to know if sample shading has been requested during shader compilation since that affects the way fragment coordinates are computed. Notice that the semantics of fragment coordinates only depend on whether sample shading has been requested, not on whether more than one sample will actually be produced (that is, minSampleShading and rasterizationSamples do not affect this behavior). Because this setting affects the code we generate for the shader, we also need to include it in the WM prog key. Notice we don't need to alter the OpenGL code because it doesn't ever use this behavior, so they key's value is always false (the default). Fixes: dEQP-VK.glsl.builtin_var.fragcoord_msaa.* Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-03-24 08:11:53 +01:00
Matt Turner	7499bc7fd7	i965: Replace OPT_V() with OPT(). We want to be able to check the progress of each pass and dump the NIR for debugging purposes if it changed. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-03-23 14:34:44 -07:00
Matt Turner	1be91bd9d8	i965/fs: Return progress from demote_sample_qualifiers(). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-03-23 14:34:44 -07:00
Matt Turner	fd3351246c	i965/fs: Return progress from move_interpolation_to_top(). And mark as static at the same time. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-03-23 14:34:44 -07:00
Emil Velikov	2438c0a236	intel/compiler: consistently use ifndef guards over pragma once Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Vedran Miletić <vedran@miletic.net> Acked-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>	2017-03-22 16:55:22 +00:00
Emil Velikov	3b277bae66	i965: make brw_setup_image_uniform_values static Used only internally. Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Vedran Miletić <vedran@miletic.net> Acked-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>	2017-03-22 16:55:21 +00:00
Jason Ekstrand	762a6333f2	nir: Rework conversion opcodes The NIR story on conversion opcodes is a mess. We've had way too many of them, naming is inconsistent, and which ones have explicit sizes was sort-of random. This commit re-organizes things and makes them all consistent: - All non-bool conversion opcodes now have the explicit size in the destination and are named <src_type>2<dst_type><size>. - Integer <-> integer conversion opcodes now only come in i2i and u2u forms (i2u and u2i have been removed) since the only difference between the different integer conversions is whether or not they sign-extend when up-converting. - Boolean conversion opcodes all have the explicit size on the bool and are named <src_type>2<dst_type>. Making things consistent also allows nir_type_conversion_op to be moved to nir_opcodes.c and auto-generated using mako. This will make adding int8, int16, and float16 versions much easier when the time comes. Reviewed-by: Eric Anholt <eric@anholt.net>	2017-03-14 07:36:40 -07:00
Jason Ekstrand	7107b32155	i965/fs: Re-arrange conversion operations Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-03-14 07:36:40 -07:00
Jason Ekstrand	bab4610e9c	i965/vec4: Get rid of the type parameter from to/from_double Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-03-14 07:36:40 -07:00
Jason Ekstrand	b377be9213	i965/fs: Use num_components from the SSA def in image intrinsics Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2017-03-14 07:36:40 -07:00
Iago Toral Quiroga	e8eeb759b7	intel: fix compiler build compiler/brw_vec4_gs_visitor.cpp:744:39: error: ‘GEN7_MAX_GS_OUTPUT_VERTEX_SIZE_BYTES’ was not declared in this scope output_vertex_size_bytes <= GEN7_MAX_GS_OUTPUT_VERTEX_SIZE_BYTES); Fixes: `d0d4a5f43b` ("i965: split EU defines to brw_eu_defines.h") Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2017-03-13 13:09:24 +01:00
Emil Velikov	aa09c9552c	intel/compiler: whitespace cleanups Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-03-13 11:16:35 +00:00
Emil Velikov	bdc5036464	intel/compiler: link all tests again gtest, even test_eu_compact" At the moment all the tests but test_eu_compact are actual C++ gtests. To simplify things, we can move the gtest.la to the common TEST_LIBS. As we're here, we can rename change the test extension [to .cpp] to avoid using the confusing dummy.cpp. Add a nice comment in the makefile for posterity. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-03-13 11:16:35 +00:00
Jason Ekstrand	700bebb958	i965: Move the back-end compiler to src/intel/compiler Mostly a dummy git mv with a couple of noticable parts: - With the earlier header cleanups, nothing in src/intel depends files from src/mesa/drivers/dri/i965/ - Both Autoconf and Android builds are addressed. Thanks to Mauro and Tapani for the fixups in the latter - brw_util.[ch] is not really compiler specific, so it's moved to i965. v2: - move brw_eu_defines.h instead of brw_defines.h - remove no-longer applicable includes - add missing vulkan/ prefix in the Android build (thanks Tapani) v3: - don't list brw_defines.h in src/intel/Makefile.sources (Jason) - rebase on top of the oa patches [Emil Velikov: commit message, various small fixes througout] Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-03-13 11:16:34 +00:00

... 10 11 12 13 14

656 Commits