KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Jason Ekstrand	b178762d05	intel/isl: Make get_intratile_offset_el take the element size in bits Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-06-01 15:33:56 -07:00
Jason Ekstrand	757f7087a5	intel/isl: Add a new layout for HiZ and stencil on Sandy Bridge Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-06-01 15:33:47 -07:00
Jason Ekstrand	cb8cdab8e8	intel/isl: Generate phys_total_el from isl_calc_phys_extent The only surface layout for which slice0 makes any sense is GEN4_2D. Move all of the slice0 stuff into isl_calc_phys_total_extent_el_gen4_2d and make the others trivially return the total size in surface elements. As a side-effect, array_pitch_el_rows is now returned from these helpers as well. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-06-01 15:33:45 -07:00
Jason Ekstrand	918f41bb29	intel/isl: Don't check array pitch for gen4 3D textures Array pitch doesn't matter in this layout. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-06-01 15:33:43 -07:00
Jason Ekstrand	044bfb292f	intel/isl: Refactor to use a phys_total_el extent. We've already implicitly been using a physical total size in surface elements. This just centralizes things a bit. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-06-01 15:33:41 -07:00
Jason Ekstrand	1547d133ac	intel/isl: Add an isl_assert_div helper This is a fairly common operation and it's nice to be able to just call the one little function. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-06-01 15:33:39 -07:00
Jason Ekstrand	58051ad220	intel/isl: Refactor isl_calc_array_pitch_el_rows Over 90% of the function only applies to ISL_DIM_LAYOUT_GEN4_2D anyway so we can just handle the other two as special cases at the top. The two "generic" cases below the switch only apply on gen9 and above and only to 3D or CCS surfaces. This implies that they only apply to surfaces with ISL_DIM_LAYOUT_GEN4_2D. Making them look generic is a lie. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-06-01 15:33:37 -07:00
Jason Ekstrand	fe13c59c1b	intel/isl: Move isl_calc_array_pitch_el_rows higher up Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-06-01 15:33:34 -07:00
Jason Ekstrand	c1a70165be	intel/isl: Remove the device parameter from isl_tiling_get_info We were only using it for validating that we don't use Ys/Yf on gen8 and earlier. Removing it from isl_tiling_get_info lets us remove it from a bunch of other things that had no business needing a hardware generation. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-06-01 15:33:31 -07:00
Kenneth Graunke	fe14a9a501	i965: Drop duplicate shadow variable. We already initialized this at the top of the function. Trivial.	2017-06-01 14:28:12 -07:00
Kenneth Graunke	fe9699dcb4	genxml: Make 3DSTATE_CONSTANT_BODY on Gen7+ use arrays. This will let us initialize the constant buffers with loops. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-06-01 11:49:46 -07:00
Kenneth Graunke	12303bd390	genxml: Fix decoder to print the array element on field members. Previously we'd print things like: 0xfffbb568: 0x00010000 : Dword 1 ReadLength: 0 ReadLength: 1 0xfffbb568: 0x00000001 : Dword 1 ReadLength: 1 ReadLength: 0 instead of the more obvious: 0xfffbb568: 0x00010000 : Dword 1 ReadLength[0]: 0 ReadLength[1]: 1 0xfffbb568: 0x00000001 : Dword 1 ReadLength[2]: 1 ReadLength[3]: 0 (Yes, the ralloc context here is bogus - the decoder leaks just about everything. We need to use proper ralloc contexts someday...) Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-06-01 11:49:46 -07:00
Kenneth Graunke	73c21e69d0	genxml: Fix decoding of array groups. If you had a group as the first element of a struct, i.e. <struct name="3DSTATE_CONSTANT_BODY" length="10"> <group count="4" start="0" size="16"> <field name="ReadLength" start="0" end="15" type="uint"/> </group> ... </struct> we would get a group_offset of 0, causing create_field() to think the field wasn't in a group, and fail to offset forward for successive array elements. So we'd mark all the array elements as offset 0. Using ctx->group->elem_size is a better check for "are we in a group?". Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-06-01 11:49:45 -07:00
Kenneth Graunke	d1b949282f	genxml: Fix decoder for groups with multiple fields. If you have something like: <group count="0" start="96" size="32"> <field name="Entry_0" start="0" end="15" type="GATHER_CONSTANT_ENTRY"/> <field name="Entry_1" start="16" end="31" type="GATHER_CONSTANT_ENTRY"/> </group> We would reset ctx->group_count to 0 after processing the first field, so the second would not have a group count. This is largely untested, as the only groups with multiple fields are packets we don't emit in Mesa. Found by inspection. Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-06-01 11:49:45 -07:00
Kenneth Graunke	df2d55ba57	genxml: Fix parsing of address fields in groups. For example, <group count="4" start="64" size="64"> <field name="Pointer" start="5" end="63" type="address"/> </group> used to generate: const uint64_t v2_address = __gen_combine_address(data, &dw[2], values->Pointer, 0); ... const uint64_t v4_address = __gen_combine_address(data, &dw[4], values->Pointer, 0); ... but now generates code with proper subscripts: const uint64_t v2_address = __gen_combine_address(data, &dw[2], values->Pointer[0], 0); ... const uint64_t v4_address = __gen_combine_address(data, &dw[4], values->Pointer[1], 0); ... Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-06-01 11:49:45 -07:00
Kenneth Graunke	65f5f3c85c	i965: Move SOL PSIZ hacks from draw time to link time. We can just update the gl_transform_feedback_info fields at link time to make the VUE header fields have the right location and component. Then we don't need to handle them specially at draw time, which is expensive. Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2017-06-01 00:08:29 -07:00
Kenneth Graunke	56535959fd	anv: Port over CACHE_MODE_1 optimization fix enables from brw. Ben and I haven't observed these to help anything, but they enable hardware optimizations for particular cases. It's probably best to enable them ahead of time, before we run into such a case. Reviewed-by: Plamena Manolova <plamena.manolova@intel.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2017-05-30 14:59:31 -07:00
Kenneth Graunke	53368b008e	genxml: Add Gen9 CACHE_MODE_1 definitons. These were already in gen8.xml but not gen9.xml. There are a few new fields and a couple that have changed. These are all documented in the Skylake PRM, Volume 2c Command Reference: Registers, Part 1. Reviewed-by: Plamena Manolova <plamena.manolova@intel.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2017-05-30 14:59:31 -07:00
Kenneth Graunke	9afe5846d2	genxml: Make a SCISSOR_RECT structure on Gen4-5. Gen6+ support multiple scissor rectangles, and define a SCISSOR_RECT structure containing their dimensions. On Gen4-5, those same fields exist in SF_VIEWPORT. This patch extracts the SF_VIEWPORT fields into a SCISSOR_RECT structure. Although not a named concept on Gen4-5, it works just as well, and gives us a consistent SCISSOR_RECT structure across all generations, making it easier to reuse code. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-05-29 21:46:37 -07:00
Kenneth Graunke	1e3880544e	i965: Ignore INTEL_SCALAR_* debug variables on Gen10+. Scalar mode has been default since Broadwell, and vector mode is getting increasingly unmaintained. There are a few things that don't even fully work in vector mode on Skylake, but we've never cared because nobody uses it. There's no point in porting it forward to new platforms. So, just ignore the debug options to force it on. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-05-29 21:40:44 -07:00
Emil Velikov	3e8790bff0	anv: automake: list shared libraries after the static ones The compiler can discard the shared ones from the link chain, since there is no user (the static libraries) before it on the command line. Cc: mesa-stable@lists.freedesktop.org Reported-by: Laurent Carlier <lordheavym@gmail.com> Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>	2017-05-29 16:42:41 +01:00
Jason Ekstrand	79f2a5541f	i965: Use BLORP for color clears on gen4-5 We don't support replicated data clears yet. Those take a bit more work and enabling replicated data clears in its own commit is probably better for bisectibility anyway. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-05-26 07:58:01 -07:00
Jason Ekstrand	fa13ef285d	intel/blorp: Assert that no one tries to blit combined depth stencil Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-05-26 07:58:01 -07:00
Jason Ekstrand	752d7af77a	i965: Add blorp support for gen4-5 Due to complications with things such as URB setup on gen4-5, it's easier to keep gen4 support in blorp completely internal to i965. This makes things a bit awkward because that means there's a file in i965 that includes blorp_priv.h but it's either that or have a file in blorp that includes brw_context.h. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-05-26 07:58:01 -07:00
Jason Ekstrand	23125b7102	intel/blorp: Set additional brw_wm_prog_key fields on gen4-5 Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-05-26 07:58:01 -07:00
Jason Ekstrand	0ed6f196fc	intel/blorp: Add support for gen4-5 SF programs As part of enabling support for SF programs, we plumb the SF URB size through to emit_urb_config. For now, it's always zero but, on gen4, it may be something larger. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-05-26 07:58:01 -07:00
Jason Ekstrand	8bce7bda45	intel/blorp: Make convert_to_single_slice available outside blorp_blit Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-05-26 07:58:01 -07:00
Jason Ekstrand	110061afa2	intel/blorp: Use designated initializers to set up VERTEX_ELEMENTS We also add a slot variable and use it as an iterator. This will make it much easier to conditionally put something between the header and the vertex position. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-05-26 07:58:01 -07:00
Jason Ekstrand	ac79806766	intel/blorp: Rename emit_viewport_state to emit_cc_viewport The real point of this packet is that it sets up CC_VIEWPORT so that name is a bit better. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-05-26 07:58:01 -07:00
Jason Ekstrand	1f2f90be1f	intel/blorp: Make the common genX_blorp_exec code gen4-safe Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-05-26 07:58:01 -07:00
Jason Ekstrand	a7f5d6df8a	intel/blorp: Re-arrange blorp_genX_exec.h Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-05-26 07:58:01 -07:00
Jason Ekstrand	302c0488cf	intel/blorp: Don't use ffma directly It isn't supported prior to gen6 and, on gen6+, NIR will fuse the fmul and fadd into an ffma automatically for us anyway. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-05-26 07:58:01 -07:00
Jason Ekstrand	675ec434f3	intel/blorp: Delete isl_to_gen_ds_surfype It's no longer used. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-05-26 07:58:01 -07:00
Jason Ekstrand	e80f0840bf	intel/blorp: Pull the pipeline bits of blorp_exec into a helper Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-05-26 07:58:01 -07:00
Jason Ekstrand	3d35e5a51e	intel/blorp/blit: Add support for normalized coordinates Gen5 and earlier can't do non-normalized coordinates so we need to compensate in the shader. Fortunately, it's pretty easy plumb through. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-05-26 07:58:01 -07:00
Jason Ekstrand	18e18a1863	i965: Move clip program compilation to the compiler Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-05-26 07:58:01 -07:00
Jason Ekstrand	9fb8a8775b	i965: Move SF compilation to the compiler Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-05-26 07:58:01 -07:00
Jason Ekstrand	21ba2b4bef	intel/compiler: Make brw_disasm take const assembly Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-05-26 07:58:01 -07:00
Jason Ekstrand	c336c224a6	intel/decoder: Handle the BLT ring in gen_group_get_length Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2017-05-26 07:58:01 -07:00
Jason Ekstrand	9d1001c8e5	intel/decoder: Handle gen4 VF_STATISTICS and PIPELINE_SELECT These need special handling because they have no "DWord Length" parameter and they have an unusual bias of 1. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2017-05-26 07:58:01 -07:00
Jason Ekstrand	87588e546e	intel/genxml: Rename 3DSTATE_AA_LINE_PARAMS on gen5 All of the other gens use "PARAMETERS". Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-05-26 07:58:01 -07:00
Jason Ekstrand	04f6d975e1	intel/genxml: Use the right subtype for VF_STATISTICS on gen4 Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-05-26 07:58:01 -07:00
Jason Ekstrand	1fcc5e2399	intel/genxml: Iron Lake doesn't support non-normalized sampler coordinates Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-05-26 07:58:01 -07:00
Jason Ekstrand	648b618dc5	intel/genxml: Add SAMPLER_STATE to gen 4.5 Somehow this got missed. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-05-26 07:58:01 -07:00
Jason Ekstrand	3f8ee8c703	intel/genxml: Rename the CC_VIEWPORT pointer on gen4-5 It isn't a pointer to "color calc state", that's the packet it's in. It's a pointer to the CC viewport state. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-05-26 07:58:01 -07:00
Jason Ekstrand	0ee1ef0cbb	intel/genxml: Sampler state is a pointer on gen4-5 Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-05-26 07:58:01 -07:00
Jason Ekstrand	64243d3b8e	intel/genxml: Suffix KSP0 fields on Iron Lake Iron Lake introduced the multiple KSP thing and so you have KSP0-3. However, the genxml didn't have an index on the first "Kernel Start Pointer" or "GRF Register Count". Add one to match gen6+. While we're here, we drop the brackets from the other "GRF Register Count" fields. Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-05-26 07:58:01 -07:00
Jason Ekstrand	7769e448aa	intel/genxml: Make a bunch of things offsets on gen4-5 Most things on gen4-5 are addresses because we don't have dynamic state base address and we don't have instruction state base on gen4. However, whoever converted things to addresses got a little over-excited and converted too much. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-05-26 07:58:01 -07:00
Jason Ekstrand	8257fe7b18	intel/isl: Add gen4_filter_tiling Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-05-26 07:58:01 -07:00
Jason Ekstrand	332a5d7a3f	intel/isl: Add support for setting component write disables Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-05-26 07:58:01 -07:00
Jason Ekstrand	8958355549	intel/isl: Add support for gen4 cube maps to get_image_offset_sa Gen4 cube maps are a 2-D surface with ISL_DIM_LAYOUT_GEN4_3D which is a bit weird but accurate none the less. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-05-26 07:58:01 -07:00
Jason Ekstrand	b9b7792d9a	intel/isl: Don't request space for stencil/hiz packets unless needed On Iron Lake, the packets exist but we never emit them so there's no need for us to ask the driver to make batch space for them. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-05-26 07:58:01 -07:00
Jason Ekstrand	554a1731a5	intel/blorp: Move the gen7 stencil format workaround to blorp_blit It's not needed for blorp_copy because it already overrides formats. It's also not needed for blorp_clear because it clears stencil as stencil. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-05-26 07:58:01 -07:00
Lionel Landwerlin	359fa0e9a0	aubinator: report error on unknown device id Since we're going to stop aubinator without a valid device id, better report an error. This also silences a Coverity warning. CID: 1405004 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-05-24 10:50:18 +01:00
Lionel Landwerlin	8f1f1d294d	aubinator: be consistent on exit code We're using both exit(1) & exit(EXIT_FAILURE), settle for one, same for success. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-05-24 10:50:18 +01:00
Lionel Landwerlin	6200d835a0	aubinator: fix double free 1;4601;0c Free previously allocated filename outside the for loop. CID: 1405014 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-05-24 10:50:18 +01:00
Jason Ekstrand	39adea9330	anv: Require vertex buffers to come from a 32-bit heap Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Cc: "17.1" <mesa-stable@lists.freedesktop.org>	2017-05-23 17:37:43 -07:00
Jason Ekstrand	50d0eb5096	anv: Advertise both 32-bit and 48-bit heaps when we have enough memory Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Cc: "17.1" <mesa-stable@lists.freedesktop.org>	2017-05-23 17:37:42 -07:00
Jason Ekstrand	34581fdd4f	anv: Refactor memory type setup This makes us walk over the heaps one at a time and add the types for LLC and !LLC to each heap. Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Cc: "17.1" <mesa-stable@lists.freedesktop.org>	2017-05-23 16:46:42 -07:00
Jason Ekstrand	b83b1af6f6	anv: Make supports_48bit_addresses a heap property Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Cc: "17.1" <mesa-stable@lists.freedesktop.org>	2017-05-23 16:46:40 -07:00
Jason Ekstrand	00df1cd9d6	anv: Stop setting BO flags in bo_init_new The idea behind doing this was to make it easier to set various flags. However, we have enough custom flag settings floating around the driver that this is more of a nuisance than a help. This commit has the following functional changes: 1) The workaround_bo created in anv_CreateDevice loses both flags. This shouldn't matter because it's very small and entirely internal to the driver. 2) The bo created in anv_CreateDmaBufImageINTEL loses the EXEC_OBJECT_ASYNC flag. In retrospect, it never should have gotten EXEC_OBJECT_ASYNC in the first place. Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Cc: "17.1" <mesa-stable@lists.freedesktop.org>	2017-05-23 16:46:38 -07:00
Jason Ekstrand	10fad58b31	anv: Set image memory types based on the type count Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Cc: "17.1" <mesa-stable@lists.freedesktop.org>	2017-05-23 16:46:36 -07:00
Jason Ekstrand	f7736ccf53	anv: Add valid_bufer_usage to the memory type metadata Instead of returning valid types as just a number, we now walk the list and check the buffer's usage against the usage flags we store in the new anv_memory_type structure. Currently, valid_buffer_usage == ~0. Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Cc: "17.1" <mesa-stable@lists.freedesktop.org>	2017-05-23 16:46:34 -07:00
Jason Ekstrand	92325a7efc	anv: Determine the type of mapping based on type metadata Before, we were just comparing the type index to 0. Now we actually look the type up in the table and check its properties to determine what kind of mapping we want to do. Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Cc: "17.1" <mesa-stable@lists.freedesktop.org>	2017-05-23 16:46:32 -07:00
Jason Ekstrand	c1f4343807	anv: Set up memory types and heaps during physical device init Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Cc: "17.1" <mesa-stable@lists.freedesktop.org>	2017-05-23 16:46:30 -07:00
Jason Ekstrand	eceaf7e234	anv: Predicate 48bit support on gen >= 8 This doesn't matter right now since it only affects whether or not we set the kernel bit but, if we ever do anything else based on it, we'll want it to be correct per-gen. Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Cc: "17.1" <mesa-stable@lists.freedesktop.org>	2017-05-23 16:46:27 -07:00
Jason Ekstrand	4eecd534f0	anv/image: Get rid of the memset(aux, 0, sizeof(aux)) hack Up until now, we've been memsetting the auxiliary surface to 0 at BindImageMemory time to ensure that it is properly initialized. However, this isn't correct because apps are allowed to freely alias memory between different images and buffers so long as they properly track whether or not a particular image is valid and, if it isn't, transition from UNINITIALIZED to something else before using it. We now implement those transitions so we can drop the hack. Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Cc: "17.1" <mesa-stable@lists.freedesktop.org>	2017-05-23 16:46:22 -07:00
Jason Ekstrand	cc45c4bb80	anv: Handle transitioning depth from UNDEFINED to other layouts Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Cc: "17.1" <mesa-stable@lists.freedesktop.org>	2017-05-23 16:46:20 -07:00
Jason Ekstrand	75edecf502	anv: Handle color layout transitions from the UNINITIALIZED layout This causes dEQP-VK.api.copy_and_blit.resolve_image.partial.* to start failing due to test bugs. See CL 1031 for a test fix. Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Cc: "17.1" <mesa-stable@lists.freedesktop.org>	2017-05-23 16:46:03 -07:00
Nanley Chery	52a6fd9871	intel/isl: Add ASTC HDR to format lists and helpers Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>	2017-05-22 11:13:53 -07:00
Tapani Pälli	f0051fcf2b	android: add -Wl,--build-id=sha1 to LDFLAGS for libvulkan_intel Just like is done on desktop and what is expected by the build-id code. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-05-20 08:59:57 +03:00
Emil Velikov	acf3d2afab	configure: check once for DRI3 dependencies Currently we are having the XCB_DRI3 dependencies duplicated, partially. Just do a once-off check and add all of the respective CFLAGS/LIBS where needed. As a nice side effect this helps us solve a couple of FIXMEs. DRI3 is not a thing w/o X11 so disable it in such cases. Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric@engestrom.ch>	2017-05-19 19:44:15 +01:00
Nanley Chery	56458cb168	anv/formats: Update the three-channel BC1 mappings The procedure for decompressing an opaque BC1 Vulkan format is dependant on the comparison of two colors stored in the first 32 bits of the compressed block. Here's the specified OpenGL (and Vulkan) behavior for reference: The RGB color for a texel at location (x,y) in the block is given by: RGB0, if color0 > color1 and code(x,y) == 0 RGB1, if color0 > color1 and code(x,y) == 1 (2RGB0+RGB1)/3, if color0 > color1 and code(x,y) == 2 (RGB0+2RGB1)/3, if color0 > color1 and code(x,y) == 3 RGB0, if color0 <= color1 and code(x,y) == 0 RGB1, if color0 <= color1 and code(x,y) == 1 (RGB0+RGB1)/2, if color0 <= color1 and code(x,y) == 2 BLACK, if color0 <= color1 and code(x,y) == 3 The sampling operation performed on an opaque DXT1 Intel format essentially hard-codes the comparison result of the two colors as color0 > color1. This means that the behavior is incompatible with OpenGL and Vulkan. This is stated in the SKL PRM, Vol 5: Memory Views: Opaque Textures (DXT1_RGB) Texture format DXT1_RGB is identical to DXT1, with the exception that the One-bit Alpha encoding is removed. Color 0 and Color 1 are not compared, and the resulting texel color is derived strictly from the Opaque Color Encoding. The alpha channel defaults to 1.0. Programming Note Context: Opaque Textures (DXT1_RGB) The behavior of this format is not compliant with the OGL spec. The opaque and non-opaque BC1 Vulkan formats are specified to be decoded in exactly the same way except the BLACK value must have a transparent alpha channel in the latter. Use the four-channel BC1 Intel formats with the alpha set to 1 to provide the behavior required by the spec. v2 (Kenneth Graunke): - Provide a more detailed commit message. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100925 Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>	2017-05-18 16:46:15 -07:00
Jason Ekstrand	c499faebd7	anv: Add an option to abort on device loss This is mostly for running in our CI system to prevent dEQP from continuing on to the next test if we get a GPU hang. As it currently stands, dEQP uses the same VkDevice for almost all tests and if one of the tests hangs, we set the anv_device::device_lost flag and report VK_ERROR_DEVICE_LOST for all queue operations from that point forward without sending anything to the GPU. dEQP will happily continue trying to run tests and reporting failures until it eventually gets crash that forces the test runner to start over. This circumvents the problem by just aborting the process if we ever get a GPU hang. Since this is not the recommended behavior most of the time, we hide it behind an environment variable. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-05-18 16:32:11 -07:00
Jason Ekstrand	53f997de77	anv: Wrap the device lost error in vk_error in QueueSubmit We weren't wrapping this before because anv_cmd_buffer_execbuf may throw a more meaningful error message. However, we do change the error code into VK_ERROR_DEVICE_LOST, so we should print a new message. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-05-18 16:32:11 -07:00
Iago Toral Quiroga	2322ddf548	anv: fix multiview for clear commands According to the VK_KHX_multiview spec: "Multiview causes all drawing and clear commands in the subpass to behave as if they were broadcast to each view, where each view is represented by one layer of the framebuffer attachments." This adds support for multiview clears, which were missing in the initial implementation. v2 (Jason): - split multiview from regular case - Use for_each_bit() macro Fixes new CTS multiview tests: dEQP-VK.multiview.clear_attachments.* Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-05-18 11:53:25 +02:00
Samuel Iglesias Gonsálvez	e69e5c7006	i965/vec4: load dvec3/4 uniforms first in the push constant buffer Reorder the uniforms to load first the dvec4-aligned variables in the push constant buffer and then push the vec4-aligned ones. It takes into account that the relocated uniforms should be aligned to their channel size. This fixes a bug were the dvec3/4 might be loaded one part on a GRF and the rest in next GRF, so the region parameters to read that could break the HW rules. v2: - Fix broken logic. - Add a comment to explain what should be needed to optimise the usage of the push constant buffer slots, as this patch does not pack the uniforms. v3: - Implemented the push constant buffer usage optimization. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Cc: "17.1" <mesa-stable@lists.freedesktop.org> Acked-by: Francisco Jerez <currojerez@riseup.net>	2017-05-18 06:49:54 +02:00
Samuel Iglesias Gonsálvez	8aa6ada838	i965/vec4: fix swizzle and writemask when loading an uniform with constant offset It was setting XYWZ swizzle and writemask to all uniforms, no matter if they were a vector or scalar, so this can lead to problems when loading them to the push constant buffer. Moreover, 'shift' calculation was designed to calculate the offset in DWORDS, but it doesn't take into account DFs, so the calculated swizzle for the later ones was wrong. The indirect case is not changed because MOV INDIRECT will write to all components. Added an assert to verify that these uniforms are aligned. v2: - Fix 'shift' calculation (Curro) - Set both swizzle and writemask. - Add assert(shift == 0) for the indirect case. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Cc: "17.1" <mesa-stable@lists.freedesktop.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-05-18 06:49:54 +02:00
Samuel Iglesias Gonsálvez	354f7f2cb9	i965/vec4/gs: restore the uniform values which was overwritten by failed vec4_gs_visitor execution We are going to add a packing feature to reduce the usage of the push constant buffer. One of the consequences is that 'nr_params' would be modified by vec4_visitor's run call, so we need to restore it if one of them failed before executing the fallback ones. Same thing happens to the uniforms values that would be reordered afterwards. Fixes GL45-CTS.arrays_of_arrays_gl.InteractionFunctionCalls2 when the dvec4 alignment and packing patch is applied. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Cc: "17.1" <mesa-stable@lists.freedesktop.org> Acked-by: Francisco Jerez <currojerez@riseup.net>	2017-05-18 06:49:28 +02:00
Chih-Wei Huang	bfc0c23843	Android: correct libz dependency Commit `6facb0c0` ("android: fix libz dynamic library dependencies") unconditionally adds libz as a dependency to all shared libraries. That is unnecessary. Commit `85a9b1b5` introduced libz as a dependency to libmesa_util. So only the shared libraries that use libmesa_util need libz. Fix Android Lollipop build by adding the include path of zlib to libmesa_util explicitly instead of getting the path implicitly from zlib since it doesn't export the include path in Lollipop. Fixes: `6facb0c0` "android: fix libz dynamic library dependencies" Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Rob Herring <robh@kernel.org>	2017-05-17 14:04:18 +01:00
Jason Ekstrand	e0d6f9afba	intel/isl/gen6: Fix combined depth stencil alignment All combined depth stencil buffers (even those with just stencil) require a 4x4 alignment on Sandy Bridge. The only depth/stencil buffer type that requires 4x2 is separate stencil. Reviewed-by: Chad Versace <chadversary@chromium.org>	2017-05-16 17:04:26 -07:00
Jason Ekstrand	74d626f383	intel/isl: Refactor gen8_choose_image_alignment_el Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Chad Versace <chadversary@chromium.org>	2017-05-16 17:04:26 -07:00
Jason Ekstrand	2486c7dd54	intel/isl: Refactor gen6_choose_image_alignment_el Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Chad Versace <chadversary@chromium.org>	2017-05-16 17:04:26 -07:00
Jason Ekstrand	715f47cb34	intel/isl: Refactor gen7_choose_image_alignment_el The Ivy Bridge PRM provides a nice table that handles most of the alignment cases in one place. For standard color buffers we have a little freedom of choice but for most depth, stencil and compressed it's hard-coded. Chad's original functions split halign and valign apart and implemented them almost entirely based on restrictions and not the table. This makes things way more confusing than they need to be. This commit gets rid of the split and makes us implement the exact table up-front. If our surface isn't one of the ones in the table then we have to make real choices. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Chad Versace <chadversary@chromium.org>	2017-05-16 17:04:26 -07:00
Pohjolainen, Topi	236f17a9f7	intel/isl/gen7: Use stencil vertical alignment of 8 instead of 4 The reasoning Chad gave in the comment for choosing a valign of 4 is entirely bunk. The fact that you have to multiply pitch by 2 is completely unrelated to the halign/valign parameters used for texture layout. (Not completely unrelated. W-tiling is just Y-tiling with a bit of extra swizzling which turns 8x8 W-tiled chunks into 16x4 y-tiled chunks so it makes everything easier if miplevels are always aligned to 8x8.) The fact that RENDER_SURFACE_STATE::SurfaceVerticalAlignmet doesn't have a VALIGN_8 option doesn't matter since this is gen7 and you can't do stencil texturing anyway. v2 (Jason Ekstrand): - Delete most of Chad's comment and add a more descriptive commit message. Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Cc: "17.0 17.1" <mesa-stable@lists.freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Chad Versace <chadversary@chromium.org>	2017-05-16 17:04:26 -07:00
Matt Turner	169e1e26ee	i965: Fix test_eu_validate.cpp Broken by commit `a7217e909c` ("i965: Pass pointer and end of assembly to brw_validate_instructions"). Reported-by: Aaron Watry <awatry@gmail.com>	2017-05-16 11:45:07 -07:00
Jason Ekstrand	b5437fc05c	anv: Implement VK_KHR_get_surface_capabilities2 Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2017-05-16 08:38:46 -07:00
Matt Turner	b1af896853	intel/aubinator_error_decode: Disassemble shader programs Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-05-15 12:04:04 -07:00
Matt Turner	23685f07d1	intel/aubinator_error_decode: Stop decoding after MI_BATCH_BUFFER_END Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-05-15 11:43:20 -07:00
Matt Turner	8e7221fa5a	intel/tools: Refactor gen_disasm_disassemble() to use annotations Which will allow us to print validation errors found in shader assembly in GPU hang error states. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-05-15 11:43:14 -07:00
Matt Turner	aaa0329b5f	intel/decoder: Fix indentation Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-05-15 11:43:13 -07:00
Matt Turner	3443bd45a3	genxml: Remove brackets from kernel start pointer names Newer Gens' names don't have the brackets. Having common names will make some later patches simpler. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2017-05-15 11:43:11 -07:00
Matt Turner	aae2626be8	i965: Add a weak no-op nir_print_instr() symbol intel_asm_annotation.c is part of libintel_compiler.la, which contains code for disassembling and validating shaders that we want to call in aubinator_error_decode. dump_assembly() calls nir_print_instr() to print annotations, and although dump_assembly() is not called by aubinator_error_decode (nor is any function in intel_asm_annotation.c) it causes undefined references to nir_print_instr(). To work around, provide a no-op weak symbol to resolve against.	2017-05-15 11:43:01 -07:00
Matt Turner	d98e82c772	i965: Allow brw_eu_validate to handle compact instructions This will allow the validator to run on shader programs we find in the GPU hang error state. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-05-15 11:42:56 -07:00
Matt Turner	a7217e909c	i965: Pass pointer and end of assembly to brw_validate_instructions This will allow us to more easily run brw_validate_instructions() on shader programs we find in GPU hang error states. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-05-15 11:42:47 -07:00
Lionel Landwerlin	55be6653e0	intel: gen-decoder: fix xml parser leak In the unlikely case the parsing of genxml files fails, we were leaking an xml parser object. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2017-05-15 14:06:11 +01:00
Rafael Antognolli	d9b4a81672	genxml: Add alias for MOCS. Use an alias for this field on 3DSTATE_INDEX_BUFFER on gen6+, so we can set the same value as the defines. Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-05-11 21:27:38 -07:00
Kenneth Graunke	f790d6e0b4	i965: Port Gen4-5 VS_STATE to genxml. It's actually not that much code. Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2017-05-11 16:52:59 -07:00
Kenneth Graunke	d65e19f5c6	genxml: Fix KSPs on Ironlake to be offsets, not pointers. We use Instruction State Base Address on Ironlake, so we want KSP to be an offset not an actual pointer. Gen4/G45 use pointers. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-05-11 16:33:48 -07:00
Emil Velikov	4c22b99953	anv: document that anv_gem_mmap returns MAP_FAILED on error Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2017-05-11 13:58:20 +01:00
Kenneth Graunke	620f12a53f	i965: Drop INTEL_DEBUG=stats. For whatever reason, we had an INTEL_DEBUG=stats option that enabled various statistics counters on Gen4-5 systems. It's been around forever, though I can't think of a single time that it's been useful. On Gen6+, we enable statistics all the time because they're necessary to support various query object targets. Turning them off would break those queries. Gen4-5 don't support those queries, so the statistics counters generally aren't useful; we disabled them by default. This patch disables them altogether. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2017-05-10 11:37:19 -07:00
Grazvydas Ignotas	0ef302638f	anv: don't leak DRM devices After successful drmGetDevices2() call, drmFreeDevices() needs to be called. Fixes: `b1fb6e8d` "anv: do not open random render node(s)" Signed-off-by: Grazvydas Ignotas <notasas@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> # radv version	2017-05-10 01:13:44 +03:00
Grazvydas Ignotas	e0aee8b667	anv: fix possible stack corruption drmGetDevices2 takes count and not size. Probably hasn't caused problems yet in practice and was missed as setups with more than 8 DRM devices are not very common. Fixes: `b1fb6e8d` "anv: do not open random render node(s)" Signed-off-by: Grazvydas Ignotas <notasas@gmail.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-05-10 01:13:44 +03:00
Jason Ekstrand	037ce253b1	i965/vec4: Delete the system value infastructure The only thing still using it is INVOCATION_ID for geometry shaders. That's easily enough inlined into the nir_intrinsic_load_invocation_id handling code. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-09 15:08:07 -07:00
Jason Ekstrand	2e9916ea04	i965/vec4: Use NIR to do GS input remapping We're already doing this in the FS back-end. This just does the same thing in the vec4 back-end. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-09 15:08:07 -07:00
Jason Ekstrand	e31042ab40	i965/fs: Move remapping of gl_PointSize to the NIR level Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-09 15:08:06 -07:00
Jason Ekstrand	5b00c3cc05	i965/nir: Inline remap_inputs_with_vue_map Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-09 15:08:06 -07:00
Jason Ekstrand	0d5f89cdc3	i965/vec4: Use NIR remapping for VS attributes The NIR pass already handles remapping system values to attributes for us so we delete the system value code as part of the conversion. We also change nir_lower_vs_inputs to take an explicit inputs_read bitmask and pass in the inputs_read from prog_data instead from pulling it out of NIR. This is because the version in prog_data may get EDGEFLAG added to it on some old platforms. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-09 15:08:06 -07:00
Jason Ekstrand	80aa6e9d32	intel/compiler/vs: Move inputs_read handling to generic code Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-09 15:08:03 -07:00
Jason Ekstrand	d2fe804d18	i965/vec4: Set VERT_BIT_EDGEFLAG based on the VUE map We also add a nice little comment to make it more clear exactly what happens with the edge flag copy. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-09 15:07:47 -07:00
Jason Ekstrand	ca4d192802	i965/fs: Lower gl_VertexID and friends to inputs at the NIR level NIR calls these system values but they come in from the VF unit as vertex data. It's terribly convenient to just be able to treat them as such in the back-end. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-09 15:07:47 -07:00
Jason Ekstrand	24e6fba500	i965/vs: Set uses_vertexid and friends from brw_compile_vs Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-09 15:07:47 -07:00
Jason Ekstrand	5e832302dc	i965: Move multiply by 4 for VS ATTR setup into the scalar backend. The vec4 backend will want to count in units of vec4s, not scalar components. The simplest solution is to move the multiplication by 4 into the scalar backend. This also improves consistency with how we count varyings. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-09 15:07:47 -07:00
Jason Ekstrand	36764b6923	i965/nir: Inline remap_vs_attrs Now that we have nice block iterators, there's no good reason for this to be off on it's own. While we're here, we convert to using the NIR const index getters/setters instead of whacking const_index values directly. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-09 15:07:47 -07:00
Jason Ekstrand	b86dba8a0e	nir: Embed the shader_info in the nir_shader again Commit `e1af20f18a` changed the shader_info from being embedded into being just a pointer. The idea was that sharing the shader_info between NIR and GLSL would be easier if it were a pointer pointing to the same shader_info struct. This, however, has caused a few problems: 1) There are many things which generate NIR without GLSL. This means we have to support both NIR shaders which come from GLSL and ones that don't and need to have an info elsewhere. 2) The solution to (1) raises all sorts of ownership issues which have to be resolved with ralloc_parent checks. 3) Ever since `00620782c9`, we've been using nir_gather_info to fill out the final shader_info. Thanks to cloning and the above ownership issues, the nir_shader::info may not point back to the gl_shader anymore and so we have to do a copy of the shader_info from NIR back to GLSL anyway. All of these issues go away if we just embed the shader_info in the nir_shader. There's a little downside of having to copy it back after calling nir_gather_info but, as explained above, we have to do that anyway. Acked-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-09 15:07:47 -07:00
Lionel Landwerlin	32f14332f5	intel: compiler: prevent integer overflow CID: 1399477, 1399478 (Integer handling issues) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-05-09 13:56:17 +01:00
Lionel Landwerlin	85182e490c	intel: compiler: remove duplicated code CID: 1399470: (Control flow issues) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-05-09 13:56:17 +01:00
Lionel Landwerlin	4201b7d1bf	intel: gen decoder: don't check for size_t negative values We should get either 0 or 1 here. CID: 1373562 (Control flow issues) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Matt Turner <mattst88@gmail.com>	2017-05-09 13:54:08 +01:00
Lionel Landwerlin	e3a5ab2d66	anv: check return value of anv_execbuf_add_bo CID: 1405919 (Error handling issues) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-05-08 14:38:27 +01:00
Lionel Landwerlin	6247b8b413	anv: avoid null pointer dereference The application might not give an output structure. CID: 1405765 (Null pointer dereferences) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-05-08 14:38:27 +01:00
Jason Ekstrand	e05e3e07ab	anv/allocator: Only write to _vg_ptr if we have valgrind This fixes the build when not building against valgrind headers. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100945 Reviewed-by: Chad Versace <chadversary@chromium.org>	2017-05-05 12:49:51 -07:00
Iago Toral Quiroga	7761cf6d01	anv/query: handle more cases of 'out of host memory' Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2017-05-05 08:53:33 +02:00
Jason Ekstrand	98cd512089	anv/allocator: Improve block pool growing asserts Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>	2017-05-04 19:07:54 -07:00
Jason Ekstrand	24827fdf50	anv: Drop the instruction pool block size Now that we can allocate states larger than the block size, we no longer need a block size of 1MB which can be rather wasteful. Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>	2017-05-04 19:07:54 -07:00
Jason Ekstrand	955127db93	anv/allocator: Add support for large stream allocations Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>	2017-05-04 19:07:54 -07:00
Jason Ekstrand	f82d3d38b6	anv/allocator: Allow state pools to allocate large states Previously, the maximum size of a state that could be allocated from a state pool was a block. However, this has caused us various issues particularly with shaders which are potentially very large. We've also hit issues with render passes with a large number of attachments when we go to allocate the block of surface state. This effectively removes the restriction on the maximum size of a single state. (There's still a limit of 1MB imposed by a fixed-length bucket array.) For states larger than the block size, we just grab a large block off of the block pool rather than sub-allocating. When we go to allocate some chunk of state and the current bucket does not have state, we try to pull a chunk from some larger bucket and split it up. This should improve memory usage if a client occasionally allocates a large block of state. This commit is inspired by some similar work done by Juan A. Suarez Romero <jasuarez@igalia.com>. Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>	2017-05-04 19:07:54 -07:00
Jason Ekstrand	8c079b566e	anv/allocator: Support pushing multiple blocks onto a free list at once Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>	2017-05-04 19:07:54 -07:00
Jason Ekstrand	8769fb48fb	anv/allocator: Add helpers for dealing with bucket sizes Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>	2017-05-04 19:07:54 -07:00
Jason Ekstrand	12043ca696	anv/allocator: Add the capability to allocate blocks of different sizes Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>	2017-05-04 19:07:54 -07:00
Jason Ekstrand	01170df262	anv/allocator: Rework a comment This commit just fixes up the English a bit and re-flows the comment. Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>	2017-05-04 19:07:54 -07:00
Jason Ekstrand	bcc5d0defb	anv/allocator: Tweak the block pool growing algorithm The old algorithm worked fine assuming a constant block size. We're about to break that assumption so we need an algorithm that's a bit more robust against suddenly growing by a huge amount compared to the currently allocated quantity of memory. Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>	2017-05-04 19:07:54 -07:00
Jason Ekstrand	d3ed72e2c2	anv/allocator: Embed the block_pool in the state_pool Now that the state stream is allocating off of the state pool, there's no reason why we need the block pool to be separate. Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>	2017-05-04 19:07:54 -07:00
Jason Ekstrand	bb2a3f0df8	anv/allocator: Get rid of the ability to free blocks Now that everything is going through the state pools, the block pool no longer needs to be able to handle re-use. Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>	2017-05-04 19:07:54 -07:00
Jason Ekstrand	08413a81b9	anv: Allocate binding table blocks through the state pool Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>	2017-05-04 19:07:54 -07:00
Jason Ekstrand	55f49e6b7e	anv/allocator: Add support for "back" allocations to state_pool Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>	2017-05-04 19:07:54 -07:00
Jason Ekstrand	49ecaf88d1	anv/allocator: Drop the block_size field from block_pool Since the state_stream is now pulling from a state_pool, the only thing pulling directly off the block pool is the state pool so we can just move the block_size there. The one exception is when we allocate binding tables but we can just reference the state pool there as well. The only functional change here is that we no longer grow the block pool immediately upon creation so no BO gets allocated until our first state allocation. Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>	2017-05-04 19:07:54 -07:00
Jason Ekstrand	30d63ffe26	anv/allocator: Pull the userptr part of block_pool_grow into a helper Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>	2017-05-04 19:07:54 -07:00
Jason Ekstrand	c73ce41a48	anv/allocator: Roll fixed_size_state_pool into state_pool The helper functions aren't really gaining us as much as they claim and are actually about to be in the way. Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>	2017-05-04 19:07:54 -07:00
Jason Ekstrand	6d02ef011e	anv/allocator: Remove the state_size field from fixed_size_state_pool Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>	2017-05-04 19:07:54 -07:00
Jason Ekstrand	367031a5c8	anv: Get rid of a bunch of uses of size_t We should only use size_t when referring to sizes of bits of CPU memory. Anything on the GPU or just a regular array length should be a type that has the same size on both 32 and 64-bit architectures. For state objects, we use a uint32_t because we'll never allocate a piece of driver-internal GPU state larger than 2GB (more like 16KB). Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>	2017-05-04 19:07:54 -07:00
Jason Ekstrand	e86aeecb6a	anv/allocator: Convert the state stream to pull from a state pool Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>	2017-05-04 19:07:54 -07:00
Jason Ekstrand	e049dea5b2	anv/allocator: Return a null state for zero-size allocations Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>	2017-05-04 19:07:54 -07:00
Jason Ekstrand	45e1829274	anv/allocator: Add no-valgrind versions of state_pool_alloc/free Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>	2017-05-04 19:07:54 -07:00
Kenneth Graunke	9377801fbd	anv: Simplify Cherryview line handling. We can just use the new CHVLineWidth field rather than an entirely different generation's packing function. v2: Inline the function (requested by Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-05-04 16:17:34 -07:00
Kenneth Graunke	31f094e691	i965: Fix line width on Cherryview. We just add another field to gen8.xml for the Cherryview line width, rather than trying to replicate the gymnastics done in the Vulkan driver to use gen9 SF pack functions. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-05-04 16:17:34 -07:00
Emil Velikov	9d2aa6e506	anv: fix anv_gem_mmap comment to not mention NULL The function cannot return NULL, update the comment accordingly. Fixes: `b546c9d` ("anv: anv_gem_mmap() returns MAP_FAILED as mapping error") Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2017-05-04 18:06:18 +01:00
Samuel Iglesias Gonsálvez	939b015736	anv: vkBindImageMemory() should return VK_ERROR_OUT_OF_{HOST,DEVICE}_MEMORY on failure According to the spec we get VK_ERROR_OUT_OF_HOST_MEMORY or VK_ERROR_OUT_OF_DEVICE_MEMORY on vkBindImageMemory failure. Fixes returned value changed by `b546c9d`. Fixes: `b546c9d` ("anv: anv_gem_mmap() returns MAP_FAILED as mapping error") Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Cc: "17.0 17.1" <mesa-stable@lists.freedesktop.org> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2017-05-04 15:13:08 +02:00
Samuel Iglesias Gonsálvez	b546c9d318	anv: anv_gem_mmap() returns MAP_FAILED as mapping error Take it into account when checking if the mapping failed. v2: - Remove map == NULL and its related comment (Emil) Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Fixes: `6f3e3c715a` ("vk/allocator: Add a BO pool") Fixes: `9919a2d34d` ("anv/image: Memset hiz surfaces to 0 when binding memory") Cc: "17.0 17.1" <mesa-stable@lists.freedesktop.org>	2017-05-04 08:56:36 +02:00
Rafael Antognolli	f321f695d3	genxml: Fix 3DSTATE_DEPTH_BUFFER length on gen5. The hardware docs are wrong, but the length used in the xml is also wrong. Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-03 18:57:51 -07:00
Rafael Antognolli	2e5d65ccb6	anv: Use BRW_BARYCENTRIC_NONPERSPECTIVE_BITS from common header. In a previous patch some enums were split out from brw_eu_defines.h, so they could be used by genxml based code. anv can also benefit from this. Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-03 16:58:55 -07:00
Rafael Antognolli	8fa8abef4b	i965: Move enums to brw_compiler.h. These enums live inside struct brw_wm_prog_data, so it makes sense to keep them in the same header. It also allows to use them without including brw_eu_defines.h. Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-03 16:55:58 -07:00
Rafael Antognolli	a66743ce8d	genxml: Update 3DSTATE_LINE_STIPPLE xml on gen6. From the PRM, Line Stipple Inverse Repeat Count is on dw2, bits 31:16, format U1.13. Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-03 16:41:14 -07:00
Rafael Antognolli	7d5cc5b954	genxml: Normalize xml for 3DSTATE_CC_STATE_POINTERS. - "COLOR_CALC_STATE Change" -> "Color Calc State Pointer Valid" - "Pointer to COLOR_CALC_STATE" -> "Color Calc State Pointer" - "BackFace" -> "Backface" Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-03 16:41:07 -07:00
Rafael Antognolli	b89805a7bc	genxml: Normalize xml for 3DSTATE_MULTISAMPLE. Name the options to "Pixel Location": - PIXLOC_CENTER -> CENTER - PIXLOC_UL_CORNER -> UL_CORNER Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-03 16:41:07 -07:00
Rafael Antognolli	c032cae9ff	genxml: Rename "Function Enable" to "Enable". Rename that field name on genxml for: - 3DSTATE_GS - gen6+ - 3DSTATE_DS - gen7+ - 3DSTATE_HS - gen7+ Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-03 16:41:07 -07:00
Rafael Antognolli	5b4223dc8e	genxml: Clip guardbands are float, not int. This makes genxml create the right struct types, and generate the right batch commands. Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-03 16:41:07 -07:00
Rafael Antognolli	4266c372d9	genxml: 3DSTATE_VS rename Function Enable to Enable. Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-03 16:41:07 -07:00
Kenneth Graunke	da299b7df3	genxml: Make "Reorder Mode" fields consistent. Both GS and SOL have these fields. Some were ReorderEnable = true, some were ReorderMode = REORDER_TRAILING, and some were just TRAILING. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2017-05-03 16:41:07 -07:00
Rafael Antognolli	872ffb2221	genxml: Add alias for MOCS. Use an alias, so we can set the same value as the #define's. v3: - Call it "SO Buffer MOCS" to follow the most common naming scheme. - Add alias for gen7 and gen75 too (Ken). Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-03 16:41:02 -07:00
Rafael Antognolli	b5e652fc83	genxml: Add missing field values to 3DSTATE_SBE. Fill out "Attribute Active Component Format" possible values. Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-03 16:41:02 -07:00
Rafael Antognolli	273a10b3f1	genxml: Update xml for 3DSTATE_SF. - Normalize "Anti-Aliasing Enable" - Add "Multisample Rasterization Mode" constants - Rename "Use Point Width on Vertex" to "Vertex" - Rename "Use Point Width from State" to "State" Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-03 16:41:02 -07:00
Rafael Antognolli	3f155ab290	genxml: Rename clip enable property. There are two variants: - Clip Enable - CLIP Enable (on gen6) Rename everything to Clip Enable. Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-03 16:41:02 -07:00
Louis-Francis Ratté-Boulianne	e0aa2bd9cb	genxml: Fill out Gen4, Gen45 and Gen5 XML Add some more details to Gen4 and Gen45 and add what is needed in Gen5 XML. This commit overwrite the previous work done on Gen4 and Gen45 as it contains more instructions and fixes some mistakes. However, comments (dword boundaries) are lost in the process. v3: - Set the type of some fields, instead of prefix. Also fix the SAMPLER_BORDER_COLOR_STATE fields of gen5.xml. Signed-off-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-03 16:40:52 -07:00
Jason Ekstrand	4201cc2dd3	anv: Implement VK_KHX_external_semaphore_fd This implementation allocates a 4k BO for each semaphore that can be exported using OPAQUE_FD and uses the kernel's already-existing synchronization mechanism on BOs. Reviewed-by: Chad Versace <chadversary@chromium.org>	2017-05-03 15:09:46 -07:00
Jason Ekstrand	ef2e427d78	anv: Pull the guts of cmd_buffer_execbuf into a helper Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Chad Versace <chadversary@chromium.org>	2017-05-03 15:09:46 -07:00
Jason Ekstrand	975c0f339f	anv: Implement VK_KHX_external_semaphore Reviewed-by: Chad Versace <chadversary@chromium.org>	2017-05-03 15:09:46 -07:00
Jason Ekstrand	298e054d0c	anv: Implement VK_KHX_external_semaphore_capabilities This just stubs things out. Real external semaphore support will come with VK_KHX_external_semaphore_fd. Reviewed-by: Chad Versace <chadversary@chromium.org>	2017-05-03 15:09:46 -07:00
Jason Ekstrand	65aa89e75f	anv: Add a real semaphore struct It's just a dummy for now, but we'll flesh it out as needed for external semaphores. Reviewed-by: Chad Versace <chadversary@chromium.org>	2017-05-03 15:09:46 -07:00
Jason Ekstrand	f8d7c23e1f	anv: Trivially implement multiDrawIndirect Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-05-03 11:25:46 -07:00
Jason Ekstrand	272b7e7d25	anv: Enable VK_KHX_multiview and SPV_KHR_multiview Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-05-03 11:25:46 -07:00
Jason Ekstrand	3dbd7737d4	anv/cmd_buffer: Emit instanced draws for multiple views Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-05-03 11:25:46 -07:00
Jason Ekstrand	32abb0e13c	anv/cmd_buffer: Pull indirect draw parameter loading into a helper Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-05-03 11:25:46 -07:00
Jason Ekstrand	0db7070330	anv/pipeline: Add shader lowering for multiview v2 (Jason Ekstrand): - Take a view_mask rather than a whole subpass - Build the view mask into the VS shader key Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-05-03 11:25:46 -07:00
Jason Ekstrand	ca5bdfdfc6	anv/pipeline: Add a subpass field to anv_pipeline This simplifies the code a variety of places. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-05-03 11:25:46 -07:00
Jason Ekstrand	c4549e05aa	anv/pipeline: Call nir_gather_info later We want to insert more lowering code that may insert system values and we need to gather info after that lowering. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-05-03 11:25:46 -07:00
Jason Ekstrand	dcb6a68bb4	anv: Move shader hashing to anv_pipeline Shader hashing is very closely related to shader compilation. Putting them right next to each other in anv_pipeline makes it easier to verify that we're actually hashing everything we need to be hashing. The only real change (other than the order of hashing) is that we now hash in the shader stage. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-05-03 11:25:46 -07:00
Jason Ekstrand	d6b8106eea	anv/pass: Store the per-subpass view mask Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-05-03 11:25:46 -07:00
Jason Ekstrand	e997f548de	anv: Add the KHX_multiview boilerplate Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-05-03 11:25:46 -07:00
Jason Ekstrand	0bed97006f	anv/nir: Delete the apply_dynamic_offsets prototype That pass hasn't existed since `dd4db84640` but the prototype stuck around for no reason. Reviewed-by: Elie Tournier <elie.tournier@collabora.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-05-03 11:25:46 -07:00
Samuel Iglesias Gonsálvez	f57e234fdd	i965/vec4: don't modify regioning parameters to the sources of DF align1 instructions The regioning parameters are now properly set by convert_to_hw_regs() and we don't need to fix them in the generator. That latter fix previously done in the generator was strictly speaking wrong for any non-identity regions. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Cc: "17.1" <mesa-stable@lists.freedesktop.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-05-03 15:32:39 +02:00
Samuel Iglesias Gonsálvez	aaeb1c99be	i965/vec4: fix register width for DF VGRF and UNIFORM On gen7, the swizzles used in DF align16 instructions works for element size of 32 bits, so we can address only 2 consecutive DFs. As we assumed that in the rest of the code and prepare the instructions for this (scalarize_df()), we need to set it to two again. However, for DF align1 instructions, a width of 2 is wrong as we are not reading the data we want. For example, an uniform would have a region of <0, 2, 1> so it would repeat the first 2 DFs, when we wanted to access to the first 4. This patch sets the default one to 4 and then modifies the width of align16 instruction's DF sources when we translate the logical swizzle to the physical one. v2: - Remove conditional (Curro). Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Cc: "17.1" <mesa-stable@lists.freedesktop.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-05-03 15:32:39 +02:00
Samuel Iglesias Gonsálvez	7f728bce81	i965/vec4: fix vertical stride to avoid breaking region parameter rule From IVB PRM, vol4, part3, "General Restrictions on Regioning Parameters": "If ExecSize = Width and HorzStride ≠ 0, VertStride must be set to Width * HorzStride." In next patch, we are going to modify the region parameter for uniforms and vgrf. For uniforms that are the source of DF align1 instructions, they will have <0, 4, 1> regioning and the execsize for those instructions will be 4, so they will break the regioning rule. This will be the same for VGRF sources where we use the vstride == 0 exploit. As we know we are not going to cross the GRF boundary with that execsize and parameters (not even with the exploit), we just fix the vstride here. v2: - Move is_align1_df() (Curro) - Refactor exec_size == width calculation (Curro) Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Cc: "17.1" <mesa-stable@lists.freedesktop.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-05-03 15:32:39 +02:00
Jason Ekstrand	6ef1bd4fa5	anv/tests: Create a dummy instance as well as device This fixes crashes caused by `35e626bd0e` which made us start referencing the instance in the allocators. With this commit, the tests now happily pass again. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100877 Tested-by: Vinson Lee <vlee@freedesktop.org>	2017-05-01 17:06:40 -07:00
Chad Versace	85ca563b58	anv: Drop 'x11' prefix from non-X11 WSI funcs Drop it from x11_anv_wsi_image_create and x11_anv_wsi_image_free. The functions are used by Wayland WSI too. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2017-04-28 08:54:45 -07:00
Jason Ekstrand	ebd1bd6998	anv: Alphabetize KHR extensions Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2017-04-28 07:41:03 -07:00
Jason Ekstrand	032861693e	anv: Move queues, events, and semaphores to their own file Things are about to get more complicated, especially as far as semaphores are concerned. Reviewed-by: Chad Versace <chadversary@chromium.org>	2017-04-27 20:08:46 -07:00
Jason Ekstrand	9bd1f03487	anv: Implement VK_KHX_external_memory_fd This commit just exposes the memory handle type. There's interesting we need to do here for images. So long as the user doesn't set any crazy environment variables such as INTEL_DEBUG=nohiz, all of the compression formats etc. should "just work" at least for opaque handle types. v2 (chadv): - Rebase. - Fix vkGetPhysicalDeviceImageFormatProperties2KHR when handleType == 0. - Move handleType-independency comments out of handleType-switch, in vkGetPhysicalDeviceExternalBufferPropertiesKHX. Reduces diff in future dma_buf patches. Co-authored-with: Chad Versace <chadversary@chromium.org> Reviewed-by: Chad Versace <chadversary@chromium.org>	2017-04-27 20:08:46 -07:00
Jason Ekstrand	818b857914	anv: Use the BO cache for DeviceMemory allocations Reviewed-by: Chad Versace <chadversary@chromium.org>	2017-04-27 20:08:46 -07:00
Jason Ekstrand	494d6f65a7	anv/allocator: Add a BO cache This cache allows us to easily ensure that we have a unique anv_bo for each gem handle. We'll need this in order to support multiple-import of memory objects and semaphores. v2 (Jason Ekstrand): - Reject BO imports if the size doesn't match the prime fd size as reported by lseek(). Reviewed-by: Chad Versace <chadversary@chromium.org>	2017-04-27 20:08:46 -07:00
Jason Ekstrand	5d25ac6a4b	anv: Implement VK_KHX_external_memory This is the trivial implementation that just exposes the extension string but exposes zero external handle types. Reviewed-by: Chad Versace <chadversary@chromium.org>	2017-04-27 20:08:46 -07:00
Chad Versace	354ca7a1d4	anv: Implement VK_KHX_external_memory_capabilities This is a complete but trivial implementation. It's trivial becasue We support no external memory capabilities yet. Most of the real work in this commit is in reworking the UUIDs advertised by the driver. v2 (chadv): - Fix chain traversal in vkGetPhysicalDeviceImageFormatProperties2KHR. Extract VkPhysicalDeviceExternalImageFormatInfoKHX from the chain of input structs, not the chain of output structs. - In vkGetPhysicalDeviceImageFormatProperties2KHR, iterate over the input chain and the output chain separately. Reduces diff in future dma_buf patches. Co-authored-with: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Chad Versace <chadversary@chromium.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-04-27 20:08:46 -07:00
Jason Ekstrand	d4d9258b61	anv/physical_device: Rename uuid to pipeline_cache_uuid We're about to have more UUIDs for different things so this one really needs to be properly labeled. Reviewed-by: Chad Versace <chadversary@chromium.org>	2017-04-27 20:08:46 -07:00
Jason Ekstrand	02767cb4ff	anv: Refactor device_get_cache_uuid into physical_device_init_uuids Reviewed-by: Chad Versace <chadversary@chromium.org>	2017-04-27 20:08:46 -07:00
Jason Ekstrand	35e626bd0e	anv: Set EXEC_OBJECT_ASYNC when available Reviewed-by: Chad Versace <chadversary@chromium.org>	2017-04-27 20:08:46 -07:00
Jason Ekstrand	bd3a9813b9	anv/cmd_buffer: Use the device allocator for QueueSubmit The command is really operating on a Queue not a command buffer and the nearest object to that with an allocator is VkDevice. Reviewed-by: Chad Versace <chadversary@chromium.org> Cc: "17.0 17.1" <mesa-dev@lists.freedesktop.org>	2017-04-27 20:08:46 -07:00
Jason Ekstrand	c43b4bc85e	anv: Don't place scratch buffers above the 32-bit boundary This fixes rendering corruptions in DOOM. Hopefully, it will also make Jenkins a bit more stable as we've been seeing some random failures and GPU hangs ever since turning on 48bit. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100620 Fixes: `651ec926fc` "anv: Add support for 48-bit addresses" Tested-by: Grazvydas Ignotas <notasas@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Cc: "17.1" <mesa-stable@lists.freedesktop.org>	2017-04-27 02:04:57 -07:00
Rafael Antognolli	6a40ccec4b	genxml: Fix gen_pack_header.py crash when field type is invalid. Just return earlier in that case. Also set prefix to an empty string, so we don't get to use it undefined. Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-04-24 15:14:12 -07:00
Rafael Antognolli	9670124e31	genxml: Make BLEND_STATE command support variable length array. We need to emit BLEND_STATE, which size is 1 + 2 * nr_draw_buffers dwords (on gen8+), but the BLEND_STATE struct length is always 17. By marking it size 1, which is actually the size of the struct minus the BLEND_STATE_ENTRY's, we can emit a BLEND_STATE of variable number of entries. For gen6 and gen7 we set length to 0, since it only contains BLEND_STATE_ENTRY's, and no other data. With this change, we also change the code for blorp and anv to emit only the needed BLEND_STATE_ENTRY's, instead of always emitting 16 dwords on gen6-7 and 17 dwords on gen8+. v2: - Use designated initializers on blorp and remove 0 from initialization (Jason) - Default entries to disabled on Vulkan (Jason) - Rebase code. Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-04-24 15:14:10 -07:00
Rafael Antognolli	4ace73b1f6	genxml: Fix python crash when no dwords are found. If the 'dwords' dict is empty, max(dwords.keys()) throws an exception. This case could happen when we have an instruction that is only an array of other structs, with variable length. v2: - Add another clause for empty dwords and make it work with python 3 (Dylan) - Set the length to 0 if dwords is empty, and do not declare dw Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-04-24 15:14:08 -07:00
Rafael Antognolli	19720405d5	genxml: Remove unused parameter. 'start' parameter from Group.emit_pack_function() is useless. Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-04-24 15:14:05 -07:00
Rafael Antognolli	1ea41163eb	intel/aubinator: Correctly read variable length structs. Before this commit, when a group with count="0" is found, only one field is added to the struct representing the instruction. This causes only one entry to be printed by aubinator, for variable length groups. With this commit we "detect" that there's a variable length group (count="0") and store the offset of the last entry added to the struct when reading the xml. When finally reading the aubdump file, we check the size of the group and whether we have variable number of elements, and in that case, reuse the last field to add the remaining elements. Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Tested-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2017-04-24 15:13:51 -07:00
Nanley Chery	50134cede1	isl/format: Update the R16G16B16X16_FLOAT entry The section of the PRM mentioned in the code comment above this table says that this format supports the render target write message. Internal documentation says that this format also supports alpha blending. As a side effect, this allows CCS_D buffers to be created for images with this format. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>	2017-04-24 13:30:50 -07:00
Nanley Chery	b1066f7365	anv/pass: Delete anv_pass::subpass_attachments This field has no users. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>	2017-04-24 13:30:50 -07:00
Francisco Jerez	58324389be	intel/fs: Take into account amount of data read in spilling cost heuristic. Until now the spilling cost calculation was neglecting the amount of data read from the register during the spilling cost calculation. This caused it to make suboptimal decisions in some cases leading to higher memory bandwidth usage than necessary. Improves Unigine Heaven performance by ~4% on BDW, reversing an unintended FPS regression from my previous commit `147e71242c` with n=12 and statistical significance 5%. In addition SynMark2 OglCSDof performance is improved by an additional ~5% on SKL, and a Kerbal Space Program apitrace around the Moho planet I can provide on request improves by ~20%. Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Plamena Manolova <plamena.manolova@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-04-24 11:01:40 -07:00
Francisco Jerez	ecc19e12dc	intel/fs: Use regs_written() in spilling cost heuristic for improved accuracy. This is what we use later on to compute the number of registers that will actually get spilled to memory, so it's more likely to match reality than the current open-coded approximation. Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Plamena Manolova <plamena.manolova@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-04-24 10:59:56 -07:00
Kenneth Graunke	6b10c37b9c	i965/vec4: Use reads_accumulator_implicitly(), not MACH checks. Curro pointed out that I should not just check for MACH, but use the reads_accumulator_implicitly() helper, which would also prevent the same bug with MAC and SADA2 (if we ever decide to use them). Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-24 10:53:49 -07:00
Timothy Arceri	7a7ee40c2d	nir/i965: add before ffma algebraic opts This shuffles constants down in the reverse of what the previous patch does and applies some simpilifications that may be made possible from doing so. Shader-db results BDW: total instructions in shared programs: 12980814 -> 12977822 (-0.02%) instructions in affected programs: 281889 -> 278897 (-1.06%) helped: 1231 HURT: 128 total cycles in shared programs: 246562852 -> 246567288 (0.00%) cycles in affected programs: 11271524 -> 11275960 (0.04%) helped: 1630 HURT: 1378 V2: mark float opts as inexact Reviewed-by: Elie Tournier <elie.tournier@collabora.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-04-24 12:08:14 +10:00
Kenneth Graunke	2faf227ec2	i965/vec4: Avoid reswizzling MACH instructions in opt_register_coalesce(). opt_register_coalesce() was optimizing sequences such as: mul(8) acc0:D, attr18.xyyy:D, attr19.xyyy:D mach(8) vgrf5.xy:D, attr18.xyyy:D, attr19.xyyy:D mov(8) m4.zw:F, vgrf5.xxxy:F into: mul(8) acc0:D, attr18.xyyy:D, attr19.xyyy:D mach(8) m4.zw:D, attr18.xxxy:D, attr19.xxxy:D This doesn't work - if we're going to reswizzle MACH, we'd need to reswizzle the MUL as well. Here, the MUL fills the accumulator's .zw components with attr18.yy * attr19.yy. But the MACH instruction expects .z to contain attr18.x * attr19.x. Bogus results ensue. No change in shader-db on Haswell. Prevents regressions in Timothy's patches to use enhanced layouts for varying packing (which rearrange code just enough to trigger this pre-existing bug, but were fine themselves). Acked-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-04-22 00:01:16 -07:00
Jason Ekstrand	1e21d4227e	anv/query: Use genxml for MI_MATH Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed by: Iago Toral Quiroga <itoral@igalia.com>	2017-04-20 15:24:06 -07:00
Jason Ekstrand	e23129ac0c	genxml: Add better support for MI_MATH This breaks the guts of MI_MATH (the instruction part) out into its own structure with proper named values. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed by: Iago Toral Quiroga <itoral@igalia.com>	2017-04-20 15:24:06 -07:00
Jason Ekstrand	b7a2af8e38	genxml/pack: Allow hex values in the XML Acked-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2017-04-20 15:24:06 -07:00
Nanley Chery	d9d793696b	anv/cmd_buffer: Disable CCS on BDW input attachments The description under RENDER_SURFACE_STATE::RedClearColor says, For Sampling Engine Multisampled Surfaces and Render Targets: Specifies the clear value for the red channel. For Other Surfaces: This field is ignored. This means that the sampler on BDW doesn't support CCS. Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Cc: Jordan Justen <jordan.l.justen@intel.com> Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>	2017-04-17 16:47:38 -07:00
Lionel Landwerlin	d71efbe5f2	anv: blorp: flush memory after copy Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>	2017-04-17 14:45:57 -07:00
Kenneth Graunke	9b71709cb8	intel/decoder: Fix is_header_field starting condition. Starting positions >= 32 are not part of the header, rather than >. Caught by Coverity, which found that "bits <<= field->start" may shift by 32, which has undefined behavior. CID: 1404968 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-04-16 22:58:23 -07:00
Jason Ekstrand	d2d6cf6c83	anv: Add the pci_id into the shader cache UUID This prevents a user from using a cache created on one hardware generation on a different one. Of course, with Intel hardware, this requires moving their drive from one machine to another but it's still possible and we should prevent it. Reviewed-by: Chad Versace <chadversary@chromium.org> Cc: mesa-stable@lists.freedesktop.org	2017-04-14 17:41:07 -07:00
Matt Turner	2eeb1b0ad9	i965: Use correct VertStride on align16 instructions. In commit `c35fa7a`, we changed the "width" of DF source registers to 2, which is conceptually fine. Unfortunately a VertStride of 2 is not allowed by align16 instructions on IVB/BYT, and the regular VertStride of 4 works fine in any case. See generated_tests/spec/arb_gpu_shader_fp64/execution/built-in-functions/vs-round-double.shader_test for example: cmp.ge.f0(8) g18<1>DF g1<0>.xyxyDF -g8<2>DF { align16 1Q }; ERROR: In Align16 mode, only VertStride of 0 or 4 is allowed cmp.ge.f0(8) g19<1>DF g1<0>.xyxyDF -g9<2>DF { align16 2N }; ERROR: In Align16 mode, only VertStride of 0 or 4 is allowed v2: - Add spec quote (Curro). - Change the condition to only BRW_VERTICAL_STRIDE_2 (Curro) Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:09 -07:00
Samuel Iglesias Gonsálvez	d8441e2276	i965/vec4/dce: improve track of partial flag register writes This is required for correctness in presence of multiple 4-wide flag writes (e.g. 4-wide instructions with a conditional mod set) which update a different portion of the same 8-bit flag subregister. Right now we keep track of flag dataflow with 8-bit granularity and consider flag writes to have killed any previous definition of the same subregister even if the write was less than 8 channels wide, which can cause live flag register updates to be dead code-eliminated incorrectly. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:09 -07:00
Samuel Iglesias Gonsálvez	c1fc8fad47	i965/vec4: don't do horizontal stride on some register file types horiz_offset() shouldn't be doing anything for scalar registers, because all channels of any SIMD instructions will end up reading or writing the same component of the register, so shifting the register offset would be wrong. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [ Francisco Jerez: Re-implement in terms of is_uniform() for simplicity. Pass argument by const reference. Clarify commit message. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:09 -07:00
Matt Turner	21e8e3a848	i965/vec4: Fix exec size for MOVs {SET,PICK}_{HIGH,LOW}_32BIT. Otherwise for a pack_double_2x32_split opcode, we emit: vec1 64 ssa_135 = pack_double_2x32_split ssa_133, ssa_134 mov(8) g5<1>UD g5<4>.xUD { align16 1Q compacted }; mov(8) g7<2>UD g5<4,4,1>UD { align1 1Q }; ERROR: When the destination spans two registers, the source must span two registers (exceptions for scalar source and packed-word to packed-dword expansion) mov(8) g8<2>UD g5.4<4,4,1>UD { align1 2N }; ERROR: The offset from the two source registers must be the same mov(8) g5<1>UD g6<4>.xUD { align16 1Q compacted }; mov(8) g7.1<2>UD g5<4,4,1>UD { align1 1Q }; ERROR: When the destination spans two registers, the source must span two registers (exceptions for scalar source and packed-word to packed-dword expansion) mov(8) g8.1<2>UD g5.4<4,4,1>UD { align1 2N }; ERROR: The offset from the two source registers must be the same The intention was to emit mov(4)s for the instructions that have ERROR annotations. See tests/spec/arb_gpu_shader_fp64/execution/vs-isinf-dvec.shader_test for example. v2 (Samuel): - Instead of setting the exec size to a fixed value, don't double it (Curro). - Add PICK_{HIGH,LOW}_32BIT to the condition. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [ Francisco Jerez: Trivial rebase changes. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:09 -07:00
Samuel Iglesias Gonsálvez	f030aaf2fb	i965/vec4: use vec4_builder to emit instructions in setup_imm_df() Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [ Francisco Jerez: Drop useless vec4_visitor dependencies. Demote to static stand-alone function. Don't write unused components in the result. Use vec4_builder interface for register allocation. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:09 -07:00
Juan A. Suarez Romero	a907c91e93	i965/vec4: consider subregister offset in live variables Take into account offset values less than a full register (32 bytes) when getting the var from register. This is required when dealing with an operation that writes half of the register (like one d2x in IVB/BYT, which uses exec_size == 4). v2: - Take in account this offset < 32 in liveness analysis too (Curro) v3: - Change formula in var_from_reg() (Curro) - Remove useless changes (Curro) Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:08 -07:00
Francisco Jerez	92649a3e67	i965/vec4: fix assert to detect SIMD lowered DF instructions in IVB On IVB, DF instructions have lowered the SIMD width to 4 but the exec_size will be later doubled. Fix the assert to avoid crashing in this case. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [ Francisco Jerez: Simplify assert. Except for the 'inst->group % 4 == 0' part the assertion was redundant with the previous assertion. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:08 -07:00
Samuel Iglesias Gonsálvez	6e3265eae5	i965/vec4: split VEC4_OPCODE_FROM_DOUBLE into one opcode per destination's type This way we can set the destination type as double to all these new opcodes, avoiding any optimizer's confusion that was happening before. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [ Francisco Jerez: Drop no_spill workaround originally needed due to the bogus destination type of VEC4_OPCODE_FROM_DOUBLE. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:08 -07:00
Samuel Iglesias Gonsálvez	50a5217637	i965/vec4: split d2x conversion and data gathering from one opcode to two explicit ones When doing a 64-bit to a smaller data type size conversion, the destination should be aligned to 64-bits. Because of that, we need to gather the data after the actual conversion. Until now, these two operations were done by VEC4_OPCODE_FROM_DOUBLE but now we split them explicitely in two different instructions: VEC4_OPCODE_FROM_DOUBLE just do the conversion and VEC4_OPCODE_PICK_LOW_32BIT will gather the data. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:08 -07:00
Juan A. Suarez Romero	cfaf14a126	i965/vec4: fix VEC4_OPCODE_FROM_DOUBLE for IVB/BYT In the generator we must generate slightly different code for Ivybridge/Baytrail, because of the way the stride works in this hardware. v2: - Use stride and don't need to fix dst (Curro) Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:08 -07:00
Juan A. Suarez Romero	be445d3ea3	i965/vec4: keep original type when dealing with null registers Keep the original type when dealing with null registers. Especially because we do no want to introduce an implicit conversion between types that could affect the conditional flags. This affects especially when the original type is DF, and we are working on Ivybridge/Baytrail. v2 (Curro) - Fix typo. - Use retype() instead of applying the type directly. - Remove unneeded retype. Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:08 -07:00
Samuel Iglesias Gonsálvez	a21dc2b500	i965/vec4: split DF instructions and later double its execsize in IVB/BYT We need to split DF instructions in two on IVB/BYT as it needs an execsize 8 to process 4 DF values (one GRF in total). v2: - Rename helper and make it static inline function (Matt). - Fix indention and add braces (Matt). v3: - Don't edit IR instruction when doubling exec_size (Curro) - Add comment into the code (Curro). - Manage ARF registers like the others (Curro) v4: - Add get_exec_type() function and use it to calculate the execution size. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [ Francisco Jerez: Fix bogus 'type != BAD_FILE' check. Take destination type as execution type where there is no valid source. Assert-fail if the deduced execution type is byte. Clarify comment in get_lowered_simd_width(). Move SIMD width workaround outside of 'if (...inst->size_written > REG_SIZE)' conditional block, since the problem should be independent of whether the amount of data written by the instruction is greater or lower than a GRF. Drop redundant is_ivb_df definition. Drop bogus inst->exec_size < 8 check. Simplify channel group assertion. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:08 -07:00
Samuel Iglesias Gonsálvez	a5399e8b1c	i965/fs: lower all non-force_writemask_all DF instructions to SIMD4 on IVB/BYT The hardware applies the same channel enable signals to both halves of the compressed instruction which will be just wrong under non-uniform control flow. Fix this by splitting those instructions to SIMD4. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:08 -07:00
Francisco Jerez	ebfb703d44	i965/fs: Get 64-bit indirect moves working on IVB. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2017-04-14 14:56:08 -07:00
Matt Turner	630b84cdc8	i965: Use source region <1,2,0> when converting to DF. Doing so allows us to use a single MOV in VEC4_OPCODE_TO_DOUBLE instead of two. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2017-04-14 14:56:08 -07:00
Juan A. Suarez Romero	3198ce3f96	i965/fs: fix lower SIMD width for IVB/BYT's MOV_INDIRECT According to the IVB and HSW PRMs: "2.When the destination requires two registers and the sources are indirect, the sources must use 1x1 regioning mode." So for DF instructions the execution size is not limited by the number of address registers that are available, but by the EU decompression logic not handling VxH indirect addressing correctly. This patch limits the SIMD width to 4 in this case. v2: - Fix typo (Matt). - Fix condition (Curro) v3: - Add spec quote (Curro) Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:07 -07:00
Juan A. Suarez Romero	571cbd05eb	i965/fs: fix dst stride in IVB/BYT type conversions When converting a DF to 32-bit conversions, we set dst stride to 2, to fulfill alignment restrictions because the upper Dword of every Qword will be written with undefined value. But in IVB/BYT, this is not necessary, as each DF conversion already writes 2, the first one the real value, and the second one a 0. That is, IVB/BYT already set stride = 2 implicitly, so we must set it to 1 explicitly to avoid ending up with stride = 4. v2: - Fix typo (Matt) v3: - Fix stride in the destination's brw_reg, don't modity IR (Curro) v4: - Remove 'is_dst' argument of brw_reg_from_fs_reg() (Curro) - Fix comment (Curro). - Relax hstride assert (Curro) Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [ Francisco Jerez: Minor spelling fixes. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:07 -07:00
Samuel Iglesias Gonsálvez	af6fc3a8ea	i965/fs: rename lower_d2x to lower_conversions v2: - Change the name to lower_conversions. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:07 -07:00
Samuel Iglesias Gonsálvez	dee31311eb	Revert "i965/fs: Don't emit SEL instructions for type-converting MOVs." This reverts commit `7dccd38b40`. d2x pass fixes SEL instructions when there is a type conversion by doing a SEL without type conversion and then convert the result. This pass also takes into account the non-uniform control flow. Then, `7dccd38b40` is not needed anymore. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-04-14 14:56:07 -07:00
Samuel Iglesias Gonsálvez	aeecc82d05	i965/fs: generalize the legalization d2x pass Generalize it to lower any unsupported narrower conversion. v2 (Curro): - Add supports_type_conversion() - Reuse existing intruction instead of cloning it. - Generalize d2x to narrower and equal size conversions. v3 (Curro): - Make supports_type_conversion() const and improve it. - Use foreach_block_and_inst to process added instructions. - Simplify code. - Add assert and improve comments. - Remove redundant mov. - Remove useless comment. - Remove saturate == false assert and add support for saturation when fixing the conversion. - Add get_exec_type() function. v4 (Curro): - Use get_exec_type() function to get sources' type. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:07 -07:00
Matt Turner	94ffeb7fa2	i965: Use <0,2,1> region for scalar DF sources on IVB/BYT. On HSW+, scalar DF sources can be accessed using the normal <0,1,0> region, but on IVB and BYT DF regions must be programmed in terms of floats. A <0,2,1> region accomplishes this. v2: - Apply region <0,2,1> in brw_reg_from_fs_reg() (Curro). v3: - Added comment explaining the reason (Curro). Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:07 -07:00
Samuel Iglesias Gonsálvez	82d17615f4	i965/fs: clamp exec_size when an instruction has a scalar DF source Then the SIMD lowering pass will get rid of any compressed instructions with scalar source (whether force_writemask_all or not) and we avoid hitting the Gen7 region decompression bug. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Suggested-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:07 -07:00
Juan A. Suarez Romero	0f1316d4db	i965/fs: double regioning parameters and execsize for DF in IVB/BYT In IVB and BYT, both regioning parameters and execution sizes are measured as 32-bits element size. So when we have something like: mov(8) g2<1>DF g3<4,4,1>DF We are not actually moving 8 doubles (our intention), but 4 doubles. We need to double the parameters to cope with this issue. However, horizontal strides don't behave as they're supposed to on IVB for DF regions, they will cause each 32-bit half of DF sources to be strided individually, and doubling the value won't make any difference. v2: - Use devinfo directly (Matt). - Use Baytrail instead of Valleview (Matt). - Use IvyBridge instead of Ivy (Matt) - Double the exec_size in code emission (Curro) v3: - Change hstride doubling by an assert and fix commit log (Curro). - Substitute remaining compiler->devinfo by devinfo (Curro). v4: - Fix comment (Curro). Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:07 -07:00
Juan A. Suarez Romero	79af256388	i965/fs: add helper to retrieve instruction execution type The execution data size is the biggest type size of any instruction operand. We will use it to know if the instruction deals with DF, because in Ivy we need to double the execution size and regioning parameters. v2: - Fix typo in commit log (Matt) - Use static inline function instead of fs_inst's method (Curro). - Define the result as a constant (Curro). - Fix indentation (Matt). - Add braces to nested control flow (Matt). v3 (Curro): - Add get_exec_type() and other auxiliary functions and use them to calculate its size. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [ Francisco Jerez: Fix bogus 'type != BAD_FILE' check. Fix deduced execution type for integer vector types. Take destination type as execution type where there is no valid source. Assert-fail if the deduced execution type is byte. Move into brw_ir_fs.h header for consistency with the VEC4 back-end. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:07 -07:00
Matt Turner	fd349d29e4	i965: Handle IVB DF differences in the validator. On IVB/BYT, region parameters and execution size for DF are in terms of 32-bit elements, so they are doubled. For evaluating the validity of an instruction, we halve them. v2 (Sam): - Add comments. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2017-04-14 14:56:07 -07:00
Iago Toral Quiroga	fbac8b1f94	i965/disasm: also print nibctrl in IVB for execsize=8 4-wide DF operations where NibCtrl applies require and execsize of 8 in IvyBridge/BayTrail. v2: - Refactor NibCtrl printing (Matt) Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:06 -07:00
Jason Ekstrand	220974b38d	anv/blorp: Properly handle VK_ATTACHMENT_UNUSED The Vulkan driver was originally written under the assumption that VK_ATTACHMENT_UNUSED was basically just for depth-stencil attachments. However, the way things fell together, VK_ATTACHMENT_UNUSED can be used anywhere in the subpass description. The blorp-based clear and resolve code has a bunch of places where we walk lists of attachments and we weren't handling VK_ATTACHMENT_UNUSED everywhere. This commit should fix all of them. Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Cc: <mesa-stable@lists.freedesktop.org>	2017-04-14 14:20:42 -07:00
Jason Ekstrand	21d2ca72d8	anv/cmd_buffer: Use the null surface state for ATTACHMENT_UNUSED Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Cc: <mesa-stable@lists.freedesktop.org>	2017-04-14 14:20:42 -07:00
Jason Ekstrand	02eca8b6f8	anv/cmd_buffer: Always set up a null surface state We're about to start requiring it in yet another case and calculating exactly when one is needed is starting to get prohibitively expensive. A single surface state doesn't take up that much space so we may as well create one all the time. Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Cc: <mesa-stable@lists.freedesktop.org>	2017-04-14 14:20:42 -07:00
Jason Ekstrand	e1f6fb8021	anv/cmd_buffer: Flush the VF cache at the top of all primaries Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>	2017-04-14 13:35:02 -07:00
Jason Ekstrand	939337e49f	anv/blorp: Flush the texture cache in UpdateBuffer Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>	2017-04-14 13:35:02 -07:00
Jason Ekstrand	475bab0330	anv: Limit VkDeviceMemory objects to 2GB Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>	2017-04-14 13:35:02 -07:00
Jason Ekstrand	4495b917e2	intel/blorp: Add a blorp_emit_dynamic macro This makes it much easier to throw together a bit of dynamic state. It also automatically handles flushing so you don't accidentally forget. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2017-04-14 13:35:02 -07:00
Matt Turner	ab18578b03	anv: Only define wsi_cbs when VK_USE_PLATFORM_WAYLAND_KHR defined	2017-04-12 11:00:39 -07:00
Francisco Jerez	147e71242c	i965/fs: Take into account lower frequency of conditional blocks in spilling cost heuristic. The individual branches of an if/else/endif construct will be executed some unknown number of times between 0 and 1 relative to the parent block. Use some factor in between as weight while approximating the cost of spill/fill instructions within a conditional if-else branch. This favors spilling registers used within conditional branches which are likely to be executed less frequently than registers used at the top level. Improves the framerate of the SynMark2 OglCSDof benchmark by ~1.9x on my SKL GT4e. Should have a comparable effect on other platforms. No significant regressions. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-04-11 15:28:54 -07:00
Juan A. Suarez Romero	8d7a82ae32	anv: remove needless VALGRIND_MAKE_MEM_DEFINED This is already invoked in the following VG_NOACCESS_READ() call. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-04-11 17:21:57 +02:00
Jason Ekstrand	da2ac19511	intel/blorp: Use ISL for emitting depth/stencil/hiz Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-04-10 07:57:21 -07:00
Jason Ekstrand	d3785dcb2f	intel/blorp: Emit 3DSTATE_STENCIL_BUFFER before HIER_DEPTH We're about to replace blorp's emit code with ISL and it emits them in the other order. This makes diffing the aubs easier. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-04-10 07:57:21 -07:00
Jason Ekstrand	f93dc5beee	anv: Use ISL for emitting depth/stencil/hiz Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-04-10 07:57:21 -07:00
Jason Ekstrand	bf95f7c209	intel/isl: Add support for emitting depth/stencil/hiz Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-04-10 07:57:21 -07:00
Jason Ekstrand	098ca9949d	intel/isl: Use genx_bits.h instead of a hand-rolled table This gets rid of one piece of ugliness with the way ISL handles surface emitting surface states. I've never liked that hand-rolled table but it was the best we had at the time. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-04-07 22:34:04 -07:00
Jason Ekstrand	b85d75b3e8	intel/genxml/bits: Emit per-container _length helpers Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-04-07 22:34:04 -07:00
Jason Ekstrand	f97e251ab2	intel/genxml/bits: Emit per-field _start helpers Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-04-07 22:34:04 -07:00
Jason Ekstrand	430e697868	intel/genxml/bits: Pull the function emit code into a helper block The helper block is extremely general. It takes an string property name and an object that supports three methods: has_prop, iter_prop, and get_prop. This way we can easily generalize it to emit more different types of getter functions. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-04-07 22:34:04 -07:00
Jason Ekstrand	2d52e65d03	intel/genxml/bits: Refactor to add a container class Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-04-07 22:34:04 -07:00
Jason Ekstrand	bc68aa42bd	anv: Use subpass dependencies for flushes Instead of figuring it all out ourselves, just use the information given to us by the client. Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2017-04-07 19:24:14 -07:00
Jason Ekstrand	e5bbf8be36	anv/pass: Record required pipe flushes Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2017-04-07 19:24:14 -07:00
Jason Ekstrand	0039d0cf27	anv/pass: Use anv_multialloc for allocating the anv_pass Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2017-04-07 19:24:14 -07:00
Jason Ekstrand	415633a722	anv/descriptor_set: Use anv_multialloc for descriptor set layouts Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2017-04-07 19:24:14 -07:00
Jason Ekstrand	e5c29b8c27	anv: Add a helper for doing mass allocations We tend to try to reduce the number of allocation calls the Vulkan driver uses by doing a single allocation whenever possible for a data structure. While this has certain downsides (usually code complexity), it does mean error handling and cleanup is much easier. This commit adds a nice little helper struct for getting rid of some of that complexity. Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2017-04-07 19:24:14 -07:00
Jason Ekstrand	82695c32b6	anv: Add helpers for converting access flags to pipe bits Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2017-04-07 19:24:14 -07:00
Jason Ekstrand	4e17b59f6c	anv/query: Use snooping on !LLC platforms Commit `b2c97bc789` which made us start using a busy-wait for individual query results also messed up cache flushing on !LLC platforms. For one thing, I forgot the mfence after the clflush so memory access wasn't properly getting fenced. More importantly, however, was that we were clflushing the whole query range and then waiting for individual queries and then trying to read the results without clflushing again. Getting the clflushing both correct and efficient is very subtle and painful. Instead, let's side-step the problem by just snooping. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2017-04-07 12:17:20 -07:00
Emil Velikov	5318d1ff94	anv: provide anv_gem_busy() stub for the tests Otherwise linking way fail. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100600 Fixes: `f195d40eca` ("anv/device: Add a helper for querying whether a BO is busy") Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com> Tested-by: Vinson Lee <vlee@freedesktop.org>	2017-04-07 19:45:58 +01:00
Samuel Iglesias Gonsálvez	1c934bc71b	anv/blorp: sample input attachments with resolves on BDW On Broadwell we still need to do a resolve between the subpass that writes and the subpass that reads when there is a self-dependency because HW could not see fast-clears and works on the render cache as if there was regular non-fast-clear surface. Fixes 16 tests on BDW: dEQP-VK.renderpass.formats..input.clear.store.self_dep Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-04-07 07:49:43 +02:00
Jordan Justen	0370350d11	intel/aubinator: Stop searching after a custom handler is found Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-04-06 13:26:08 -07:00
Jordan Justen	d5bd0e411e	intel/gen_decoder: return -1 for unknown command formats Decoding with aubinator encountered a command of 0xffffffff. With the previous code, it caused aubinator to jump 255 + 2 dwords to start decoding again. Instead we can attempt to detect the known instruction formats. If the format is not recognized, then we can advance just 1 dword. v2: * Update aubinator_error_decode * Actually convert the length variable returned into a signed integer in aubinator.c, intel_batchbuffer.c and aubinator_error_decode.c. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-04-06 13:26:08 -07:00
Jordan Justen	7c33372f82	intel/gen_decoder: Fix length for Media State/Object commands From BDW PRM, Volume 6: Command Stream Programming, 'Render Command Header Format'. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-04-06 13:26:08 -07:00
Jordan Justen	3c77a57222	intel/aubinator_error_decode: Fix structure decode data The call to gen_print_group should provide a pointer to the beginning of the the structure data, not the start of the batch data. Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-04-06 13:25:38 -07:00
Jason Ekstrand	b2c97bc789	anv/query: Busy-wait for available query entries Before, we were just looking at whether or not the user wanted us to wait and waiting on the BO. Some clients, such as the Serious engine, use a single query pool for hundreds of individual query results where the writes for those queries may be split across several command buffers. In this scenario, the individual query we're looking for may become available long before the BO is idle so waiting on the query pool BO to be finished is wasteful. This commit makes us instead busy-loop on each query until it's available. This significantly reduces pipeline bubbles and improves performance of The Talos Principle on medium settings (where the GPU isn't overloaded with drawing) by around 20% on my SkyLake gt4. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: Eero Tamminen <eero.t.tamminen@intel.com> Tested-by: Grazvydas Ignotas <notasas@gmail.com>	2017-04-05 21:17:11 -07:00
Jason Ekstrand	f195d40eca	anv/device: Add a helper for querying whether a BO is busy This is a bit more efficient than using GEM_WAIT with a timeout of 0. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2017-04-05 21:17:11 -07:00
Emil Velikov	a6840efc09	anv: provide required gem stubs for the tests Introduce stubs to anv_gem_stub.c that match the anv_gem.c ones. Otherwise we may get link-time errors, when building the tests. v2: Introduce all the missing stubs at once. Cc: Jason Ekstrand <jason@jlekstrand.net> Cc: Vinson Lee <vlee@freedesktop.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100574 Fixes: `c964f0e485` ("anv: Query the kernel for reset status") Fixes: `651ec926fc` ("anv: Add support for 48-bit addresses") Fixes: `060a6434ec` ("anv: Advertise larger heap sizes") Signed-off-by: Emil Velikov <emil.velikov@collabora.com> --- I've intentionally kept the order the same identical to the anv_gem.c. This way we can easily grep & diff in the future ;-)	2017-04-05 17:54:38 +01:00
Emil Velikov	e664cfc5a7	intel: genxml: automake: include gen_bits_header.py in the tarball Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2017-04-05 13:16:28 +01:00
Emil Velikov	e180680980	intel: genxml: automake: polish automake rules Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2017-04-05 13:16:28 +01:00
Jason Ekstrand	060a6434ec	anv: Advertise larger heap sizes Instead of just advertising the aperture size, we do something more intelligent. On systems with a full 48-bit PPGTT, we can address 100% of the available system RAM from the GPU. In order to keep clients from burning 100% of your available RAM for graphics resources, we have a nice little heuristic (which has received exactly zero tuning) to keep things under a reasonable level of control. Reviewed-by: Kristian H. Kristensen <krh@bitplanet.net>	2017-04-04 18:33:52 -07:00
Jason Ekstrand	651ec926fc	anv: Add support for 48-bit addresses This commit adds support for using the full 48-bit address space on Broadwell and newer hardware. Thanks to certain limitations, not all objects can be placed above the 32-bit boundary. In particular, general and state base address need to live within 32 bits. (See also Wa32bitGeneralStateOffset and Wa32bitInstructionBaseOffset.) In order to handle this, we add a supports_48bit_address field to anv_bo and only set EXEC_OBJECT_SUPPORTS_48B_ADDRESS if that bit is set. We set the bit for all client-allocated memory objects but leave it false for driver-allocated objects. While this is more conservative than needed, all driver allocations should easily fit in the first 32 bits of address space and keeps things simple because we don't have to think about whether or not any given one of our allocation data structures will be used in a 48-bit-unsafe way. Reviewed-by: Kristian H. Kristensen <krh@bitplanet.net>	2017-04-04 18:33:52 -07:00
Jason Ekstrand	439da38d18	anv: Replace anv_bo::is_winsys_bo with a uint32_t flags Reviewed-by: Kristian H. Kristensen <krh@bitplanet.net>	2017-04-04 18:33:52 -07:00
Jason Ekstrand	5d1ba2cb04	anv/blorp: Align vertex buffers to 64B This fixes issues seen when adding support for full 48-bit addresses. The 48-bit addresses themselves have nothing to do with it other than that it caused the kernel to place buffers slightly differently so they interacted differently with the caches. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>	2017-04-04 18:33:52 -07:00
Jason Ekstrand	c964f0e485	anv: Query the kernel for reset status When a client causes a GPU hang (or experiences issues due to a hang in another client) we want to let it know as soon as possible. In particular, if it submits work with a fence and calls vkWaitForFences or vkQueueQaitIdle and it returns VK_SUCCESS, then the client should be able to trust the results of that rendering. In order to provide this guarantee, we have to ask the kernel for context status in a few key locations. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-04-04 18:33:52 -07:00
Jason Ekstrand	82573d0f75	anv: Check for device loss at the end of WaitForFences It's possible that the device could have been lost while we were waiting. We should let the user know if this has happened. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-04-04 18:33:51 -07:00
Jason Ekstrand	c6f69eea6a	anv/pipeline: Properly handle unset gl_Layer and gl_ViewportIndex When the shader does not set one of these values, they are supposed to get a default value of 0. We have hardware bits in 3DSTATE_CLIP for this but haven't been setting them. This fixes the intermittent failure of dEQP-VK.geometry.layered.3d.render_to_default_layer. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>	2017-04-04 18:33:51 -07:00
Jason Ekstrand	3503b2714b	i965/fs: Always provide a default LOD of 0 for TXS and TXL We already provide a default LOD for textureQueryLevels and texture() on non-fragment stages. However, there are more cases where one is needed such as textureSize(gsampler2DMS*) in SPIR-V. Instead of trying to list out all of the cases one at a time, just provide the default for all TXS and TXL operations. This fixes a shader validation error in the new Sascha deferredmultisampling demo which uses textureSize(gsampler2DMS). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100391 Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>	2017-04-04 18:33:35 -07:00
Jason Ekstrand	1fde054b8f	intel/isl: Refactor and clerify gen8 alignment calculations Adding the actual table from the docs makes it clearer exactly what the restrictions are. In particular, it becomes clear that compressed textures ignore the alignment parameters in RENDER_SURFACE_STATE. Reviewed-by: Chad Versace <chadversary@chromium.org>	2017-04-04 14:51:57 -07:00
Lionel Landwerlin	e8d9b76f63	intel: tools: add aubinator_error_decode tool This is pretty much the same tool as what i-g-t has, only with a more fancy decoding of the instructions/registers. It also doesn't support anything before gen4. v2 (from Matt): Drop authors Remove undefined automake variable v3: Fix incorrect offsets for dword > 1 (Jordan) v4: Fix decompression error with large blobs (Jordan) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Matt Turner <mattst88@gmail.com>	2017-04-04 21:22:26 +01:00
Lionel Landwerlin	567d77885e	intel: genxml: add RING_BUFFER_CTL registers Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-04-04 21:22:26 +01:00
Lionel Landwerlin	6f260ff049	intel: genxml: add FAULT_REG register Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-04-04 21:22:26 +01:00
Lionel Landwerlin	ca2771fa18	intel: genxml: add gen7 ERR_INT register v2: add register to gen7.5 (Matt) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-04-04 21:22:26 +01:00
Lionel Landwerlin	84613bf6d5	intel: genxml: add ACTHD registers Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-04-04 21:22:26 +01:00
Lionel Landwerlin	0f195f22aa	intel: genxml: add GFX_ARB_ERROR_RPT register Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-04-04 21:22:26 +01:00
Lionel Landwerlin	d1a7a54d77	intel: genxml: add INSTDONE registers Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-04-04 21:22:26 +01:00
Mauro Rossi	72175bd2a5	android: intel: genxml: fix genX_xml.h generation rules Recent changes in Makefile.sources merged the aubinator files in a unique list of generated files and genxml/genX_xml.h is now needed to avoid the following building error: ninja: error: '.../genxml/genX_xml.h', needed by '.../genxml/genX_xml.h', missing and no known rule to make it build/core/ninja.mk:148: recipe for target 'ninja_wrapper' failed Fixes: `0f83c05` "intel: genxml: compress all gen files into one" Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-04-04 09:10:46 +03:00
Jason Ekstrand	405ef7bb33	intel/vec4: Add some fall through comments Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-04-03 16:58:35 -07:00
Jason Ekstrand	0817110969	anv: Implement VK_KHR_incremental_present Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Daniel Stone <daniels@collabora.com>	2017-04-03 13:51:08 -07:00
Jason Ekstrand	f82b6c6272	vulkan/wsi: Plumb present regions through the common code Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Daniel Stone <daniels@collabora.com> Acked-by: Dave Airlie <airlied@redhat.com>	2017-04-03 13:51:08 -07:00
Lionel Landwerlin	471c1bc7cc	aubinator/gen_decoder/i965: decode instructions from dword 0 Some packets like 3DSTATE_VF_STATISTICS, 3DSTATE_DRAWING_RECTANGLE, 3DPRIMITIVE, PIPELINE_SELECT, etc... have configurable fields in dword0, we probably want to print those. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-04-03 20:45:34 +01:00
Lionel Landwerlin	04f2e80257	intel: gen_decoder: store pointer to current decoded field in iterator Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-04-03 20:45:34 +01:00
Lionel Landwerlin	74a80d579d	intel: genxml: fix out of tree builds v2: use Emil's recommendation change rule to closer to genxml/genX_bits.h Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2017-03-31 15:29:57 +01:00
Tapani Pälli	3535b87a1a	anv: change BLOCK_POOL_MEMFD_SIZE to 1GB This allows us to run 32bit Vulkan apps on Android, ftruncate call would fail on 2GB (max size being 2GB - 1). Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-03-31 08:43:28 +03:00
Tapani Pälli	2398770c87	android: add libmesa_genxml as dep to libmesa_isl This is to fix following compile error with libmesa_isl: mesa/src/intel/isl/isl.c:28:10: fatal error: 'genxml/genX_bits.h' file not found Fixes: `f0eaf38` ("genxml: New generated header genX_bits.h (v6)") Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Emli Velikov <emil.velikov@collabora.com>	2017-03-31 08:42:54 +03:00
Lionel Landwerlin	469da094e1	aubinator: enable snb/ilk through --gen Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2017-03-31 01:25:33 +01:00
Lionel Landwerlin	0f83c05149	intel: genxml: compress all gen files into one Combining all the files into a single string didn't make any difference in the size of the aubinator binary. With this change we now also embed gen4/4.5/5 descriptions, which increases the aubinator size by ~16Kb. v2 (Lionel): rebase makefiles Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2017-03-31 01:24:56 +01:00
Kenneth Graunke	e113dfabad	intel: Add INTEL_CFLAGS to aubinator CFLAGS. It still needs intel_aub.h. Fixes the build.	2017-03-30 11:58:00 -07:00
Emil Velikov	3df993e1a2	intel: automake: move INTEL_CFLAGS as applicable Only common/decoder.[ch] requires it [for intel_aub.h]. v2: The code was moved to from intel/tools to intel/common, update accordingly. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-03-30 19:07:28 +01:00
Emil Velikov	4ffb394961	intel: android: remove libdrm_intel requirement The only part which requires libdrm_intel tools/aubinator is not built on Android. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-03-30 19:07:23 +01:00
Craig Stout	1da7a11de8	anv/cmd_buffer: fix host memory leak push_constants must be free'd. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100452 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>	2017-03-29 14:32:32 -07:00
Jason Ekstrand	9aba81b160	anv/batch_chain: Handle another OOM in cmd_buffer_execbuf Found by inspection while rebasing other patches. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-03-29 09:39:49 -07:00
Alejandro Piñeiro	2f8d6bd578	i965: expose BRW_OPCODE_[F32TO16/F16TO32] name on gen8+ Technically those hw operations are only available on gen7, as gen8+ support the conversion on the MOV. But, when using the builder to implement nir operations (example: nir_op_fquantize2f16), it is not needed to do the gen check. This check is done later, on the final emission at brw_F32TO16 (brw_eu_emit), choosing between the MOV or the specific operation accordingly. So in the middle, during optimization phases those hw operations can be around for gen8+ too. Without this patch, several (at least 95) vulkan-cts quantize tests crashes when using INTEL_DEBUG=optimizer. For example: dEQP-VK.spirv_assembly.instruction.graphics.opquantize.too_small_vert v2: simplify the code using GEN_GE (Ilia Mirkin) v3: tweak brw_instruction_name instead of changing opcode_descs table, that is used for validation (Matt Turner) Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-03-29 17:34:15 +02:00
Jason Ekstrand	f3673db3d6	anv/cmd_buffer: Refactor flush_pipeline_select_* While having the _3d and _gpgpu versions is nice, there's no reason why we need to have duplicated logic for tracking the current pipeline. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-03-28 14:57:09 -07:00
Jason Ekstrand	6baae9625d	anv: Flush caches prior to PIPELINE_SELECT on all gens The programming note that says we need to do this still exists in the SkyLake PRM and, from looking at the bspec, seems like it may apply to all hardware generations SNB+. Unfortunately, this isn't particularly clear cut since there is also language in the bspec that says you can skip the flushing and stall to get better throughput. Experimentation with the "Car Chase" benchmark in GL seems to indicate that some form of flushing is still needed. This commit makes us do the full set of flushes regardless of hardware generation. We can always reduce the flushing later. Reported-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>	2017-03-28 14:57:08 -07:00
Jason Ekstrand	0fe3dcce4c	anv/cmd_buffer: Fix bad indentation A bunch of code was indented in such a way that it looked like it went with the if statement above but it definitely didn't. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>	2017-03-28 14:57:06 -07:00
Jason Ekstrand	01a65dc43b	anv/cmd_buffer: Apply flush operations prior to executing secondaries This fixes rendering issues in the Vulkan port of skia on some hardware. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>	2017-03-28 14:56:55 -07:00
Jason Ekstrand	9319ef96fd	anv/blorp: Use anv_get_layerCount everywhere Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>	2017-03-28 14:41:48 -07:00
Jason Ekstrand	1b8fa8dd79	anv: Make anv_get_layerCount a macro Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>	2017-03-28 14:41:47 -07:00
Chad Versace	d1032a047b	isl: Drop unused isl_surf_init_info::min_pitch Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-03-28 09:44:44 -07:00
Chad Versace	6cbc13d94c	intel: Fix requests for exact surface row pitch (v2) All callers of isl_surf_init() that set 'min_row_pitch' wanted to request an exact row pitch, as evidenced by nearby asserts, but isl lacked API for doing so. Now that isl has an API for that, update the code to use it. v2: Assert that isl_surf_init() succeeds because the callers assume it. [for jekstrand] Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> (v1) Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> (v1) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)	2017-03-28 09:44:44 -07:00
Chad Versace	e9017d58dc	isl: Let isl_surf_init's caller set the exact row pitch (v2) The caller does so by setting the new field isl_surf_init_info::row_pitch. v2: Validate the requested row_pitch. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)	2017-03-28 09:44:44 -07:00
Chad Versace	23802dafc2	isl: Validate the calculated row pitch (v45) Validate that isl_surf::row_pitch fits in the below bitfields, if applicable based on isl_surf::usage. RENDER_SURFACE_STATE::SurfacePitch RENDER_SURFACE_STATE::AuxiliarySurfacePitch 3DSTATE_DEPTH_BUFFER::SurfacePitch 3DSTATE_HIER_DEPTH_BUFFER::SurfacePitch v2: -Add a Makefile dependency on generated header genX_bits.h. v3: - Test ISL_SURF_USAGE_STORAGE_BIT too. [for jekstrand] - Drop explicity dependency on generated header. [for emil] v4: - Rebase for new gen_bits_header.py script. - Replace gen_10x with gen_device_info*. v5: - Drop FINISHME for validation of GEN9 1D row pitch. [for jekstrand] - Reformat bit tests. [for jekstrand] Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v4)	2017-03-28 09:44:44 -07:00
Chad Versace	f0eaf38db2	genxml: New generated header genX_bits.h (v6) genX_bits.h contains the sizes of bitfields in genxml instructions, structures, and registers. It also defines some functions to query those sizes. isl_surf_init() will use the new header to validate that requested pitches fit in their destination bitfields. What's currently in genX_bits.h: - Each CONTAINER::Field from gen.xml that has a bitsize has a macro in genX_bits.h: #define GEN{N}_CONTAINER_Field_bits {bitsize} - For each set of macros whose name, after stripping the GEN prefix, is the same, genX_bits.h contains a query function: static inline uint32_t __attribute__((pure)) CONTAINER_Field_bits(const struct gen_device_info devinfo); v2 (Chad Versace): - Parse the XML instead of scraping the generated gen*_pack.h headers. v3 (Dylan Baker): - Port to Mako. v4 (Jason Ekstrand): - Make the _bits functions take a gen_device_info. v5 (Chad Versace): - Fix autotools out-of-tree build. - Fix Android build. Tested with git://github.com/android-ia/manifest. - Fix macro names. They were all missing the "_bits" suffix. - Fix macros names more. Remove all double-underscores. - Unindent all generated code. (It was floating in a sea of whitespace). - Reformat header to appear human-written not machine-generated. - Sort gens from high to low. Newest gens should come first because, when we read code, we likely want to read the gen8/9 code and ignore the gen4 code. So put the gen4 code at the bottom. - Replace 'const' attributes with 'pure', because the functions now have a pointer parameter. - Add --cpp-guard flag. Used by Android. - Kill class FieldCollection. After Jason's rewrite, it was just a dict. v6 (Chad Versace): - Replace `key not in d.keys()` with `key not in d`. [for dylan] Co-authored-by: Dylan Baker <dylan@pnwbakers.com> Co-authored-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v5) Reviewed-by: Dylan Baker <dylan@pnwbakers.com> (v6)	2017-03-28 09:44:44 -07:00
Matt Turner	7dccd38b40	i965/fs: Don't emit SEL instructions for type-converting MOVs. SEL can only convert between a few integer types, which we basically never do. Fixes fs/vs-double-uniform-array-direct-indirect-non-uniform-control-flow Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Acked-by: Francisco Jerez <currojerez@riseup.net>	2017-03-27 10:59:42 -07:00
Xu Randy	004468de14	anv/blorp: Fix a crash in CmdClearColorImage We should use anv_get_layerCount() to access layerCount of VkImageSub- resourceRange in anv_CmdClearColorImage and anv_CmdClearDepthStencil- Image, which handles the VK_REMAINING_ARRAY_LAYERS (~0) case. Test: Sample multithreadcmdbuf from LunarG can run without crash Signed-off-by: Xu Randy <randy.xu@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>	2017-03-27 07:43:17 -07:00
Samuel Iglesias Gonsálvez	c4c02471f4	anv: enable sampling from fast-cleared images on SKL A resolve is not needed on Skylake in this case. We were forcing a resolve because we set the input_aux_usage to ISL_AUX_USAGE_NONE. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2017-03-27 06:32:24 +02:00
Chad Versace	7414326164	genxml: Add 3DSTATE_DEPTH_BUFFER to gen5.xml isl will use this for validating the depth buffer pitch. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-03-24 19:07:05 -07:00
Jason Ekstrand	e6621746dc	genxml: Whitespace fixes Some field names had extra spaces and some had places where we should have had a space but didn't. Reviewed-by: Chad Versace <chadversary@chromium.org>	2017-03-24 15:00:37 -07:00
Jason Ekstrand	34c3f6a27f	genxml: Replace "[N]" with "N" Reviewed-by: Chad Versace <chadversary@chromium.org>	2017-03-24 15:00:37 -07:00
Jason Ekstrand	c2af555d6e	genxml/gen6: Remove a couple of bogus values Reviewed-by: Chad Versace <chadversary@chromium.org>	2017-03-24 15:00:37 -07:00
Jason Ekstrand	ec27402a8f	genxml/gen8: Remove BLACK_LEVEL_CORRECTION_STATE We've never used it, it only exists on gen8, and the name of the struct contains piles of bad characters. Reviewed-by: Chad Versace <chadversary@chromium.org>	2017-03-24 15:00:37 -07:00
Jason Ekstrand	a6df637d26	genxml: Rename two MCS fields to Auxiliary Surface on gen7 This makes gen7 more consistent with gen8+ Reviewed-by: Chad Versace <chadversary@chromium.org>	2017-03-24 15:00:37 -07:00
Chad Versace	b3f81e06d4	genxml: Fix gen_zipped_file.py dependency The gen_xml.h files depend on gen_zipped_file.py, not the gen_pack.h files. Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2017-03-24 14:38:22 -07:00
Chad Versace	c7c6c53adb	genxml: Define GENXML_XML_FILES in Makefile.sources The future header genX_bits.h will depend on GENXML_XML_FILES. Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2017-03-24 14:38:15 -07:00
Emil Velikov	15603055fb	anv: automake: ensure that the destination directory is created Earlier commit unintentionally dropped the mkdir, as it was rebased. Some versions of autotools will not create the output directory for generated sources. Thus the issue went unnoticed by the original author. Cc: Dylan Baker <dylan@pnwbakers.com> Cc: Steven Newbury <steve@snewbury.org.uk> Reported-by: Steven Newbury <steve@snewbury.org.uk> Fixes: Fixes: `1610b3dede` ("anv: don't pass xmlfile via stdin anv_entrypoints_gen.py") Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2017-03-24 12:02:04 +00:00
Iago Toral Quiroga	129fd58131	anv/query: handle out of host memory without crashing in compute_query_result() We don't need to make the caller (CmdCopyQueryPoolResults) aware of the problem since compute_query_result() only emits state. The caller is also expected to hit OOM in this scenario right after calling this function, but it is already handling it safely. Fixes: dEQP-VK.api.out_of_host_memory.cmd_copy_query_pool_results Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-03-24 09:39:44 +01:00
Iago Toral Quiroga	ddb2bb3ed4	anv/pipeline: make FragCoord include sample positions when sample shading We need to know if sample shading has been requested during shader compilation since that affects the way fragment coordinates are computed. Notice that the semantics of fragment coordinates only depend on whether sample shading has been requested, not on whether more than one sample will actually be produced (that is, minSampleShading and rasterizationSamples do not affect this behavior). Because this setting affects the code we generate for the shader, we also need to include it in the WM prog key. Notice we don't need to alter the OpenGL code because it doesn't ever use this behavior, so they key's value is always false (the default). Fixes: dEQP-VK.glsl.builtin_var.fragcoord_msaa.* Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-03-24 08:11:53 +01:00
Iago Toral Quiroga	023ea3772d	nir/lower_wpos_center: support adding sample position to fragment coordinate According to section 14.6 of the Vulkan specification: "When sample shading is enabled, the x and y components of FragCoord reflect the location of the sample corresponding to the shader invocation." So add a boolean parameter to the lowering pass to select this behavior when we need it. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-03-24 08:11:53 +01:00
Iago Toral Quiroga	4da1832c00	anv: return VK_ERROR_DEVICE_LOST immeditely when device is known to be lost If we know the device has been lost we should return this error code for any command that can report it before we attempt to do anything with the device. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-03-24 08:11:53 +01:00
Iago Toral Quiroga	50c8d2c1f7	anv/device: keep track of 'device lost' state The Vulkan specs say: "A logical device may become lost because of hardware errors, execution timeouts, power management events and/or platform-specific events. This may cause pending and future command execution to fail and cause hardware resources to be corrupted. When this happens, certain commands will return VK_ERROR_DEVICE_LOST (see Error Codes for a list of such commands). After any such event, the logical device is considered lost. It is not possible to reset the logical device to a non-lost state, however the lost state is specific to a logical device (VkDevice), and the corresponding physical device (VkPhysicalDevice) may be otherwise unaffected. In some cases, the physical device may also be lost, and attempting to create a new logical device will fail, returning VK_ERROR_DEVICE_LOST." This means that we need to track if a logical device has been lost so we can have the commands referenced by the spec return VK_ERROR_DEVICE_LOST immediately. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-03-24 08:11:53 +01:00
Iago Toral Quiroga	70194c9f1a	anv/device: return VK_ERROR_DEVICE_LOST for errors during queue submissions So that we don't have to do things like rolling back address relocations in case that we ran into OOM after computing them, etc Also, make sure that if the queue submission comes with a fence, we set it up correctly so it behaves according to the spec after returning VK_ERROR_DEVICE_LOST. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-03-24 08:11:53 +01:00
Matt Turner	7499bc7fd7	i965: Replace OPT_V() with OPT(). We want to be able to check the progress of each pass and dump the NIR for debugging purposes if it changed. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-03-23 14:34:44 -07:00
Matt Turner	1be91bd9d8	i965/fs: Return progress from demote_sample_qualifiers(). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-03-23 14:34:44 -07:00
Matt Turner	fd3351246c	i965/fs: Return progress from move_interpolation_to_top(). And mark as static at the same time. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-03-23 14:34:44 -07:00
Tapani Pälli	4f69573178	intel: move gen_decoder.* to DECODER_FILES patch adds DECODER_FILES for libintel_common, this is so that platforms such as Android not currently using this functionality can opt out. Fixes: `7d84bb3` ("intel: Move tools/decoder.[ch] to common/gen_decoder.[ch].") Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2017-03-23 14:05:19 +02:00
Tapani Pälli	bcae4eb502	android: fix vulkan build issues with anv_entrypoints Patch fixes entrypoint generation for libmesa_anv_entrypoints that still used old style of calling generator script. Also small fixes to libmesa_vulkan_common where there was a typo in target name (vulknan) and files were generated to wrong folder. Fixes: `8211e3e6` ("anv: Generate anv_entrypoints header and code in one command") Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Acked-by: Emil Velikov <emil.velikov@collabora.com>	2017-03-23 14:04:44 +02:00
Tapani Pälli	dc9ebc6ef1	android: rename Intel Vulkan library to match desktop one Original naming was following Vulkan HAL naming scheme for no good purpose and we need same binary name for build-id code. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Acked-by: Emil Velikov <emil.velikov@collabora.com>	2017-03-23 08:19:16 +02:00
Dylan Baker	4ee675d537	anv: Remove dead prototype from entrypoints Spotted by Emil. v2: - Add this patch Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>	2017-03-22 16:22:00 -07:00
Dylan Baker	860beb99a6	anv: use cElementTree in anv_entrypoints_gen.py It's written in C rather than pure python and is strictly faster, the only reason not to use it that it's classes cannot be subclassed. Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>	2017-03-22 16:22:00 -07:00
Dylan Baker	9050138af7	anv: don't use Element.get in anv_entrypoints_gen.py This has the potential to mask errors, since Element.get works like dict.get, returning None if the element isn't found. I think the reason that Element.get was used is that vulkan has one extension that isn't really an extension, and thus is missing the 'protect' field. This patch changes the behavior slightly by replacing get with explicit lookup in the Element.attrib dictionary, and using xpath to only iterate over extensions with a "protect" attribute. Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>	2017-03-22 16:22:00 -07:00
Dylan Baker	4d4697f868	anv: use dict.get in anv_entrypoints_gen.py Instead of using an if and a check, use dict.get, which does the same thing, but more succinctly. Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>	2017-03-22 16:22:00 -07:00
Dylan Baker	96a5f2a5ac	anv: anv_entrypoints_gen.py: use reduce function. Reduce is it's own reward. Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>	2017-03-22 16:22:00 -07:00
Dylan Baker	dd3830d11b	anv: anv-entrypoints_gen.py: rename hash to cal_hash. hash is reserved name in python, it's the interface to access an object's hash protocol. Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>	2017-03-22 16:22:00 -07:00
Dylan Baker	8211e3e60d	anv: Generate anv_entrypoints header and code in one command This produces the header and the code in one command, saving the need to call the same script twice, which parses the same XML file. Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>	2017-03-22 16:22:00 -07:00
Dylan Baker	383032c700	anv: anv_entrypoints_gen.py: directly write files instead of piping This changes the output to be written as a file rather than being piped. This had one critical advantage, it encapsulates the encoding. This prevents bugs where a symbol (generally unicode like © [copyright]) is printed and the system being built on doesn't have a unicode locale. v2: - Update Android.mk v3: - Don't generate both files at once - Fix Android.mk - drop --outdir, since the filename is passed in as an argument Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>	2017-03-22 16:22:00 -07:00
Dylan Baker	a2a2bad2e2	anv: convert C generation to template in anv_entrypoints_gen.py This produces a file that is identical except for whitespace, there is a table that has 8 columns in the original and is easy to do with prints, but is ugly using mako, so it doesn't have columns; the data is not inherently tabular. Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>	2017-03-22 16:22:00 -07:00
Dylan Baker	0d8e22c5e4	anv: convert header generation in anv_entrypoints_gen.py to mako This produces an identical file except for whitespace. Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>	2017-03-22 16:22:00 -07:00
Dylan Baker	ba1085c694	anv: Update "do not edit" comments with proper filename This does two things, first it updates both the .h and the .c file to have the same do not edit string. Second, it uses __file__ to ensure that even if the file is moved or renamed that the name will be correct. One thing to note is the use of '{{' and '}}' in the C template. This is to instruct python to print a literal '{' and '}' respectively, rather than treating the contents as a formatter specifier. v3: - add this patch Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>	2017-03-22 16:22:00 -07:00
Dylan Baker	ed9339bf26	anv: split main into two functions in anv_entrypoints_gen.py This is groundwork for the next patches, it will allows porting the header and the code to mako separately, and will also allow both to be run simultaneously. Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>	2017-03-22 16:22:00 -07:00
Dylan Baker	1610b3dede	anv: don't pass xmlfile via stdin anv_entrypoints_gen.py It's slow, and has the potential for encoding issues. v2: - pass xml file location via argument - update Android.mk Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>	2017-03-22 16:22:00 -07:00
Dylan Baker	8017da8dd2	anv: make constants capitals in anv_entrypoints_gen.py Again, it's standard python style. Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>	2017-03-22 16:22:00 -07:00
Dylan Baker	08a6d3b4ba	anv: Use python style in anv_entrypoints_gen.py These are all fairly small cleanups/tweaks that don't really deserve their own patch. - Prefer comprehensions to map() and filter(), since they're faster - replace unused variables with _ - Use 4 spaces of indent - drop semicolons from the end of lines - Don't use parens around if conditions - don't put spaces around brackets - don't import modules as caps (ET -> et) - Use docstrings instead of comments v2: - Replace comprehensions with multiplication Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>	2017-03-22 16:22:00 -07:00
Dylan Baker	abd72f2e35	anv: anv_entrypoints_gen.py: use a main function This is just good practice. Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>	2017-03-22 16:22:00 -07:00
Emil Velikov	2438c0a236	intel/compiler: consistently use ifndef guards over pragma once Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Vedran Miletić <vedran@miletic.net> Acked-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>	2017-03-22 16:55:22 +00:00
Emil Velikov	868324419e	intel/common: consistently use ifndef guards over pragma once Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Vedran Miletić <vedran@miletic.net> Acked-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>	2017-03-22 16:55:22 +00:00
Emil Velikov	3b277bae66	i965: make brw_setup_image_uniform_values static Used only internally. Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Vedran Miletić <vedran@miletic.net> Acked-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>	2017-03-22 16:55:21 +00:00
Jason Ekstrand	7ab03ba725	anv/device: Move push descriptor query handling The query is a properties query so it needs to be handled in GetPhysicalDeviceProperties2, not GetPhysicalDeviceFeatures2. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-03-22 09:44:54 -07:00
Jason Ekstrand	c942faf8f3	anv/image: Return early when unbinding an image Found by inspection. Reviewed-by: Chad Versace <chadversary@chromium.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>	2017-03-22 09:44:54 -07:00
Emil Velikov	64b9a37c3b	anv: android: remove unused include/vulkan include Spotted while skimming through similar hunks for the Autotools build. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2017-03-22 11:33:40 +00:00
Emil Velikov	1fa6a33e4d	anv: automake: use the local headers over any system provided ones At the moment, we would honour any system headers - vulkan_intel.h in particular over the ones in-tree. Thus, if one does incremental build of mesa, without the vulkan.h already installed (or at least not in the same directory as vulkan_intel.h) the build will fail. In the future we might want to upstream the vulkan_intel.h within vulkan.h or use other ways to make vulkan_intel.h obsolete. In either case, the more robust thing is to rely on our own copy. v2: Move AM_CPPFLAGS just above LIBDRM_CFLAGS (Grazvydas, Jason) Tested-by: Grazvydas Ignotas <notasas@gmail.com> Fixes: `ee8044fd` "intel/vulkan: Get rid of recursive make" Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Reported-by: Grazvydas Ignotas <notasas@gmail.com> Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2017-03-22 11:32:54 +00:00
Chad Versace	44ac618a41	isl: Refactor row pitch calculation (v2) The calculations of row_pitch, the row pitch's alignment, surface size, and base_alignment were mixed together. This patch moves the calculation of row_pitch and its alignment to occur before the calculation of surface_size and base_alignment. This simplifies a follow-on patch that adds a new member, 'row_pitch', to struct isl_surf_init_info. v2: - Also extract the row pitch alignment. - More helper functions that will later help validate the row pitch. Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> (v2) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)	2017-03-21 15:56:16 -07:00
Chad Versace	c2b706f8af	isl: Drop misplaced comment about padding isl has a giant comment that explains the hardware's padding requirements. (Hint: Cache lines and page faults). But the comment is in the wrong place, in isl_calc_linear_row_pitch(), which is unrelated to padding. The important parts of that comment were copied to isl_apply_surface_padding() long ago. So drop the misplaced comment. Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-03-21 15:56:13 -07:00
Kenneth Graunke	0c3fbf8028	i965: Drop AUB_TRACE_* stuff. This was used for aubdumping (deleted a while ago) and INTEL_DEBUG=bat decoding (deleted recently). While we're changing parameters, delete the wrapper macro and make the actual function brw_state_batch instead of __brw_state_batch. This subsumes a patch by Emil Velikov to drop this from BLORP. Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2017-03-21 13:49:18 -07:00
Kenneth Graunke	7d84bb32aa	intel: Move tools/decoder.[ch] to common/gen_decoder.[ch]. This way they become part of libintel_common.la so I can use them in the i965 driver. Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2017-03-21 13:49:10 -07:00
Kenneth Graunke	2b074bb7e5	intel: Add a INTEL_DEBUG=color option. This will be used for color output in debug messages. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2017-03-21 13:48:53 -07:00
Kenneth Graunke	4084083124	aubinator: Move the guts of decode_group() to decoder.c. This lets us use it outside of the aubinator binary itself. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-03-20 11:20:51 -07:00
Kenneth Graunke	aa1ef0b984	aubinator: Drop spec parameter to decode_group(). No longer necessary - the iterator gets it from the group. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-03-20 11:20:51 -07:00
Kenneth Graunke	b2c0c1d9a5	aubinator: Make the iterator store a pointer to structure descriptions. When the iterator encounters a structure field, it now looks up the gen_group for that structure definition and saves a pointer to it. This lets us drop a lot of ridiculous code in the caller, which looked at item->value (<struct NAME dword>), strtok'd the structure name back out, and looked it up itself. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-03-20 11:20:51 -07:00
Kenneth Graunke	a1aa78cb45	aubinator: Track the current field's starting dword offset. The iterator code already computed this value, then we stored it in the structure name, strtok'd it back out, and also manually computed it when printing dword headers. Just put the value in the struct and use it. Way simpler. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-03-20 11:20:51 -07:00
Kenneth Graunke	e6f7357cab	aubinator: Drop decode_structure() helper. It made more sense when decode_group() took a bunch of extra options, but now that there's only one...we may as well pass 0 and call it a day. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-03-20 11:20:51 -07:00
Kenneth Graunke	a8d4184b00	aubinator: Drop unused print_dword_headers flag. I added this flag in `65a9d5eabb` but it was completely unused. Both callers appear to have printed dword headers, so we can just drop the flag and continue doing it unconditionally. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-03-20 11:20:51 -07:00
Kenneth Graunke	7f21cb56b8	aubinator: Store a pointer from gen_group back to gen_spec. When decoding a structure field within a group, we may want to look up that structure type. Having a gen_spec pointer makes it easy to do so. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-03-20 11:20:51 -07:00
Kenneth Graunke	2c6c760a4b	aubinator: Store enum textual name in iter->value. gen_field_iterator_next() produces a string representing the value of the field. For enum values, it also produced a separate "description" string containing the textual name of the enum. The only caller of this function combines the two, printing enums as "<numeric value> (<texture enum name>)". We may as well just store that in item->value directly, eliminating the description field, and a layer of wrapping. v2: Use non-overlapping source and destination strings in snprintf. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-03-20 11:20:51 -07:00
Nanley Chery	7c50f9903f	intel: Correct the BDW surface state size The PRMs state that this packet is 16 DWORDS long. Ensure that the last three DWORDS are zeroed as required by the hardware when allocating a null surface state. Cc: <mesa-stable@lists.freedesktop.org> Signed-off-by: Nanley Chery <nanley.g.chery@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2017-03-20 09:43:44 -07:00
Xu,Randy	57595cb073	anv/genX: Solve the vkCreateGraphicsPipelines crash The crash is due to NULL pColorBlendState, which is legal if the pipeline has rasterization disabled or if the subpass of the render pass the pipeline is created against does not use any color attachments. Test: Sample subpasses from LunarG can run without crash Signed-off-by: Xu,Randy <randy.xu@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>	2017-03-20 08:31:18 +02:00
Jason Ekstrand	1d5f4f46da	genxml: Make MI_STORE_DATA_IMM have a single 64-bit data field This is way more convenient than having two separate dword fields. Reviewed-By: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-03-17 15:31:19 -07:00
Jason Ekstrand	ced61fd53e	anv: Turn on inherited queries It all just works since it's just a hardware register so we might as well turn it on. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-03-17 12:12:50 -07:00
Ilia Mirkin	e675f57d4f	anv: Implement pipeline statistics queries In the end, pipeline statistics queries look a lot like occlusion queries only with between 1 and 11 begin/end pairs being generated instead of just the one. Reviewed-By: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-03-17 12:12:50 -07:00
Jason Ekstrand	dda54890f3	anv: Disable VF statistics for blorp and SOL memcpy In order to get accurate statistics, we need to disable statistics for blits, clears, and the surface state memcpy at the top of each secondary command buffer. There are two possible approaches to this: 1) Disable before the blit/memcpy and re-enable afterwards 2) Move emitting 3DSTATE_VF_STATISTICS from initialization and make it part of pipeline state and then just disabale statistics before blits and memcpy operations. Emitting 3DSTATE_VF_STATISTICS should be fairly cheap so it doesn't really matter which path we take. We choose the second option as it's more consistent with the way the rest of the statistics are enabled and disabled. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-03-17 12:12:50 -07:00
Jason Ekstrand	9576cea519	anv/pipeline: Enable clipper statistics Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-03-17 12:12:50 -07:00
Jason Ekstrand	2a616242cd	genxml: s/Clipper Statistics Enable/Statistics Enable/ It's in 3DSTATE_CLIP, so it doesn't really need the extra detail. This matches what we do for VS, FS, etc. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-03-17 12:12:50 -07:00
Jason Ekstrand	149d10d38a	anv/query: Rework store_query_result The new version is a nice GPU parallel to cpu_write_query_result and it nicely handles things like dealing with 32 vs. 64-bit offsets in the destination buffer. Reviewed-By: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-03-17 12:12:50 -07:00
Jason Ekstrand	c773ae88df	anv/query: Break GPU query calculation into a helper Reviewed-By: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-03-17 12:12:50 -07:00
Jason Ekstrand	7de73f0c94	genxml: Add pipeline statistics registers on gen7+ Reviewed-By: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-03-17 12:12:50 -07:00
Jason Ekstrand	0557dfdb4a	anv/query: Add a helper for writing a query pool result Reviewed-By: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-03-17 12:12:50 -07:00
Jason Ekstrand	bce4a935c6	anv/query: Use a variable-length slot size Not all queries are the same. Even the two queries we support today require a different amount of data per slot. Once we introduce pipeline statistics queries, the size will vary wildly. Reviewed-By: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-03-17 12:12:49 -07:00
Jason Ekstrand	1c797af2c6	anv/query: Move the available bits to the front We're about to make slots variable-length and always having the available bits at the front makes certain operations substantially easier once we do that. Reviewed-By: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-03-17 12:12:47 -07:00
Jason Ekstrand	9d43afa3dc	anv/query: Let 32-bit values wrap From the Vulkan 1.0.39 Specification: "If VK_QUERY_RESULT_64_BIT is not set and the result overflows a 32-bit value, the value may either wrap or saturate." So we can either clamp or wrap. Wrapping is both easier and what the user gets if they use vkCmdCopyQueryPoolResults and we should be consistent. We could make vkCmdCopyQueryPoolResults clamp but it's annoying and ends up burning extra batch for something the spec clearly doesn't require. Reviewed-By: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-03-17 12:11:35 -07:00
Robert Bragg	a27b62e794	anv/device: init timestampPeriod from devinfo Now that there's a timebase_scale in gen_device_info which is effectively the 'period' this switches anv_GetPhysicalDeviceProperties to using this common device info to initialize the timestampPeriod device limit. Signed-off-by: Robert Bragg <robert@sixbynine.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-03-17 16:10:22 +00:00
Robert Bragg	344d1a4015	i965: Allow a per gen timebase scale factor Prior to Skylake the Gen HW timestamps were driven by a 12.5MHz clock with the convenient property of being able to scale by an integer (80) to nanosecond units. For Skylake the frequency is 12MHz or a scale factor of 83.333333 This updates gen_device_info to track a floating point timebase_scale factor and makes corresponding _queryobj.c changes to no longer assume a scale factor of 80 works across all gens. Although the gen6_ code could have been been left alone, the changes keep the code more comparable, and it now shares a few utility functions for scaling raw timestamps and calculating deltas. The utility for calculating deltas takes into account 32 or 36bit overflow depending on the current kernel version. Note: this leaves the timestamp handling of ARB_query_buffer_object untouched, which continues to use an incorrect scale of 80 on Skylake for now. This is more awkward to solve since the scaling is currently done using a very limited uint64 ALU available to the command parser that doesn't support multiply or divide where it's already taking a large number of instructions just to effectively multiple by 80. This fixes piglit arb_timer_query-timestamp-get on Skylake v2: (Ken) Update timebase_scale for platforms past Skylake/Broxton too. Signed-off-by: Robert Bragg <robert@sixbynine.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-03-17 15:45:19 +00:00
Jason Ekstrand	28b134c75c	anv/device: Remove a use of a compound literal Older versions of GCC don't like compound literals in static const variable declarations because they don't think it's an actual constant value. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-03-17 08:40:30 -07:00
Jason Ekstrand	08df015b9d	anv/GetQueryPoolResults: Actually implement the spec The Vulkan spec is fairly clear about when we should and should not write query pool results. We're also supposed to return VK_NOT_READY if VK_QUERY_RESULT_PARTIAL_BIT is not set and we come across any queries which are not yet finished. This fixes rendering corruptions on The Talos Principle where geometry flickers in and out due to bogus query results being returned by the driver. These issues are most noticable on Sky Lake GT4 2hen running on "ultra" settings. Reviewed-By: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100182 Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>	2017-03-16 15:08:18 -07:00
Jason Ekstrand	81840130c0	anv/query: Invalidate the correct range Reviewed-By: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>	2017-03-16 15:08:17 -07:00
Jason Ekstrand	4bbb4b95b8	anv/query: Fix the location of timestamp availability Reviewed-By: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: "17.0 13.0" <mesa-dev@lists.freedesktop.org>	2017-03-16 15:08:17 -07:00
Jason Ekstrand	9e60f59e62	genxml: Add XML version tags There's not much point to having them or not having them but this reduces some pointless diff from the version we can auto-generate Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-03-16 15:08:17 -07:00
Kenneth Graunke	f51a320b12	aubinator: Use fprintf for output. This will make it easier to choose an output file. For now, it remains stdout. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-03-16 10:48:44 -07:00
Kenneth Graunke	65a9d5eabb	aubinator: Reuse decode_structure code for handling commands The code for decoding structures and commands was almost identical. The only differences are: we print dword headers for commands, and we skip the first one (with the command opcode and lengths). So, generalize decode_structure to add a starting DWord, and a flag for printing the DWord headers, and reuse it. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-03-16 10:48:41 -07:00
Kenneth Graunke	f0aa8fd4e4	aubinator: Delete redundant NULL check. handle_struct_decode() is just a wrapper around decode_structure() with a NULL check. But the only caller already does that NULL check. So, just use decode_structure() directly. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-03-16 10:48:37 -07:00
Kenneth Graunke	65138ce019	aubinator: Fix indentation. Three space, not four. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-03-16 10:48:32 -07:00
Iago Toral Quiroga	ca34a3125f	anv: improve error reporting when creating pipelines Specifically, report 'out of memory' errors that might have happened while emitting the pipeline's batch. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-03-16 11:40:05 +01:00
Iago Toral Quiroga	1d7468311d	anv: handle errors in emit_binding_table() and emit_samplers() These can fail to allocate device memory, however, the driver can recover from this error by allocating a new binding table block and trying again. v2: - Instead of tracking the errors in these functions and making callers reset the batch's status before attempting to allocate a new block for the binding table, simply make callers responsible for setting the error status if they fail to allocate memory during the second attempt (Jason). Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-03-16 11:40:05 +01:00
Iago Toral Quiroga	dd8348c8be	anv: handle errors while allocating new binding table blocks Also, we had a couple of instances in flush_descriptor_sets() were we were returning a VkResult directly upon error, but the return value of this function is not a VkResult but a uint32_t dirty mask, so simply return 0 in these cases which reduces the amount of work the driver will do after the error has been raised. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-03-16 11:40:05 +01:00
Iago Toral Quiroga	be52f9693a	anv/blorp: make anv_cmd_buffer_alloc_blorp_binding_table() return a VkResult Instead of asserting inside the function, and then use use that information to return early from its callers upon failure. v2: - Make sure that clear_color_attachment() and clear_depth_stencil_attachment() get the VkResult as well so they avoid executing the batch if an error happened. (Topi) Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-03-16 11:40:05 +01:00
Iago Toral Quiroga	a578b06d7b	anv/device: assert that commands submitted to a queue are not bogus Any errors that may have happened during the command buffer recording are reported by vkEndCommandBuffer() and it is the application's reponsibility to not submit broken commands to a queue. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-03-16 11:40:05 +01:00
Iago Toral Quiroga	a752c4ecda	anv/cmd_buffer: skip vkCmdExecuteCommands() on broken command buffers v2: Assert on secondary commands, applications should've called vkEndCommandBuffer() and received an error for them before (Jason) Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-03-16 11:40:05 +01:00
Iago Toral Quiroga	801493051e	anv/cmd_buffer: skip vkCmdDispatch() on broken command buffers Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-03-16 11:40:05 +01:00
Iago Toral Quiroga	18ec3fa2a9	anv/cmd_buffer: skip vkCmdDraw*() on broken command buffers Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-03-16 11:40:05 +01:00
Iago Toral Quiroga	fb9d563fb9	anv: handle memory allocation errors during queue submissions Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-03-16 11:40:05 +01:00
Iago Toral Quiroga	c04dbd6b3e	anv/cmd_buffer: handle out of memory during vkCmdPushConstants Fixes: dEQP-VK.api.out_of_host_memory.cmd_push_constants Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-03-16 11:40:05 +01:00
Iago Toral Quiroga	94a4f0c255	anv/cmd_buffer: handle allocation errors during vkCmdBeginRenderPass() Fixes: dEQP-VK.api.out_of_host_memory.cmd_begin_render_pass Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-03-16 11:40:05 +01:00
Iago Toral Quiroga	d823f381a5	anv/cmd_buffer: skip vkCmdEndRenderPass() for broken command buffers Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-03-16 11:40:05 +01:00
Iago Toral Quiroga	6743456699	anv/cmd_buffer: skip vkCmdNextSubpass() for broken command buffers Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-03-16 11:40:05 +01:00
Iago Toral Quiroga	8174e63869	anv/cmd_buffer: report tracked errors in vkEndCommandBuffer() Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-03-16 11:40:05 +01:00
Iago Toral Quiroga	68d88f0237	anv: handle failures when growing reloc lists Growing the reloc list happens through calling anv_reloc_list_add() or anv_reloc_list_append(). Make sure that we call these through helpers that check the result and set the batch error status if needed. v2: - Handling the crashes is not good enough, we need to keep track of the error, for that, keep track of the errors in the batch instead (Jason). - Make reloc list growth go through helpers so we can have a central place where we can do error tracking (Jason). v3: - Callers that need the offset returned by anv_reloc_list_add() can compute it themselves since it is extracted from the inputs to the function, so change the function to return a VkResult, make anv_batch_emit_reloc() also return a VkResult and let their callers do the error management (Topi) v4: - Let anv_batch_emit_reloc() return an uint64_t as it originally did, there is no real benefit in having it return a VkResult. - Do not add an is_aux parameter to add_surface_state_reloc(), instead do error checking for aux in add_image_view_relocs() separately. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-03-16 11:40:05 +01:00
Iago Toral Quiroga	d4bdd871dc	anv: avoid crashes when failing to allocate batches Most of the time we use macros that handle this situation transparently, but there are some cases were we need to handle this explicitly. This patch makes sure we don't crash, notice that error handling takes place in the function that actually failed the allocation, anv_batch_emit_dwords(), which will set the status field of the batch so it can be used at a later moment to report the error to the user. v2: - Not crashing is not good enough, we need to keep track of the error (Topi, Jason). Iago: now that we track errors in the batch, this is being handled. - Added guards in a few more places that needed it (Iago) v3: - Check result of anv_batch_emitn() for NULL before calling memset() in emit_vertex_input() (Topi) Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-03-16 11:40:05 +01:00
Iago Toral Quiroga	31f5049ff1	anv: handle allocation failure in anv_batch_emit_dwords() Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-03-16 11:40:05 +01:00
Iago Toral Quiroga	9e69409fcf	anv: handle allocation failure in anv_batch_emit_batch() v2: - Call the error handler (Topi) Fixes: dEQP-VK.api.out_of_host_memory.cmd_execute_commands Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-03-16 11:40:05 +01:00
Iago Toral Quiroga	a8ce8e3542	anv: add anv_batch_set_error() and anv_batch_has_error() helpers The anv_batch_set_error() helper will track the first error that happened while recording a command buffer. The helper returns the currently tracked error to help the job of internal functions that may generate errors that need to be tracked and return a VkResult to the caller. We will use the anv_batch_has_error() helper to guard parts of the driver that are not safe to execute if an error has been generated while recording a particular command buffer. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-03-16 11:40:05 +01:00
Iago Toral Quiroga	d0195bd067	anv/cmd_buffer: add a status field to anv_batch The vkCmd*() functions do not report errors, instead, any errors should be reported by the time we call vkEndCommandBuffer(). This means that we need to make the driver robust against incosistent and/or imcomplete command buffer states through the command recording process, particularly, avoid crashes due to access to memory that we failed to allocate previously. The strategy used to do this is to track the first error ocurred while recording a command buffer in the batch associated with it. We use the batch to track this information because the command buffer may not be visible to all parts of the driver that can produce errors we need to be aware of (such as allocation failures during batch emissions). Later patches will use this error information to guard parts of the driver that may not be safe to execute. v2: Move the field from the command buffer to the batch so we can track errors from batch emissions (Jason) v3: Registering errors in the command buffer's batch during anv_create_cmd_buffer() is unnecessary, since the command buffer is freed at the end of the function in that case (Topi) Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-03-16 11:40:05 +01:00
Iago Toral Quiroga	6dd06f54eb	anv/cmd_buffer: report errors in vkBeginCommandBuffer() Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-03-16 11:40:05 +01:00
Iago Toral Quiroga	88b539c4a0	anv: do not try to ref/unref NULL shaders This situation can happen if we failed to allocate memory for the shader. v2: - We shouldn't see NULL shaders in anv_shader_bin_ref so we should not check for that (Jason). Make sure that callers don't attempt to call this function with a NULL shader and assert that this never happens (Iago). v3: - All callers to anv_shader_bin_unref seem to check for NULL before calling, so just assert that it is not NULL (Topi) Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-03-16 11:40:05 +01:00
Iago Toral Quiroga	bad3a2e911	anv/blorp: return early if we failed to create the shader binary Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-03-16 11:40:05 +01:00
Iago Toral Quiroga	e2f707ce5b	intel/blorp: make upload_shader() return a bool indicating success or failure For now we always return true, follow-up patches will handle fail scenarios. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-03-16 11:40:05 +01:00
Iago Toral Quiroga	808503b8f8	anv: remove unnecessary function prototype. The function is defined right after the prototype declaration. Also, the protoype for it is included in anv_genX.h which is included via anv_private.h. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-03-16 11:40:05 +01:00
Emil Velikov	b1fb6e8d8c	anv: do not open random render node(s) drmGetDevices2() provides us with enough flexibility to build heuristics upon. Opening a random node on the other hand will wake up the device, regardless if it's the one we're interested or not. v2: Rebase, explicitly require/check for libdrm v3: Return VK_ERROR_INCOMPATIBLE_DRIVER for no devices (Ilia) v4: Rebase Cc: Jason Ekstrand <jason.ekstrand@intel.com> Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1) Tested-by: Mike Lothian <mike@fireburn.co.uk>	2017-03-15 11:38:05 +00:00
Emil Velikov	a9a4028fd7	util/sha1: rework _mesa_sha1_{init,final} Rather than having an extra memory allocation [that we currently do not and act accordingly] just make the API take an pointer to a stack allocated instance. This and follow-up steps will effectively make the _mesa_sha1_foo simple define/inlines around their SHA1 counterparts. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>	2017-03-15 11:18:43 +00:00
Jason Ekstrand	d142c7436c	intel/debug: Add a common INTEL_DEBUG=nohiz option The GL driver had a driconf option (which doesn't make much sense) and the Vulkan driver had a hand-rolled environment variable. Instead, let's tie both into the INTEL_DEBUG mechanism and unify things. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-03-14 21:00:09 -07:00
Jason Ekstrand	c09bb956ca	anv/image: Move handling of INTEL_VK_HIZ This makes it so that you don't get an "Implement gen7 HiZ" perf warning when you manually disable HiZ on gen8. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-03-14 21:00:09 -07:00
Jason Ekstrand	aed2714145	anv: Properly enumerate physical devices when none are present	2017-03-14 09:08:07 -07:00
Jason Ekstrand	762a6333f2	nir: Rework conversion opcodes The NIR story on conversion opcodes is a mess. We've had way too many of them, naming is inconsistent, and which ones have explicit sizes was sort-of random. This commit re-organizes things and makes them all consistent: - All non-bool conversion opcodes now have the explicit size in the destination and are named <src_type>2<dst_type><size>. - Integer <-> integer conversion opcodes now only come in i2i and u2u forms (i2u and u2i have been removed) since the only difference between the different integer conversions is whether or not they sign-extend when up-converting. - Boolean conversion opcodes all have the explicit size on the bool and are named <src_type>2<dst_type>. Making things consistent also allows nir_type_conversion_op to be moved to nir_opcodes.c and auto-generated using mako. This will make adding int8, int16, and float16 versions much easier when the time comes. Reviewed-by: Eric Anholt <eric@anholt.net>	2017-03-14 07:36:40 -07:00
Jason Ekstrand	7107b32155	i965/fs: Re-arrange conversion operations Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-03-14 07:36:40 -07:00
Jason Ekstrand	bab4610e9c	i965/vec4: Get rid of the type parameter from to/from_double Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-03-14 07:36:40 -07:00
Jason Ekstrand	b377be9213	i965/fs: Use num_components from the SSA def in image intrinsics Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2017-03-14 07:36:40 -07:00
Jason Ekstrand	678fd00f2f	anv/blorp: Only set a clear color for resolves if fast-cleared Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Cc: "17.0" <mesa-stable@lists.freedesktop.org>	2017-03-14 07:36:20 -07:00
Jason Ekstrand	273b720310	anv/blorp: Turn off AUX after doing a CCS_D resolve For render passes with multiple subpasses on gen7, we only fast-clear at the top but an input attachment use can cause us to do a resolve in the middle of the render pass. Once we've done so, we are no longer have a fast-cleared surface so we can just set aux_usage to NONE. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Cc: "17.0" <mesa-stable@lists.freedesktop.org>	2017-03-14 07:36:20 -07:00
Tapani Pälli	773d510c66	android: add '/vulkan' to libmesa_anv_entrypoints path otherwise generated entrypoint headers are not found during build Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2017-03-14 07:48:30 +02:00
Tapani Pälli	4734322574	android: add src/intel/compiler to libmesa_intel_compiler includes fixes build error when brw_nir.h not found in the generated file brw_nir_trig_workarounds.c. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2017-03-14 07:48:22 +02:00
Gwan-gyeong Mun	8f22552a4f	anv: Add missing error-checking to anv_CreateDevice (v3) This patch adds missing error-checking and fixes resource leak in allocation failure path on anv_CreateDevice() v2: Fixes from Jason Ekstrand's review a) Add missing destructors for all of the state pools on allocation failure path b) Add missing destructor for batch bo pools on allocation failure path v3: Fixes from Emil Velikov's review Add missing destructor for queue and scratch_pool on allocation failure path Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-03-13 21:29:43 -07:00
Chad Versace	c5a0829e1f	anv: Use vk_outarray in vkGetPhysicalDeviceQueueFamilyProperties No intended change in behavior. Just a refactor. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-03-13 15:08:15 -07:00
Chad Versace	876f0ecd2f	anv: Use vk_outarray in vkEnumeratePhysicalDevices (v2) No intended change in behavior. Just a refactor. v2: Replace vk_outarray_is_incomplete() with vk_outarray_status(). For Jason. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-03-13 15:08:15 -07:00
Lionel Landwerlin	bf47e5ba53	intel: genxml: prevent missing ; with address fields dwords Before this change, the generator could print this kind of things : const uint32_t v0 = __gen_uint(values->ValidBit, 0, 0) \| __gen_uint(values->FaultType, 1, 2) \| __gen_uint(values->SRCIDofFault, 3, 10) \| __gen_uint(values->GTTSEL, 11, 1) \| dw[0] = __gen_combine_address(data, &dw[0], values->VirtualAddressofFault, v0); This change fix the trailing '\|'. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-03-13 17:23:49 +00:00

... 7 8 9 10 11 ...

2206 Commits