KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Jason Ekstrand	6279074de1	nir: Get rid of global registers We have a pass to lower global registers to locals and many drivers dutifully call it. However, no one ever creates a global register ever so it's all dead code. It's time we bury it. Acked-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-09 00:29:36 -05:00
Karol Herbst	5450f1c9fb	v3d: prefer using nir_src_comp_as_int over nir_src_as_const_value Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-04-07 15:13:36 +02:00
Eric Anholt	276d22c52d	v3d: Add some more new packets for V3D 4.x. The T/G shader references and common state will be needed for GLES 3.2.	2019-04-04 17:30:35 -07:00
Eric Anholt	62360e92ec	v3d: Bump the maximum texture size to 4k for V3D 4.x. 4.1 and 4.2 both have the same 16k limit, but I'm seeing GPU hangs in the CTS at 8k and 16k. 4k at least lets us get one 4k display working. Cc: mesa-stable@lists.freedesktop.org	2019-04-04 17:30:35 -07:00
Eric Anholt	bfed0a7099	v3d: Remove some dead members of struct v3d_compile. These are more vc4 leftovers.	2019-03-21 14:20:50 -07:00
Eric Anholt	16f2770eb4	v3d: Upload all of UBO[0] if any indirect load occurs. The idea was that we could skip uploading the constant-indexed uniform data and just upload the uniforms that are variably-indexed. However, since the VS bin and render shaders may have a different set of uniforms used, this meant that we had to upload the UBO for each of them. The first case is generally a fairly small impact (usually the uniform array is the most space, other than a couple of FSes in shader-db), while the second is a larger impact: 3DMMES2 was uploading 38k/frame of uniforms instead of 18k. Given that the optimization is of dubious value, has a big downside, and is quite a bit of code, just drop it. No change in shader-db. No change on 3DMMES2 (n=15).	2019-03-21 14:20:50 -07:00
Eric Anholt	320e96bace	v3d: Move constant offsets to UBO addresses into the main uniform stream. We'd end up with the constant offset in the uniform stream anyway, since they're bigger than small immediates. Avoids the extra uniforms and adds in the shader in favor of just adding once on the CPU. shader-db: total instructions in shared programs: 6496865 -> 6494851 (-0.03%) total uniforms in shared programs: 2119511 -> 2117243 (-0.11%)	2019-03-21 14:20:50 -07:00
Eric Anholt	c36d2793ec	v3d: Rename v3d_tmu_config_data to v3d_unit_data. I want to reuse this for encoding small constant UBO/SSBO offsets into the uniform stream to reduce the extra uniform loads and adds for the small constant offsets.	2019-03-21 14:20:50 -07:00
Eric Anholt	0c874c18cd	v3d: Fix leak of the mem_ctx after the DAG refactor. Noticed while trying to get a CTS run again. Fixes: `33886474d6` ("v3d: Use the DAG datastructure for QPU instruction scheduling.")	2019-03-12 16:15:40 -07:00
Eric Anholt	33886474d6	v3d: Use the DAG datastructure for QPU instruction scheduling. Just a small code reduction from shared infrastructure.	2019-03-11 13:14:32 -07:00
Eric Anholt	7c01ddbf7f	v3d: Reuse list_for_each_entry_rev().	2019-03-11 13:14:32 -07:00
Eric Anholt	c4d2da1f14	v3d: Include a count of register pressure in the RA failure dumps. You usually want to go find the highest pressure and figure out why you couldn't spill or what pattern led to a bunch of pressure leading to that point.	2019-03-06 14:13:45 -08:00
Eric Anholt	5c655c47db	v3d: Drop the V3D 3.x vpm read dead code elimination. We now have NIR dead code eliminating our VPM reads, so this shouldn't be necessary.	2019-03-05 12:57:39 -08:00
Eric Anholt	e8ee1f8eaf	v3d: Eliminate the TLB and TLBU files. We can just use the magic register file like we do for other magic waddrs.	2019-03-05 12:57:39 -08:00
Eric Anholt	110f14d4b4	v3d: Use ldunif instructions for uniforms. The idea is that for repeated use of the same uniform, we could avoid loading it on each consumer. The results look pretty good. total instructions in shared programs: 6413571 -> 6521464 (1.68%) total threads in shared programs: 154214 -> 154000 (-0.14%) total uniforms in shared programs: 2393604 -> 2119629 (-11.45%) total spills in shared programs: 4960 -> 4984 (0.48%) total fills in shared programs: 6350 -> 6418 (1.07%) Once we do scheduling at the NIR level, the register pressure (and thus also instructions) issues we see here will drop back down.	2019-03-05 12:57:39 -08:00
Eric Anholt	4036fce8fd	v3d: Add support for register-allocating a ldunif to a QFILE_TEMP. On V3D 4.x, we can use ldunifrf to load uniforms to any register, and this will let us schedule the ldunif wherever we want in the program.	2019-03-05 12:57:39 -08:00
Eric Anholt	70df388219	v3d: Drop the old class bits splitting up the accumulators. This seems to be left over from vc4, and I don't use them any more.	2019-03-05 12:57:39 -08:00
Eric Anholt	dff1fc04e0	v3d: Add support for vir-to-qpu of ldunif instructions to a temp. We can load a uniform to any register, so add support for non-ALU instructions with sig.ldunif to a temp.	2019-03-05 12:57:39 -08:00
Eric Anholt	4739181a16	v3d: Switch implicit uniforms over to being any qinst->uniform != ~0. I'm not sure why I didn't do this before -- it's clearly much simpler to add dumping of the extra thing than to have it as another implicit source.	2019-03-05 12:57:39 -08:00
Eric Anholt	1e98f02d88	v3d: Do uniform rematerialization spilling before dropping threadcount This feels like the right tradeoff for threads vs uniforms, particularly given that we often have very short thread segments right now: total instructions in shared programs: 6411504 -> 6413571 (0.03%) total threads in shared programs: 153946 -> 154214 (0.17%) total uniforms in shared programs: 2387665 -> 2393604 (0.25%)	2019-03-05 12:57:39 -08:00
Eric Anholt	060979a380	v3d: Fix temporary leaks of temp_registers and when spilling. On each iteration of successfully spilling a reg, we'd allocate another copy of temp_registers, and when decrementing thread count we'd allocate another copy of the graph. These all got cleaned up on freeing the compile.	2019-03-05 12:57:39 -08:00
Eric Anholt	2780a99ff8	v3d: Move the stores for fixed function VS output reads into NIR. This lets us emit the VPM_WRITEs directly from nir_intrinsic_store_output() (useful once NIR scheduling is in place so that we can reduce register pressure), and lets future NIR scheduling schedule the math to generate them. Even in the meantime, it looks like this lets NIR DCE some more code and make better decisions. total instructions in shared programs: 6429246 -> 6412976 (-0.25%) total threads in shared programs: 153924 -> 153934 (<.01%) total loops in shared programs: 486 -> 483 (-0.62%) total uniforms in shared programs: 2385436 -> 2388195 (0.12%) Acked-by: Ian Romanick <ian.d.romanick@intel.com> (nir)	2019-03-05 10:59:40 -08:00
Eric Anholt	a9dd227a47	v3d: Translate f2i(fround_even) as FTOIN. This appears to be just what the opcode does. Needed for equivalence when moving FF VPM stores into NIR.	2019-03-05 10:59:40 -08:00
Eric Anholt	fd1d22b92e	v3d: Stop treating exec masking specially. In our backend, the successor edges from the blocks only point to where QPU control flow goes, not where the notional control flow goes from a "break" or "continue" modifying the execution mask to resume writing to some channels later. As a result, this attempt at restricting live ranges ended up missing the live range of a value where a conditional break/continue was present in a loop before the later def of a variable. The previous commit ended up fixing the problem that the flag tried to solve. Fixes glsl-vs-loop-continue.shader_test and/or glsl-vs-loop-redundant-condition.shader_test based on register allocation results.	2019-03-05 07:36:24 -08:00
Eric Anholt	c6ae666cf5	v3d: Restrict live intervals to the blocks reachable from any def. In the backend, we often have condition codes on writes to variables, such that there's no screening def anywhere and the previous live ranges algorithm would conclude that the start of the range extends to the start of the program. However, we do know that the live range can only extend as early as you can reach from all blocks writing to the variable. The motivation was that, while we have a couple of hacks to try to promote conditional writes up to being a def within the block, the exec_mask one was broken and needed a replacement. Based on `c3c1aa5aeb` ("intel/fs: Restrict live intervals to the subset possibly reachable from any definition.").	2019-03-05 07:36:24 -08:00
Eric Anholt	97566efe5c	v3d: Rematerialize MOVs of uniforms instead of spilling them. If we have a MOV of a uniform value available to spill, that's one of our best choices. We can just not spill the value, and emit a new load of the uniform as the fill. This saves bothering the TMU and the thrsw, and is the same cost in uniforms (since the spill offset is a uniform anyway). This doesn't have a huge impact on shader-db, since there aren't a whole lot of spills and we usually copy-prop the uniforms at the VIR level such that the only uniform MOVs are from vir_lower_uniforms: total instructions in shared programs: 6430292 -> 6430279 (<.01%) total uniforms in shared programs: 2386023 -> 2385787 (<.01%) total spills in shared programs: 4961 -> 4960 (-0.02%) total fills in shared programs: 6352 -> 6350 (-0.03%) However, I'm interested in dropping the uniforms copy-prop in the backend, since it would be cheaper to not load repeated uniforms if we have the registers to spare. This also saves many spills on dEQP-GLES31.functional.ubo.random.all_per_block_buffers.20, which is what motivated a bunch of my recent backend work in the first place: before: 46 spills, 106 fills, 3062 instructions after: 0 spills, 0 fills, 2611 instructions	2019-02-25 21:33:47 -08:00
Eric Anholt	e0fada983d	v3d: Dump the VIR after register spilling if we were forced to. Spilling is unusual, but one often has to debug it when it happens, so dump it.	2019-02-25 21:26:24 -08:00
Eric Anholt	2786d2161a	v3d: Fix vir_is_raw_mov() for input unpacks. There are no users at the moment, but I wanted to start using this in register spilling.	2019-02-25 21:26:24 -08:00
Eric Anholt	dbe3af67a4	v3d: Move i2b and f2b support into emit_comparison. This lets us save a resolve to NIR true/false for ifs and discard_if. No change in shader-db.	2019-02-18 18:18:37 -08:00
Eric Anholt	0bba9c8489	v3d: Emit a simpler negate for the iabs implementation. One program affected in my shader-db. instructions in affected programs: 110 -> 108 (-1.82%)	2019-02-18 18:13:09 -08:00
Eric Anholt	1a775d43c9	v3d: Delay emitting ldvpm on V3D 4.x until it's actually used. For V3D 3.x, we emitted the ldvpms all at the top so that we didn't need to do VPM setup when the load_inputs are out of order. For V3D 4.x, we can reduce register pressure by delaying our loads until they're actually needed. This also avoids a bunch of silly MOVs in the pre-opt VIR dump. total instructions in shared programs: 6421415 -> 6419933 (-0.02%) total uniforms in shared programs: 2393139 -> 2393140 (<.01%) total threads in shared programs: 153864 -> 153906 (0.03%)	2019-02-18 18:09:07 -08:00
Eric Anholt	5a84d46896	v3d: Stop tracking num_inputs for VPM loads. It's unused in the VS (since we need vattr_sizes[] anyway), so move it to FS prog data.	2019-02-18 18:09:07 -08:00
Eric Anholt	581eba072d	v3d: Add a function to describe what the c->execute.file check means. This is what pointed out that we were misusing the check for last_thrsw in the previous commit.	2019-02-18 18:09:07 -08:00
Eric Anholt	441294962c	v3d: Fix the check for "is the last thrsw inside control flow" The execute.file check used to be good enough, until I stopped setting up the execute mask for uniform ifs. No known tests fixed, noticed while doing a refactor. Fixes: `0805060573` ("v3d: Handle dynamically uniform IF statements with uniform control flow.")	2019-02-18 18:09:07 -08:00
Eric Anholt	07d5b5a972	v3d: Fix f2b32 behavior. Now that we don't have the vir_PF() magic, it's obvious that we were doing the wrong thing for f2b32 by allowing -0.0 to produce true instead of false.	2019-02-18 18:09:07 -08:00
Eric Anholt	3022b4bd82	v3d: Kill off vir_PF(), which is hard to use right. You were allowed to pass in any old temp so that you could hopefully fold the PF up into the def of the temp. If we couldn't find one, it implicitly generated a MOV(nop, reg). However, that PF could have different behavior depending on whether the def being folded into was a float or int opcode, which the caller doesn't necessarily control. Due to the fragility of the function, just switch all callers over to vir_set_pf(). This also encourages the callers to use a _dest call for the inst they're putting the PF on, eliminating a bunch of temps in the pre-optimization VIR. shader-db says the change is in the noise: total instructions in shared programs: 6226247 -> 6227184 (0.02%) instructions in affected programs: 851068 -> 852005 (0.11%)	2019-02-18 18:09:06 -08:00
Eric Anholt	6186a8d44e	v3d: Do bool-to-cond for discard_if as well. Turns this minimal conditional discard (glsl-fs-discard-01.shader_test): 0x3de0b086c5fe9000 fcmp.pushn -, r1, r5; mov r2, 0 0x3dec3086bbfc001f nop ; mov.ifa r2, -1 0x3c047186bbe80000 nop ; mov.pushz -, r2 0x3dea3186ba837000 setmsf.ifna -, 0 ; nop into: 0x3c00b186c582a000 fcmp.pushn -, r2, r5; nop 0x3de83186ba837000 setmsf.ifa -, 0 ; nop total instructions in shared programs: 6229820 -> 6226247 (-0.06%)	2019-02-18 18:09:06 -08:00
Eric Anholt	718eef62cb	v3d: Refactor bcsel and if condition handling. Both were doing the same thing to try to get a condition to predicate on. Noticed when I wanted to do this for discard_if as well. No change in shader-db.	2019-02-18 18:09:06 -08:00
Eric Anholt	4586f9f902	v3d: Add a helper function for getting a nop register. Just a little refactor to explain what's going on with QFILE_NULL.	2019-02-18 18:09:06 -08:00
Eric Anholt	339155122b	v3d: Drop our hand-lowered nir_op_ffract. The NIR lowering works fine, though it causes some slight noise due to what looks like choices about propagating constants up multiply chains changing. total instructions in shared programs: 6229671 -> 6229820 (<.01%) total uniforms in shared programs: 2312171 -> 2312324 (<.01%)	2019-02-18 18:09:06 -08:00
Eric Anholt	16f5085490	v3d: Drop a perf note about merging unpack_half_*, which has been implemented. This is handled with copy-propagation now.	2019-02-18 18:09:06 -08:00
Eric Anholt	146e432b49	v3d: Fix incorrect flagging of ldtmu as writing r4 on v3d 4.x. Fixes some stalls in 3DMMES's main vertex shader. total instructions in shared programs: 6280751 -> 6211270 (-1.11%) instructions in affected programs: 2935050 -> 2865569 (-2.37%)	2019-02-18 18:09:06 -08:00
Eric Anholt	cd5e0b2729	v3d: Use the early_fragment_tests flag for the shader's disable-EZ field. Apparently we need disable-EZ flagged, not just "does Z writes". Fixes dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_depth_fbo on 7278, even though it passed in simulation. Signed-off-by: Eric Anholt <eric@anholt.net> Fixes: `051a41d3d5` ("v3d: Add support for the early_fragment_tests flag.")	2019-02-18 18:09:06 -08:00
Eric Anholt	3f22b35a43	v3d: Use the NIR lowering for isign instead of rolling our own. min/max instead of comparisons saves 2 instructions on fs-sign-int.shader_test.	2019-02-14 00:32:30 +00:00
Eric Anholt	3c08ecf147	v3d: Whitespace consistency fix.	2019-02-05 15:46:42 -08:00
Eric Anholt	940501a446	v3d: Fix copy-propagation of input unpacks. I had a single function for "does this do float input unpacking" with two major flaws: It was missing the most common thing to try to copy propagate a f32 input unpack to (the VFPACK to an FP16 render target) along with several other ALU ops, and also would try to propagate an f32 unpack into a VFMUL which only does f16 unpacks. instructions in affected programs: 659232 -> 655895 (-0.51%) uniforms in affected programs: 132613 -> 135336 (2.05%) and a couple of programs increase their thread counts. The uniforms hit appears to be a pattern in generated code of doing (-a >= a) comparisons, which when a is abs(b) can result in the abs instruction being copy propagated once but not fully DCEed.	2019-02-05 15:46:04 -08:00
Eric Anholt	e5c6938590	v3d: Fix input packing of .l for rounding/fdx/fdy. Avoids a regression in dEQP-GLES3.functional.shaders.derivate.fwidth.texture.* once we start copy-propagating more input packs.	2019-02-05 15:45:23 -08:00
Eric Anholt	1a4170952d	v3d: Fix pack/unpack of VFPACK operand unpacks. We want to be able to copy propagate our texture unpacks into the vfpack.	2019-02-05 15:45:23 -08:00
Eric Anholt	d0fdbd4211	v3d: Fix dumping of shaders with alpha test. We were trying to print a NULL entry from the table.	2019-02-05 15:42:14 -08:00
Eric Anholt	bdef17b052	v3d: Store the actual mask of color buffers present in the key. If you only bound rt 1+, we'd still emit a write to the rt0 that isn't present (noticed while debugging an ext_framebuffer_multisample-alpha-to-coverage-no-draw-buffer-zero regression in another change).	2019-02-05 15:42:04 -08:00
Eric Anholt	ab4d5775b0	v3d: Fix image_load_store clamping of signed integer stores. This was copy-and-paste fail, that oddly showed up in the CTS's reinterprets of r32f, rgba8, and srgba8 to rgba8i, but not r32ui and r32i to rgba8i or reinterprets to other signed int formats. Fixes: `6281f26f06` ("v3d: Add support for shader_image_load_store.")	2019-01-31 08:39:40 -08:00
Eric Anholt	6053c7bb43	v3d: Fix a release build set-but-unused compiler warning.	2019-01-29 16:02:51 -08:00
Emil Velikov	385843ac3c	vc4: Declare the last cpu pointer as being modified in NEON asm. Earlier commit addressed 7 of the 8 instances available. v2: Rebase patch back to master (by anholt) Cc: Carsten Haitzler (Rasterman) <raster@rasterman.com> Cc: Eric Anholt <eric@anholt.net> Fixes: `300d3ae8b1` ("vc4: Declare the cpu pointers as being modified in NEON asm.") Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2019-01-29 16:00:25 -08:00
Dylan Baker	90a7a9c973	automake: Add include dir for nir src directory Fixes: `6281f26f06` ("v3d: Add support for shader_image_load_store.") Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2019-01-29 23:24:57 +00:00
Eric Anholt	f7769b5121	v3d: Fix the autotools build. Noticed while looking at the gitlab-CI MR.	2019-01-29 14:00:27 -08:00
Carsten Haitzler (Rasterman)	300d3ae8b1	vc4: Declare the cpu pointers as being modified in NEON asm. Otherwise, the compiler is free to reuse the register containing the input for another call and assume that the value hasn't been modified. Fixes crashes on texture upload/download with current gcc. We now have to have a temporary for the cpu2 value, since outputs must be lvalues. (commit message by anholt) Fixes: `4d30024238` ("vc4: Use NEON to speed up utile loads on Pi2.")	2019-01-28 16:45:45 -08:00
Carsten Haitzler (Rasterman)	522f688471	vc4: Use named parameters for the NEON inline asm. This makes the asm code more intelligible and clarifies the functional change in the next commit. (commit message and commit squashing by anholt)	2019-01-28 16:40:46 -08:00
Eric Anholt	c496b60ed8	v3d: Create separate sampler states for the various blend formats. The sampler border color is encoded in the TMU's blending format (half floats, 32-bit floats, or integers) and must be clamped to the format's range unorm/snorm/int ranges by the driver. Additionally, the TMU doesn't know about how we're abusing the swizzle to support BGRA, A, and LA, so we have to pre-swizzle the border color for those. We don't really want to spend half a kb on sampler states in most cases, so skip generating the variants when the border color is unused or is 0,0,0,0.	2019-01-27 08:30:03 -08:00
Eric Anholt	09472006ff	v3d: Use the symbolic names for wrap modes from the XML.	2019-01-27 08:30:03 -08:00
Eric Anholt	060575bea8	v3d: Drop maximum number of texture units down to 16. This is the GLES 3.2 minmax, and also what the closed source driver does. Avoids hitting OOMs in the CTS's dEQP-GLES3.functional.texture.units.all_units.only_cube.1.	2019-01-27 08:30:03 -08:00
Eric Anholt	3e743d8cd8	v3d: Avoid duplicating limits defines between gallium and v3d core. We don't want to pull the compiler into every include in the gallium driver, so just make a new little header to store the limits.	2019-01-27 08:30:03 -08:00
Eric Anholt	fe6a21c867	v3d: Fix overly-large vattr_sizes structs. We want one vector size per vector, not per component.	2019-01-27 08:30:03 -08:00
Eric Anholt	f72820c851	v3d: Add support for CS barrier() intrinsics.	2019-01-14 15:40:55 -08:00
Eric Anholt	9b45b06d7c	v3d: Add support for CS shared variable load/store/atomics. CS shared variables are handled effectively as SSBO access to a temporary buffer that will be allocated at CS dispatch time.	2019-01-14 15:40:55 -08:00
Eric Anholt	01d913cf90	v3d: Add support for CS workgroup/invocation id intrinsics. We get a payload for the ivec3 workgroup and an int local invocation index, and we use the core lowering to turn into the global invocation id and the local invocation id ivec3s.	2019-01-14 15:40:55 -08:00
Eric Anholt	6281f26f06	v3d: Add support for shader_image_load_store. This is only exposed on V3D 4.1+, because we didn't have the TMU write operations for images on 3.3 (To do GLES 3.1 there, you have to lower it to SSBO load/stores, which is a problem to solve later).	2019-01-14 15:40:55 -08:00
Eric Anholt	5932c2f0b9	v3d: Add SSBO/atomic counters support. So far I assume that all the buffers get written. If they weren't, you'd probably be using UBOs instead.	2019-01-14 15:40:55 -08:00
Eric Anholt	1a63227ea0	v3d: Add support for matrix inputs to the FS. We've been relying on linking splitting up our varying matrices into separate vectors, but with SSO that doesn't happen. Supporting matrix inputs isn't too hard, though.	2019-01-14 13:18:02 -08:00
Eric Anholt	3790ee07e6	v3d: Fix txf_ms 2D_ARRAY array index. We need to pass the array index through our coordinate transform unchanged. Fixes dEQP-GLES31.functional.texture.multisample.samples_1.*_2d_array	2019-01-14 13:18:02 -08:00
Eric Anholt	051a41d3d5	v3d: Add support for the early_fragment_tests flag. If this flag hasn't been set by the shader and it has some visible side effects, then we need to disable EZ.	2019-01-14 13:18:02 -08:00
Eric Anholt	b417a9f7b2	v3d: Add support for flushing dirty TMU data at job end. This will be needed for SSBOs and image_load_store.	2019-01-14 13:18:02 -08:00
Eric Anholt	6051c11d17	nir: Add nir_lower_tex support for Broadcom's swizzled TG4 results. V3D returns the texels in a different order in the resulting vec4 from what GLSL wants, so we need to put in a swizzle. Fixes dEQP-GLES31.functional.texture.gather.basic.2d.rgba8.base_level.level_1 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-08 13:03:41 -08:00
Eric Anholt	8847370424	v3d: Use the core tex lowering. Even without any clever optimization on the unpack operations, this gives us a useful value for the channels read field, which we can use to avoid ldtmu instructions to the no-op register. instructions in affected programs: 890712 -> 881974 (-0.98%)	2019-01-04 15:59:59 -08:00
Eric Anholt	81b9361b68	v3d: Stop scalarizing our uniform loads. We can pull a whole vector in a single indirect load. This saves a bunch of round-trips to the TMU, instructions for setting up multiple loads, references to the UBO base in the uniforms, and apparently manages to reduce register pressure as well. instructions in affected programs: 3086665 -> 2454967 (-20.47%) uniforms in affected programs: 919581 -> 721039 (-21.59%) threads in affected programs: 1710 -> 3420 (100.00%) spills in affected programs: 596 -> 522 (-12.42%) fills in affected programs: 680 -> 562 (-17.35%) Improves 3dmmes performance by 2.29312% +/- 0.139825% (n=5)	2019-01-04 15:41:23 -08:00
Eric Anholt	f8a8de8b9a	v3d: Do UBO loads a vector at a time. In the process of adding support for SSBOs and CS shared vars, I ended up needing a helper function for doing TMU general ops. This helper can be that starting point, and saves us a bunch of round-trips to the TMU by loading a vector at a time.	2019-01-04 15:41:23 -08:00
Eric Anholt	b0e0086257	v3d: Remove dead switch cases and comments from v3d_nir_lower_io. Moving things to NIR left this mess around. All we lower now is uniforms.	2019-01-04 15:41:23 -08:00
Eric Anholt	e1385e879d	v3d: Reinstate the new shader-db output after v3d_compile() refactor. I misplaced it in the rebase conflicts.	2019-01-04 15:26:19 -08:00
Eric Anholt	d2b899c0ec	v3d: Refactor compiler entrypoints. Before, I had per-stage entrypoints with some helpers shared between them. As I extended for compute shaders and shader-db, it turned out that the other common code in the middle wanted to be shared too.	2019-01-02 14:12:29 -08:00
Eric Anholt	0805060573	v3d: Handle dynamically uniform IF statements with uniform control flow. Loops will be trickier, since we need some analysis to figure out if the breaks/continues inside are uniform. Until we get that in NIR, this gets us some quick wins. total instructions in shared programs: 6192844 -> 6174162 (-0.30%) instructions in affected programs: 487781 -> 469099 (-3.83%)	2019-01-02 14:12:29 -08:00
Eric Anholt	5e9ee6e841	v3d: Fold comparisons for IF conditions into the flags for the IF. total instructions in shared programs: 6193810 -> 6192844 (-0.02%) instructions in affected programs: 800373 -> 799407 (-0.12%)	2019-01-02 14:12:29 -08:00
Eric Anholt	078dc176bc	v3d: Don't try to fold non-SSA-src comparisons into bcsels. There could have been a write of a src in between the comparison and the bcsel that would invalidate the comparison.	2019-01-02 14:12:29 -08:00
Eric Anholt	2e0433b687	v3d: Move the "Find the ALU instruction generating our bool" out of bcsel. This will be reused for if statements.	2019-01-02 14:12:29 -08:00
Eric Anholt	c3ae0aa264	v3d: Simplify the emission of comparisons for the bcsel optimization. I wanted to reuse the comparison stuff for nir_ifs, but for that I just want the flags and no destination value. Splitting the conditions from the destinations ended up cleaning the existing code up, anyway.	2019-01-02 14:12:29 -08:00
Eric Anholt	ad1e59cf8d	v3d: Add support for gl_HelperInvocation. We can just look at the MSF flags -- if they're unset, then we're definitely in a helper invocation. Fixes dEQP-GLES31.functional.shaders.helper_invocation.* with GLES3.1 enabled.	2018-12-30 08:05:11 -08:00
Eric Anholt	20021e3473	v3d: Add support for textureSize() on MSAA textures. Fixes failures in dEQP-GLES31.functional.shaders.builtin_functions.texture_size.samples_1_texture_2d in the GLES3.1 suite.	2018-12-30 08:05:11 -08:00
Eric Anholt	906fca1b4b	v3d: Add support for non-constant texture offsets. Fixes dEQP-GLES31.functional.texture.gather.offset_dynamic.min_required_offset.2d.rgba8.size_pot.clamp_to_edge_repeat and others.	2018-12-30 08:05:11 -08:00
Eric Anholt	47caefc7b4	v3d: Force sampling from base level for tg4. This is what the GLSL ES 310 spec tells us to do, but apparently the "gather mode" flag doesn't imply it in the HW. Fixes dEQP-GLES31.functional.texture.gather.basic.2d.rgba8.filter_mode.min_nearest_mipmap_linear_mag_linear	2018-12-30 08:05:11 -08:00
Eric Anholt	f9bdce9966	v3d: Add a note for a potential performance win on multop/umul24. Noticed while debugging a testcase.	2018-12-30 08:05:11 -08:00
Eric Anholt	b36757448d	v3d: Dead-code eliminate unused flags updates. The greedy comparison folding in bcsel means that we may have left the original bool-generating NIR ALU instruction dead, but DCE wasn't eliminating the VIR code for it because of the flags updates. total instructions in shared programs: 5186024 -> 5100894 (-1.64%) instructions in affected programs: 1448695 -> 1363565 (-5.88%)	2018-12-30 08:05:11 -08:00
Eric Anholt	20e3526298	v3d: Don't generate temps for comparisons. This was just generated work for vir_opt_dead_code and cluttered up the dumps.	2018-12-30 08:04:54 -08:00
Eric Anholt	ebde5afb93	v3d: Move "does this instruction have flags" from sched to generic helpers. I wanted to reuse it for DCE of flags updates.	2018-12-30 08:03:51 -08:00
Eric Anholt	39b1112189	v3d: Drop incorrect dependency for flpop. It is just shifting probably-means-flags bits out of a value, it doesn't actually update the flags on its own.	2018-12-30 08:03:51 -08:00
Eric Anholt	a7c9fd7573	v3d: Drop unused count_nir_instrs() helper. This was for shader-db, but I haven't cared about NIR instruction counts in a long time.	2018-12-30 08:03:51 -08:00
Eric Anholt	696f63f1b4	v3d: Hook up some shader-db output to GL_ARB_debug_output. This allows the original shader-db project's run.c runner to parse things easily, and is probably a good thing to have for GL_ARB_debug_output in general. I formatted it more like Intel's so I can mostly reuse their report script.	2018-12-30 08:03:51 -08:00
Eric Anholt	87b251a940	v3d: Add a "precompile" debug flag for shader-db. I've been using my apitrace-based shader-db so far, but it's slow (apitrace decompression), intrusive (apitrace windows spamming the screen), and doesn't have much coverage. The original shader-db provides a lot more coverage and compiles faster, at the expense of not having the actual runtime variant key. As v3d has a lot less runtime variation than vc4 did, this tradeoff makes more sense.	2018-12-29 13:52:09 -08:00
Eric Anholt	9ec6a3d621	v3d: Fix uniform pretty printing assertion failure with branches. Fixes: `248a7fb392` ("v3d: Do uniform pretty-printing in the QPU dump.")	2018-12-29 13:52:09 -08:00
Eric Anholt	d80761b8f3	v3d: Drop shadow comparison state from shader variant key. The shadow state is now in the sampler.	2018-12-20 11:29:30 -08:00
Eric Anholt	7c56b7a6ea	v3d: Add a fallthrough path for utile load/store of 32 byte lines. Now that V3D has 8 byte per pixel formats exposed, we've got stride==32 utiles to load and store. Just handle them through the non-NEON paths for now.	2018-12-19 10:27:26 -08:00
Eric Anholt	f6a0f4f41e	vc4: Move the utile load/store functions to a header for reuse by v3d. These implementations of whole-utile load/stores would be the same for v3d, though the layouts of blocks of utiles has changed.	2018-12-19 10:27:26 -08:00
Ian Romanick	378f996771	nir/opt_peephole_select: Don't peephole_select expensive math instructions On some GPUs, especially older Intel GPUs, some math instructions are very expensive. On those architectures, don't reduce flow control to a csel if one of the branches contains one of these expensive math instructions. This prevents a bunch of cycle count regressions on pre-Gen6 platforms with a later patch (intel/compiler: More peephole select for pre-Gen6). v2: Remove stray #if block. Noticed by Thomas. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-17 13:47:06 -08:00
Ian Romanick	09b7e1d8e4	nir/opt_peephole_select: Don't try to remove flow control around indirect loads That flow control may be trying to avoid invalid loads. On at least some platforms, those loads can also be expensive. No shader-db changes on any Intel platform (even with the later patch "intel/compiler: More peephole select"). v2: Add a 'indirect_load_ok' flag to nir_opt_peephole_select. Suggested by Rob. See also the big comment in src/intel/compiler/brw_nir.c. v3: Use nir_deref_instr_has_indirect instead of deref_has_indirect (from nir_lower_io_arrays_to_elements.c). v4: Fix inverted condition in brw_nir.c. Noticed by Lionel. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-17 13:47:06 -08:00
Eric Anholt	00e2cbc049	v3d: Fix the argument type for vir_BRANCH(). Apparently this has been spewing warnings for Jason's clang, but not my gcc.	2018-12-17 09:52:23 -08:00
Jason Ekstrand	11dc130779	nir: Add a bool to int32 lowering pass We also enable it in all of the NIR drivers. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	80e8dfe9de	nir: Rename Boolean-related opcodes to include 32 in the name This is a squash of a bunch of individual changes: nir/builder: Generate 32-bit bool opcodes transparently nir/algebraic: Remap Boolean opcodes to the 32-bit variant Use 32-bit opcodes in the NIR producers and optimizations Generated with a little hand-editing and the following sed commands: sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c Use 32-bit opcodes in the NIR back-ends Generated with a little hand-editing and the following sed commands: sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Eric Anholt	2977c77758	v3d: Use the original bit size when scalarizing uniform loads. Prevents a regression in jekstrand's 1-bit series. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-16 21:03:01 +00:00
Eric Anholt	29927e7524	v3d: Drop in a bunch of notes about performance improvement opportunities. These have all been floating in my head, and while I've thought about encoding them in issues on gitlab once they're enabled, they also make sense to just have in the area of the code you'll need to work in.	2018-12-14 17:48:01 -08:00
Eric Anholt	248a7fb392	v3d: Do uniform pretty-printing in the QPU dump. If you're trying to trace what's going on in a QPU dump, this will definitely help you find your way.	2018-12-14 17:48:01 -08:00
Eric Anholt	532b6c5671	v3d: Move uniform pretty-printing to its own helper function. I want to reuse it in the QPU dump.	2018-12-14 17:48:01 -08:00
Eric Anholt	a7e15a5086	v3d: Avoid assertion failures when removing end-of-shader instructions. After generating VIR, we leave c->cursor pointing at the end of the shader. If the shader had dead code at the end (for example from preamble instructions in a shader with no side effects), we would assertion fail that we were leaving the cursor pointing at freed memory. Since anything following DCE should be setting up a new cursor anyway, just clear the cursor at the start.	2018-12-14 17:48:01 -08:00
Eric Anholt	5b2cc03852	v3d: Add support for draw indirect for GLES3.1. In trying to enable compute shaders, I found that a bunch of deqp-gles31's compute stuff wanted to interact with indirect dispatch. This was easy to do on its own.	2018-12-14 17:48:01 -08:00
Eric Anholt	ff80e58b38	v3d: Add missing flagging of SYNCB as a TSY op. Fixes: `f2e41daac5` ("broadcom/vc5: Update QPU instruction pack/unpack for v4.2.")	2018-12-14 17:48:01 -08:00
Eric Anholt	3f9bcf9136	v3d: Make sure that a thrsw doesn't split a multop from its umul24. The thrsw will invalidate rtop, just like accumulators and flags. Caught by simulator assertions in CS imulextended/umulextended tests. Fixes: `90269ba353` ("broadcom/vc5: Use THRSW to enable multi-threaded shaders.")	2018-12-14 17:48:01 -08:00
Eric Anholt	f1d98204c3	v3d: Fix a leak of the disassembled instruction string during debug dumps. Fixes: `ade416d023` ("broadcom: Add VC5 NIR compiler.")	2018-12-07 16:48:23 -08:00
Eric Anholt	bad95bb13c	v3d: Add VIR dumping of TMU config p0/p1. I had a bit of it for V3D 3.x, but didn't update it for 4.x.	2018-12-07 16:48:23 -08:00
Eric Anholt	1fc78ff3f1	v3d: Simplify VIR uniform dumping using a temporary.	2018-12-07 16:48:23 -08:00
Eric Anholt	5932575299	v3d: Garbage collect unused uniforms code.	2018-12-07 16:48:23 -08:00
Eric Anholt	acecee4c2d	v3d: Return the right gl_SampleMaskIn[] value. It's supposed to be the dispatched sample mask for this pixel, not the GL state's sample mask.	2018-12-07 16:48:23 -08:00
Eric Anholt	6870111051	v3d: Fix a comment typo	2018-12-07 16:48:23 -08:00
Eric Anholt	ca0e4ae4bc	v3d: Convert to using nir_src_as_uint() from const_value derefs. Follows `16870de8a0` ("nir: Use nir_src_is_const and nir_src_as_* in core code") to clean up v3d.	2018-12-07 16:48:23 -08:00
Eric Anholt	d1965344ac	v3d: Re-use the wrap mode uniform on V3D 3.3.	2018-12-07 16:48:23 -08:00
Eric Anholt	42652ea51e	v3d: Use combined input/output segments. The HW apparently has some issues (or at least a much more complicated VCM calculation) with non-combined segments, and the closed source driver also uses combined I/O. Until I get the last CTS failure resolved (which does look plausibly like some VPM stomping), let's use combined I/O too.	2018-12-07 16:48:23 -08:00
Jason Ekstrand	dca6cd9ce6	nir: Make boolean conversions sized just like the others Instead of a single i2b and b2i, we now have i2b32 and b2iN where N is one of 8, 16, 32, or 64. This leads to having a few more opcodes but now everything is consistent and booleans aren't a weird special case anymore. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2018-12-05 15:03:07 -06:00
Dylan Baker	a999798daa	meson: Add tests to suites Meson test has a concept of suites, which allow tests to be grouped together. This allows only a subset of tests to be run (say, only the tests for nir). A test can be added to more than one suite, but for the most part I've only added a test to a single suite, though I've added a compiler group that includes nir, glsl, and glcpp tests. To use this you'll need to invoke meson test directly, instead of ninja test (which always runs all targets). It can be invoked as: `meson test -C builddir --suite $suitename` (meson test has additional options that are pretty useful). Tested-By: Gert Wollny <gert.wollny@collabora.com> Acked-by: Eric Engestrom <eric.engestrom@intel.com>	2018-11-20 09:09:22 -08:00
Kenneth Graunke	5b682143da	nir: Make nir_lower_clip_vs optionally work with variables. The way nir_lower_clip_vs() works with store_output intrinsics makes a ton of assumptions about the driver_location field. In i965 and iris, I'd rather do this lowering early and work with variables. v3d may want to switch to that as well, and ir3 could too, but I'm not sure exactly what would need updating. For now, handle both methods. Reviewed-by: Eric Anholt <eric@anholt.net>	2018-11-19 14:33:16 -08:00
Eric Anholt	538bca78e2	v3d: Don't try to set PF flags on a LDTMU operation We need an ALU op in order to set PF. Fixes a recent assertion failure in dEQP-GLES3.functional.ubo.single_basic_type.shared.bool_vertex	2018-11-15 11:12:54 -08:00
Eric Anholt	4e1b163eed	v3d: Update the TLB config for depth writes on V3D 4.2. Fixes 311 piglit cases on the simulator.	2018-11-01 13:56:30 -07:00
Emil Velikov	986033a275	configure: allow building with python3 Pretty much all of the scripts are python2+3 compatible. Check and allow using python3, while adjusting the PYTHON2 refs. Note: - python3.4 is used as it's the earliest supported version - python2 chosen prior to python3 v2: use python2 by default Cc: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Eric Engestrom <eric.engestrom@intel.com>	2018-10-31 19:15:50 +00:00
Eric Anholt	cc54e1acf9	v3d: Use nir_remove_unused_io_vars to handle binner shader output DCE We were doing this late after nir_lower_io, but we can just reuse the core code. By doing it at this stage, we won't even set up the VS attributes as inputs, reducing our VPM size.	2018-10-30 10:46:52 -07:00
Eric Anholt	c152c79d5e	v3d: Only add output slot tracking for the current varying slot. We always emit 4 slots per slot because things like color output and position processing in the epilogue will potentially look up more values than the variable declaration had. However, when we get a .location_frac != 0, we don't want to overwrite components of the following .driver_location.	2018-10-30 10:46:52 -07:00
Eric Anholt	17c8198952	v3d: Use nir_lower_io_to_scalar_early to DCE unused VS input components. This lets us trim unused trailing components in the vertex attributes, reducing the size of our VPM allocations.	2018-10-30 10:46:52 -07:00
Eric Anholt	fc85f7cfdc	v3d: Don't rely on sorting input vars for VPM read setup. For supporting scalar VPM i/o at the NIR level, we need to do a pass over the vars to figure out how big each attribute is after DCE. Once we've done that, we can just walk over c->vattr_sizes[] instead of bothering with vars.	2018-10-30 10:46:52 -07:00
Eric Anholt	cc78676030	v3d: Split out NIR input setup between FS and VPM. They don't share much code, and I'm about to rewrite the remaining shared code for the VPM case.	2018-10-30 10:46:52 -07:00
Eric Engestrom	bb84fa146f	util: use C99 declaration in the for-loop hash_table_foreach() macro Signed-off-by: Eric Engestrom <eric@engestrom.ch> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-10-25 12:43:18 +01:00
Eric Anholt	8ec83dc51e	v3d: Add support for hardware pack/unpack of half floats. Cuts the formerly 7-minute simulation time of fs-packHalf2x16.shader_test in half.	2018-10-15 17:16:44 -07:00
Mauro Rossi	cc3b99bb48	android: broadcom/cle: export the broadcom top level path headers Fixes the following building error in vc4 build: In file included from external/mesa/src/gallium/drivers/vc4/kernel/vc4_render_cl.c:34: In file included from external/mesa/src/gallium/drivers/vc4/kernel/vc4_drv.h:27: In file included from external/mesa/src/gallium/drivers/vc4/vc4_simulator_validate.h:34: In file included from external/mesa/src/gallium/drivers/vc4/vc4_context.h:39: In file included from external/mesa/src/gallium/drivers/vc4/vc4_cl.h:56: gen/STATIC_LIBRARIES/libmesa_broadcom_genxml_intermediates/broadcom/cle/v3d_packet_v21_pack.h:12:10: fatal error: 'cle/v3d_packet_helpers.h' file not found ^~~~~~~~~~~~~~~~~~~~~~~~~~ 1 error generated. Fixes: `5b102160ae` ("broadcom/genxml: Introduce a V3D packet/struct decoder.") Cc: "18.2" <mesa-stable@lists.freedesktop.org> Acked-by: Eric Anholt <eric@anholt.net> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>	2018-09-15 09:14:46 +02:00
Mauro Rossi	9158e0bd82	android: broadcom/cle: add gallium include path Fixes the following building error: In file included from external/mesa/src/broadcom/cle/v3d_decoder.c:38: In file included from external/mesa/src/broadcom/cle/v3d_packet_helpers.h:29: external/mesa/src/gallium/auxiliary/util/u_math.h:42:10: fatal error: 'pipe/p_compiler.h' file not found ^~~~~~~~~~~~~~~~~~~ 1 error generated. Fixes: `5b102160ae` ("broadcom/genxml: Introduce a V3D packet/struct decoder.") Cc: "18.2" <mesa-stable@lists.freedesktop.org> Acked-by: Eric Anholt <eric@anholt.net> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>	2018-09-15 09:14:42 +02:00
Mauro Rossi	3341429d74	android: broadcom/genxml: fix collision with intel/genxml header-gen macro Fixes the following build error, happening when building both intel and broadcom: Gen Header: libmesa_broadcom_genxml_32 <= v3d_packet_v21_pack.h FAILED: gen/STATIC_LIBRARIES/libmesa_broadcom_genxml_intermediates/broadcom/cle/v3d_packet_v21_pack.h /bin/bash -c "python external/mesa/src/broadcom/cle/gen_pack_header.py \ external/mesa/src/broadcom/cle/v3d_packet_v21.xml \ > gen/STATIC_LIBRARIES/libmesa_broadcom_genxml_intermediates/broadcom/cle/v3d_packet_v21_pack.h" Traceback (most recent call last): File "external/mesa/src/broadcom/cle/gen_pack_header.py", line 626, in <module> p = Parser(sys.argv[2]) IndexError: list index out of range The header-gen macro is already defined by the Intel genxml build rules, and the existing header-gen does not take the $(PRIVATE_VER) argument; in fact, the bash command line logged in the build error is missing exactly the $(PRIVATE_VER) argument. Renaming the macro to pack-header-gen in src/broadcom/Android.genxml.mk solves the build error; another possible way is to keep the gen rule commands expanded and not use the macros. Fixes: `7f80a9ff13` ("vc4: Introduce XML-based packet header generation like Intel's.") Cc: "18.2" <mesa-stable@lists.freedesktop.org> Acked-by: Eric Anholt <eric@anholt.net> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>	2018-09-15 09:14:33 +02:00
Dylan Baker	80825abb5d	move u_math to src/util Currently we have two sets of functions for bit counts, one in gallium and one in core mesa. The ones in core mesa are header only in many cases, since they reduce to "#define _mesa_bitcount popcount", but they provide a fallback implementation. This is important because 32bit msvc doesn't have popcountll, just popcount; so when nir (for example) includes the core mesa header it doesn't (and shouldn't) link with core mesa. To fix this we'll promote the version out of gallium util, then replace the core mesa uses with the util version, since nir (and other non-core mesa users) can and do link with mesautils. Acked-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-09-07 10:21:26 -07:00
Eric Anholt	a91b158bd9	v3d: Fix setup of the VCM cache size. There were two bugs working together to make things mostly work: I wasn't dividing the VPM output size available by the size of a batch (vertex), but I also had the size of the VPM reduced by a factor of 8. Fixes dEQP-GLES3.functional.vertex_array_objects.all_attributes and it seems also my intermittent varying failures. Fixes: `1561e4984e` ("v3d: Emit the VCM_CACHE_SIZE packet.")	2018-09-07 08:11:38 -07:00
Emil Velikov	cff80b6c15	Revert "configure: allow building with python3" This reverts commit `ae7898dfdb`. Turns out the python scripts are _not_ fully python 3 compatible. As Ilia reported using get_xmlpool.py with LANG=C produces some weird output - see the link for details. Even though the issue was spotted with the autoconf build, it exposes a genuine problem with the script (and lack of lang handling of the meson build.) https://lists.freedesktop.org/archives/mesa-dev/2018-August/203508.html	2018-08-24 11:14:15 +01:00
Emil Velikov	ae7898dfdb	configure: allow building with python3 Pretty much all of the scripts are python2+3 compatible. Check and allow using python3, while adjusting the PYTHON2 refs. Note: - python3.4 is used as it's the earliest supported version - python3 chosen prior to python2 Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Eric Engestrom <eric.engestrom@intel.com>	2018-08-23 17:00:13 +01:00
Mathieu Bridon	2ee1c86d71	meson: Build with Python 3 Now that all the build scripts are compatible with both Python 2 and 3, we can flip the switch and tell Meson to use the latter. Since Meson already depends on Python 3 anyway, this means we don't need two different Python stacks to build Mesa. Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-08-10 15:15:09 -07:00
Eric Anholt	1561e4984e	v3d: Emit the VCM_CACHE_SIZE packet. This is needed to ensure that we don't get blocked waiting for VPM space with bin/render overlapping. Cc: "18.2" <mesa-stable@lists.freedesktop.org>	2018-08-06 13:03:23 -07:00
Eric Anholt	50a8713d4f	v3d: Avoid spilling that breaks the r5 usage after a ldvary. Fixes bad rendering when forcing 2 spills in glxgears. Cc: "18.2" <mesa-stable@lists.freedesktop.org>	2018-08-06 13:03:23 -07:00
Eric Anholt	f2c0d310d6	v3d: Make sure that QPU instruction-has-a-dest matches VIR. Found when debugging register spilling -- we would try to spill the dest of a STVPMV, inserting spill code after entering the last segment. In fact, we were likely to choose to do this, given that the STVPMV "dest" temp was never read from, making it cheap to spill. Cc: "18.2" <mesa-stable@lists.freedesktop.org>	2018-08-06 13:03:23 -07:00
Eric Anholt	3f9cb2eb05	v3d: Wait for TMU writes to complete before continuing after a spill. The simulator complained that we had write responses outstanding at shader end. It seems that a TMU read does not guarantee that previous TMU writes by the thread have completed, which surprised me. Cc: "18.2" <mesa-stable@lists.freedesktop.org>	2018-08-06 13:03:23 -07:00
Eric Anholt	ccbe33af5b	v3d: Make sure we don't emit a thrsw before the last one finished. Found while forcing some spilling, which creates a lot of short tmua->thrsw->ldtmu sequences. Cc: "18.2" <mesa-stable@lists.freedesktop.org>	2018-08-06 13:03:23 -07:00
Eric Anholt	f9d54dc3cf	v3d: Add some debug code for forcing register spilling. This is useful for periodically testing out register spilling to see how it goes on simple shaders, rather than only failing on insanely complicated ones.	2018-08-06 13:03:23 -07:00
Eric Anholt	c2eab33b08	v3d: Actually put the "%s" in the snprintf. I missed an important part when porting the change over, fixing my compiler warning but breaking -Werror=format-security. Fixes: `e6ff5ac446` ("v3d: use snprintf(..., "%s", ...) instead of strncpy") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107443	2018-08-01 11:39:19 -07:00
Eric Anholt	e6ff5ac446	v3d: use snprintf(..., "%s", ...) instead of strncpy Fixes a compiler warning about terminator NUL, based on `f836d799f9` ("intel/decoder: use snprintf(..., "%s", ...) instead of strncpy")	2018-07-31 16:42:11 -07:00
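
The two snprintf commits above (`e6ff5ac446` and `c2eab33b08`) describe a common C idiom: copying a string into a fixed-size buffer with `snprintf(dst, size, "%s", src)` rather than `strncpy`. The sketch below illustrates that idiom only; `copy_label` is a hypothetical helper and is not taken from the v3d source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not from the v3d driver): copy src into dst,
 * truncating if necessary, and always leave dst NUL-terminated.
 * Returns the length src would have needed, as snprintf does. */
static int
copy_label(char *dst, size_t dst_size, const char *src)
{
        assert(dst != NULL && src != NULL && dst_size > 0);

        /* The literal "%s" format matters: passing src itself as the
         * format string would interpret any '%' it contains, which is
         * what -Werror=format-security rejects. Unlike strncpy,
         * snprintf always writes the terminating NUL when
         * dst_size > 0. */
        return snprintf(dst, dst_size, "%s", src);
}
```

A return value greater than or equal to `dst_size` indicates that the copy was truncated, which a caller can check if truncation matters.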

491 Commits