mirrors/mesa - Frog Git

Commit Graph

Author	SHA1	Message	Date
Jason Ekstrand	3515c0e9cf	intel/fs: Allow UB, B, and HF types in brw_nir_reduction_op_identity Because byte immediates aren't a thing on GEN hardware, we return a signed or unsigned word immediate in the byte case. Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>	2019-09-20 18:02:15 +00:00
Paulo Zanoni	10532c6831	intel/fs: don't forget the stride at generate_shuffle During generate_shuffle(), when we use byte-sized registers we end up with a destination stride of 2. We don't take the stride into consideration when selecting the group offset for the last MOV operation, which means we end up moving things to the wrong place, leaving the last few channels untouched. Take the destination stride into consideration so we don't miss the last channels. v2: Assert this is not necessary for the IVB special case (Jason). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>	2019-09-20 10:57:05 -07:00
Jason Ekstrand	dae33052db	util/rb_tree: Reverse the order of comparison functions The new order matches that of the comparison functions accepted by the C standard library qsort() functions. Being consistent with qsort will hopefully help avoid developer confusion. The only current user of the red-black tree is aub_mem.c which is pretty easy to fix up. Reviewed-by: Lionel Landwerlin <lionel.g.lndwerlin@intel.com>	2019-09-20 17:37:25 +00:00
Jason Ekstrand	d35d7346d2	util/rb_tree: Add the unit tests When I wrote the red-black tree implementation, I wrote tests for it but they never got imported into mesa. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-09-20 17:37:25 +00:00
Eric Engestrom	3c1a24de07	anv: implement ICD interface v4 Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-09-20 08:31:58 +00:00
Eric Engestrom	19db95e78e	anv: split instance dispatch table This effectively splits the instance dispatch table in two, with entry points that take a physical device as their first argument getting their own dispatch table. As a result, we now have to check both the instance and physical device dispatch tables instead of just the instance dispatch table. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-09-20 08:31:58 +00:00
Adam Jackson	88b8922f57	glx: Fix drawable lookup bugs in glXUseXFont We were using the current drawable of the context to name the appropriate screen for creating the bitmaps. But one, the current drawable can be None, and two, it can be a GLXDrawable. Passing either one as the second argument to XCreatePixmap will throw BadDrawable. Use the root window of the context's screen instead. Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/89 LOLed-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-09-19 21:06:01 -04:00
Adam Jackson	b4fe0b3ffd	glx: Avoid atof() when computing the server's GLX version atof() is locale-dependent (sigh), which means 1.3 becomes 1.0 if the locale's decimal separator isn't a full-stop. Just use the protocol major/minor instead. This would be slightly broken if the server generically implements 1.3+ but a particular screen is only capable of less, but in practice no such servers exist. Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/74 Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-09-19 20:50:01 -04:00
Ian Romanick	317a88b920	nir/algebraic: Additional D3D Boolean optimization I observed this pattern in several shaders in Hand of Fate 2 while investigating bugzilla #111490. This also led to the related bugzilla #111578. The shaders from HoF2 are not in shader-db. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Skylake and Ice Lake had similar results. (Ice Lake shown) total instructions in shared programs: 16222621 -> 16205419 (-0.11%) instructions in affected programs: 798418 -> 781216 (-2.15%) helped: 548 HURT: 0 helped stats (abs) min: 2 max: 158 x̄: 31.39 x̃: 35 helped stats (rel) min: 0.45% max: 28.64% x̄: 2.83% x̃: 2.09% 95% mean confidence interval for instructions value: -33.22 -29.56 95% mean confidence interval for instructions %-change: -3.11% -2.56% Instructions are helped. total cycles in shared programs: 364676209 -> 363345763 (-0.36%) cycles in affected programs: 112810504 -> 111480058 (-1.18%) helped: 546 HURT: 7 helped stats (abs) min: 2 max: 118913 x̄: 2439.77 x̃: 2340 helped stats (rel) min: 0.08% max: 37.56% x̄: 1.46% x̃: 1.08% HURT stats (abs) min: 2 max: 770 x̄: 238.00 x̃: 43 HURT stats (rel) min: 0.02% max: 11.24% x̄: 3.71% x̃: 0.35% 95% mean confidence interval for cycles value: -2884.33 -1927.41 95% mean confidence interval for cycles %-change: -1.59% -1.21% Cycles are helped. total spills in shared programs: 8870 -> 8514 (-4.01%) spills in affected programs: 1230 -> 874 (-28.94%) helped: 161 HURT: 0 total fills in shared programs: 21901 -> 21348 (-2.52%) fills in affected programs: 2120 -> 1567 (-26.08%) helped: 155 HURT: 5 Broadwell and Haswell had similar results. 
(Broadwell shown) total instructions in shared programs: 14994910 -> 14975495 (-0.13%) instructions in affected programs: 839033 -> 819618 (-2.31%) helped: 548 HURT: 0 helped stats (abs) min: 2 max: 299 x̄: 35.43 x̃: 49 helped stats (rel) min: 0.39% max: 19.89% x̄: 2.91% x̃: 2.22% 95% mean confidence interval for instructions value: -37.46 -33.40 95% mean confidence interval for instructions %-change: -3.12% -2.70% Instructions are helped. total cycles in shared programs: 386032453 -> 384450722 (-0.41%) cycles in affected programs: 117807357 -> 116225626 (-1.34%) helped: 547 HURT: 6 helped stats (abs) min: 2 max: 22096 x̄: 2892.01 x̃: 3926 helped stats (rel) min: 0.17% max: 10.34% x̄: 1.56% x̃: 1.31% HURT stats (abs) min: 4 max: 60 x̄: 32.83 x̃: 29 HURT stats (rel) min: 0.38% max: 12.79% x̄: 5.86% x̃: 4.65% 95% mean confidence interval for cycles value: -3060.28 -2660.27 95% mean confidence interval for cycles %-change: -1.59% -1.37% Cycles are helped. total spills in shared programs: 23372 -> 21869 (-6.43%) spills in affected programs: 11730 -> 10227 (-12.81%) helped: 352 HURT: 0 total fills in shared programs: 34747 -> 35351 (1.74%) fills in affected programs: 11013 -> 11617 (5.48%) helped: 3 HURT: 347 Ivy Bridge and Sandybridge had similar results. (Ivy Bridge shown) total instructions in shared programs: 11956420 -> 11956126 (<.01%) instructions in affected programs: 14898 -> 14604 (-1.97%) helped: 98 HURT: 0 helped stats (abs) min: 3 max: 3 x̄: 3.00 x̃: 3 helped stats (rel) min: 1.30% max: 3.57% x̄: 2.08% x̃: 2.00% 95% mean confidence interval for instructions value: -3.00 -3.00 95% mean confidence interval for instructions %-change: -2.18% -1.98% Instructions are helped. 
total cycles in shared programs: 178791217 -> 178790792 (<.01%) cycles in affected programs: 149763 -> 149338 (-0.28%) helped: 91 HURT: 7 helped stats (abs) min: 3 max: 107 x̄: 20.63 x̃: 16 helped stats (rel) min: 0.13% max: 6.91% x̄: 1.40% x̃: 1.18% HURT stats (abs) min: 3 max: 322 x̄: 207.43 x̃: 322 HURT stats (rel) min: 0.14% max: 19.85% x̄: 12.73% x̃: 17.41% 95% mean confidence interval for cycles value: -18.94 10.27 95% mean confidence interval for cycles %-change: -1.28% 0.49% Inconclusive result (value mean confidence interval includes 0).	2019-09-19 14:22:22 -07:00
Ian Romanick	92f70df8c3	nir/algebraic: Do not apply late DPH optimization in vertex processing stages Some shaders do not use 'invariant' in vertex and (possibly) geometry shader stages on some outputs that are intended to be invariant. For various reasons, this optimization may not be fully applied in all shaders used for different rendering passes of the same geometry. This can result in Z-fighting artifacts (at best). For now, disable this optimization in these stages. In tessellation stages applications seem to use 'precise' when necessary, so allow the optimization in those stages. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111490 Fixes: `09705747d7` ("nir/algebraic: Reassociate fadd into fmul in DPH-like pattern") All Gen8+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16194726 -> 16344745 (0.93%) instructions in affected programs: 2855172 -> 3005191 (5.25%) helped: 6 HURT: 20279 helped stats (abs) min: 1 max: 3 x̄: 1.33 x̃: 1 helped stats (rel) min: 0.44% max: 1.00% x̄: 0.54% x̃: 0.44% HURT stats (abs) min: 1 max: 32 x̄: 7.40 x̃: 7 HURT stats (rel) min: 0.14% max: 42.86% x̄: 8.58% x̃: 6.56% 95% mean confidence interval for instructions value: 7.34 7.45 95% mean confidence interval for instructions %-change: 8.48% 8.67% Instructions are HURT. total cycles in shared programs: 364471296 -> 365014683 (0.15%) cycles in affected programs: 32421530 -> 32964917 (1.68%) helped: 2925 HURT: 16144 helped stats (abs) min: 1 max: 403 x̄: 18.39 x̃: 5 helped stats (rel) min: <.01% max: 22.61% x̄: 1.97% x̃: 1.15% HURT stats (abs) min: 1 max: 18471 x̄: 36.99 x̃: 15 HURT stats (rel) min: 0.02% max: 52.58% x̄: 5.60% x̃: 3.87% 95% mean confidence interval for cycles value: 21.58 35.41 95% mean confidence interval for cycles %-change: 4.36% 4.52% Cycles are HURT.	2019-09-19 14:21:31 -07:00
Andres Gomez	bcd9224728	docs/features: Update VK_KHR_display_swapchain status It was set as done by mistake. Fixes: `bc15d74529` ("docs/features: Mark some Vulkan extensions as done") Signed-off-by: Andres Gomez <agomez@igalia.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-09-19 23:45:17 +03:00
Andres Gomez	53c24cfd8a	docs/features: Update status list of Vulkan extensions To get the extension list: $ git grep -hE "extension name=\"VK_KHR" src/vulkan/registry/vk.xml | grep -v disabled | awk '{print $2}' | sed -E 's/(name=)?"//g' | sort To find anv(il) and radv supported extensions: $ git grep -hE "'VK_([A-Z]+)_[a-z,0-9]" src/intel/ $ git grep -hE "'VK_([A-Z]+)_[a-z,0-9]" src/amd/ v2: - Keep VK_KHR_device_group and VK_KHR_device_group_creation as not started (Jason). Signed-off-by: Andres Gomez <agomez@igalia.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-09-19 23:39:26 +03:00
Jason Ekstrand	0c4e89ad5b	Move blob from compiler/ to util/ There's nothing whatsoever compiler-specific about it other than that's currently where it's used. Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-09-19 19:56:22 +00:00
Boris Brezillon	fc5a87715a	Revert "panfrost: Rework midgard_pair_load_store() to kill the nested foreach loop" There's a missing prev_ldst = NULL; assignment in the new logic, but even with this fixed it seems to regress some applications, so let's revert the change until we find the real problem. This reverts commit `c9bebae287`.	2019-09-19 21:01:27 +02:00
Caio Marcelo de Oliveira Filho	fa080f03d3	intel/fs: Add Fall-through comment Reviewed-by: Andres Gomez <agomez@igalia.com>	2019-09-19 10:02:16 -07:00
Samuel Iglesias Gonsálvez	5ed5e76741	nir/algebraic: refactor inexact opcode restrictions Refactor the code to avoid calling auxiliary functions many times when it is not really needed. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-09-19 18:57:27 +02:00
Adam Jackson	5b5c5bf833	docs: Update bug report URLs for the gitlab migration Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-09-19 16:37:36 +00:00
Bas Nieuwenhuizen	ec76232785	glx: Remove redundant null check. Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/64 Reviewed-by: Adam Jackson <ajax@redhat.com>	2019-09-19 15:11:10 +00:00
Kenneth Graunke	706c9f2d60	iris: Skip double-disabling TCS/TES/GS after BLORP operations BLORP always turns off TCS/TES/GS. If regular drawing also has them disabled (the overwhelmingly common case), then leaving them disabled is just fine by us and we can skip dirtying them, as that would just re-disable them a second time on the next draw. If they are actually enabled, however, we do need to flag them. Cuts 52% of the 3DSTATE_HS packets in an Aztec Ruins trace.	2019-09-19 07:56:15 -07:00
Erik Faye-Lund	7f7060dc73	.mailmap: add an alias for Frank Binns Reviewed-by: Frank Binns <frank.binns@imgtec.com>	2019-09-19 16:41:10 +02:00
Erik Faye-Lund	c1b1e0e875	.mailmap: add an alias for Bas Nieuwenhuizen Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-09-19 16:39:10 +02:00
Arcady Goldmints-Orlov	5ec5fecc26	anv: fix descriptor limits on gen8 Later generations support bindless for samplers, images, and buffers and thus per-stage descriptors are not limited by the binding table size. However, gen8 doesn't support bindless images and thus needs to report a lower per-stage limit so that all combinations of descriptors that fit within the advertised limits are reported as supported by vkGetDescriptorSetLayoutSupport. Fixes test dEQP-VK.api.maintenance3_check.descriptor_set Fixes: `79fb0d27f3` ("anv: Implement SSBOs bindings with GPU addresses in the descriptor BO") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-09-19 09:10:40 -05:00
Daniel Schürmann	8b78cce433	radv: remove dead shared variables LLVM does this anyway, but for ACO we need to do it in NIR. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-09-19 12:10:00 +02:00
Daniel Schürmann	281262281b	radv/aco: enable VK_EXT_shader_demote_to_helper_invocation For now, this extension will only be enabled for ACO. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-09-19 12:10:00 +02:00
Daniel Schürmann	e01b522a72	radv: enable clustered reductions These work with both LLVM and ACO. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-09-19 12:10:00 +02:00
Daniel Schürmann	a70a998718	radv/aco: Setup alternate path in RADV to support the experimental ACO compiler LLVM remains default and ACO can be enabled with RADV_PERFTEST=aco. Co-authored-by: Daniel Schürmann <daniel@schuermann.dev> Co-authored-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-09-19 12:10:00 +02:00
Daniel Schürmann	93c8ebfa78	aco: Initial commit of independent AMD compiler ACO (short for AMD Compiler) is a new compiler backend with the goal to replace LLVM for Radeon hardware for the RADV driver. ACO currently supports only VS, PS and CS on VI and Vega. There are some optimizations missing because of unmerged NIR changes which may decrease performance. Full commit history can be found at https://github.com/daniel-schuermann/mesa/commits/backend Co-authored-by: Daniel Schürmann <daniel@schuermann.dev> Co-authored-by: Rhys Perry <pendingchaos02@gmail.com> Co-authored-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Co-authored-by: Connor Abbott <cwabbott0@gmail.com> Co-authored-by: Michael Schellenberger Costa <mschellenbergercosta@googlemail.com> Co-authored-by: Timur Kristóf <timur.kristof@gmail.com> Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-09-19 12:10:00 +02:00
Tapani Pälli	99cbec0a5f	egl: check for NULL value like eglGetSyncAttribKHR does Commit `d1e1563bb6` added a NULL check for eglGetSyncAttribKHR, but eglGetSyncAttrib does not do this. This patch adds the same check to eglGetSyncAttrib. Fixes crashes in (when exposing EGL 1.5): dEQP-EGL.functional.fence_sync.invalid.get_invalid_value Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Cc: mesa-stable@lists.freedesktop.org	2019-09-19 06:39:33 +00:00
Kenneth Graunke	a16975e615	iris: Rework iris_update_draw_parameters to be more efficient This improves a couple of things: 1. We now only update anything if the shader actually cares. Previously, is_indexed_draw was causing us to flag dirty vertex buffers, elements, and SGVs every time the shader switched between indexed and non-indexed draws. This is a very common situation, but we only need that information if the shader uses gl_BaseVertex. We were also flagging things when switching between indirect and direct draws; now we only bother if it matters. 2. We upload new draw parameters only when necessary. When we detect that the draw parameters have changed, we upload a new copy, and use that. Previously we were uploading it every time the vertex buffers were dirty (for possibly unrelated reasons) and the shader needed that info. Tying these together also makes the code a bit easier to follow. In Civilization VI's benchmark, this code was flagging dirty state many times per frame (49 average, 16 median, 614 maximum). Now it occurs exactly once for the entire run.	2019-09-18 22:50:52 -07:00
Kenneth Graunke	6841f11d14	iris: Use state_refs for draw parameters. iris_state_ref is a <resource, offset> tuple, which is exactly what we need here.	2019-09-18 22:50:52 -07:00
Timothy Arceri	ddd314f0ce	util/disk_cache: make use of the total job size limiting feature This makes use of the total job size limiting feature added in the previous patch. The idea is to avoid an excessive build up in memory use due to the use of both the UTIL_QUEUE_INIT_RESIZE_IF_FULL and UTIL_QUEUE_INIT_USE_MINIMUM_PRIORITY flags. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-09-19 15:03:27 +10:00
Timothy Arceri	896885025f	util/u_queue: track job size and limit the size of queue growth When both UTIL_QUEUE_INIT_RESIZE_IF_FULL and UTIL_QUEUE_INIT_USE_MINIMUM_PRIORITY are set, we can get into a situation where the queue never executes and grows to a huge size due to all other threads being busy. This is the case with the shader cache when attempting to compile a huge number of shaders up front. If all threads are busy compiling shaders, the cache queue's memory use can climb into many GBs very quickly. The use of these two flags with the shader cache is intended to allow shaders compiled at runtime to be compiled as fast as possible. To avoid huge memory use but still allow the queue to perform optimally in the runtime compilation case, we now add the ability to track memory consumed by the jobs in the queue and limit it to a hardcoded 256 MB, which should be more than enough. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-09-19 15:03:27 +10:00
Timothy Arceri	a2ee29c3da	util/disk_cache: bump thread count assigned to disk cache queue Since we set the UTIL_QUEUE_INIT_USE_MINIMUM_PRIORITY flag, this should have little impact on low-core-count systems. However, just about all modern CPUs currently available that run Mesa have at least 4 cores. For these CPUs, allowing more threads can result in the queue being processed faster and avoid excessive memory use due to a backlog of cache entries building up in the queue. This change helps avoid a huge buildup of cache entries in the queue due to using both the UTIL_QUEUE_INIT_USE_MINIMUM_PRIORITY and UTIL_QUEUE_INIT_RESIZE_IF_FULL flags. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-09-19 15:03:27 +10:00
Paulo Zanoni	8e614c7a29	intel/fs: fix SHADER_OPCODE_CLUSTER_BROADCAST for SIMD32 The current code can create functions with a width of 32, which is not supported by our hardware. Add some code to simplify how we express what we want and prevent such cases. For some unknown reason, all the tests I could run seem to work even with these unsupported MOVs. Fixes: `b0858c1cc6` "intel/fs: Add a couple of simple helper opcodes" Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>	2019-09-19 02:48:27 +00:00
Paulo Zanoni	c99df52873	intel/fs: the maximum supported stride width is 16 There are cases where we try to generate registers with a stride of 32, while the hardware maximum is just 16. This happens, for example, when using 8 bit integers on SIMD32. This results in a crash because the variable 'width' has a value of 32: ../../src/intel/compiler/brw_reg.h:550: brw_reg brw_vecn_reg(unsigned int, brw_reg_file, unsigned int, unsigned int): Assertion `!"Invalid register width"' failed. This change prevents the crash and makes the tests pass. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>	2019-09-19 02:48:27 +00:00
Paulo Zanoni	cebf447d16	intel/fs: roll the loop with the <0,1,0> additions in emit_scan() IMHO the code is easier to understand this way, being explicit that we're doing exactly the same thing every time. No functional changes. v2: Adjust the loop breaking condition (Jason). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>	2019-09-19 02:47:17 +00:00
Paulo Zanoni	d9ddf5076d	intel/fs: make scan/reduce work with SIMD32 when it fits 2 registers When dealing with uint16_t and uint8_t on SIMD32 we can do all the operations using just 2 registers, so we don't hit the recursion at the beginning of emit_scan(). Because of that, we need to actually compute scan/reduce for channels 31:16. v2: Still missed instructions (Jason). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>	2019-09-19 02:47:17 +00:00
Kristian H. Kristensen	7f07046dbc	freedreno/regs: A couple of tess updates Reviewed-by: Eric Anholt <eric@anholt.net>	2019-09-18 16:59:10 -07:00
Kristian H. Kristensen	a2031a117c	freedreno/regs: Fix CP_DRAW_INDX_OFFSET command On A5xx+ the INDX_BASE pointer is 64 bit. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-09-18 16:59:10 -07:00
Kristian H. Kristensen	2251a4345b	freedreno/a6xx: Write multiple regs for SP_VS_OUT_REG and SP_VS_VPC_DST_REG Compute the number of writes up front. Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-09-18 16:59:10 -07:00
Kristian H. Kristensen	cc4fe81145	freedreno/a6xx: Turn on vectorize_io We want this for tessellation eventually, but we can turn it on now. Shader-db results: total instructions in shared programs: 8612905 -> 8611387 (-0.02%) instructions in affected programs: 164952 -> 163434 (-0.92%) total dwords in shared programs: 11952000 -> 11950560 (-0.01%) dwords in affected programs: 68096 -> 66656 (-2.11%) total full in shared programs: 315019 -> 315009 (<.01%) full in affected programs: 1642 -> 1632 (-0.61%) total constlen in shared programs: 2463654 -> 2463654 (0.00%) constlen in affected programs: 0 -> 0 total (ss) in shared programs: 152379 -> 152409 (0.02%) (ss) in affected programs: 1503 -> 1533 (2.00%) total (sy) in shared programs: 96473 -> 96525 (0.05%) (sy) in affected programs: 654 -> 706 (7.95%) total max_sun in shared programs: 1172454 -> 1172472 (<.01%) max_sun in affected programs: 104 -> 122 (17.31%) Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-09-18 16:59:10 -07:00
Kristian H. Kristensen	1cb9534434	freedreno/a6xx: Share shader state constructor and destructor Also, swap the vs and fs constructors so fs comes first. Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-09-18 16:59:10 -07:00
Kristian H. Kristensen	be38480064	freedreno/a6xx: Track location of gl_Position out as we link it When using xfb and rasterizing, the fragment shader may have fewer inputs than the vertex shader has outputs. We can't rely on gl_Position to be placed at fs->total_in, but instead have to remember where we add it in the link map and use that location. Fixes 100+ tessellation dEQPs under dEQP-GLES31.functional.tessellation.primitive_discard.* dEQP-GLES31.functional.tessellation.user_defined_io.* Reviewed-by: Eric Anholt <eric@anholt.net>	2019-09-18 16:59:10 -07:00
Caio Marcelo de Oliveira Filho	d38e0a6326	spirv: Add missing break for capability handling Newly added cases "stole" the previous break. Fixes: `420ad0a1a3` ("spirv: check support for SPV_KHR_float_controls capabilities") Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-09-18 15:49:14 -07:00
Kenneth Graunke	3da8a8a3d6	iris: Avoid uploading SURFACE_STATE descriptors for UBOs if possible If we can entirely push uniform data, we don't need a SURFACE_STATE descriptor for pulling data. Since constant uploads are a very common operation, and being able to push all data is also very common, we would like to avoid the overhead in this case. This patch defers uploading new descriptors. Instead of handling that at iris_set_constant_buffer, we do it at iris_update_compiled_shaders, where we can see the currently bound shader variants. If any need pull descriptors, and descriptors are missing, we update them and flag that the binding table also needs to be refreshed. Improves performance in GFXBench5 gl_driver2 on an i7-6770HQ by 31.9774% +/- 1.12947% (n=15). Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-09-18 15:44:22 -07:00
Kenneth Graunke	0e4a75f917	intel/compiler: Record whether any pull constant loads occur I would like for iris to be able to avoid setting up SURFACE_STATE for UBOs in the common case where all constants are pushed. Unfortunately, we don't know up front whether everything will be pushed: the backend is allowed to demote pushed UBOs to pull loads fairly late in the process. This is probably desirable though, as we'd like the backend to be able to re-pull pushed data to break up long live ranges in response to register pressure. Here we simply add an "are there any pull loads at all" boolean to prog_data, which is a bit crude but at least allows us to skip work in the common "everything pushed" case. We could skip more work by tracking exactly which UBO surfaces are pulled in a bitmask, but I wanted to avoid bringing back the old mark_surface_used() mechanism. Finer-grained tracking could allow us to skip a bit more work when multiple UBOs are in use and /some/ are 100% pushed, but others are accessed via pulls. However, I'm not sure how common this is and it would save at most 4 pull descriptors, so we defer that for now. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-09-18 15:44:22 -07:00
Kenneth Graunke	dd83ef0d1a	iris: Track per-stage bind history, reduce work accordingly We now track per-stage bind history for constant and shader buffers, shader images, and sampler views by adding an extra res->bind_stages field to go with res->bind_history. This lets us flag IRIS_DIRTY_CONSTANTS for only the specific stages involved, and also skip some CPU overhead in iris_rebind_buffer. Cuts 4% of 3DSTATE_CONSTANT_XS packets in a Shadow of Mordor trace on Icelake. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-09-18 15:44:22 -07:00
Kenneth Graunke	1e7daaa6c9	iris: Don't flag IRIS_DIRTY_BINDINGS for constant usage history The underlying buffer isn't changing - so we don't need to update any SURFACE_STATE descriptors - we just might have new constants, meaning we need to re-emit 3DSTATE_CONSTANT_XS. On Gen9, this means we need to update 3DSTATE_BINDING_TABLE_POINTERS_XS too, but that's now handled by the explicit check in the previous patch. On Gen9, this should cause us to re-emit the binding table /pointer/ on writing to a buffer with PIPE_BIND_CONSTANT_BUFFER, rather than emitting a whole new /table/. On Gen8 and Gen11, this avoids binding table churn altogether. Cuts 61% of 3DSTATE_BINDING_TABLE_POINTERS_XS packets in a Shadow of Mordor trace on Icelake. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-09-18 15:44:22 -07:00
Kenneth Graunke	e7db3577f8	iris: Explicitly emit 3DSTATE_BTP_XS on Gen9 with DIRTY_CONSTANTS_XS Right now, we usually flag both IRIS_DIRTY_{CONSTANTS,BINDINGS}_XS, because we have SURFACE_STATE for constant buffers in case the shaders access them via pull mode. But this flagging is overkill in many cases. Gen8 and Gen11 don't need it at all. Gen9 doesn't need that large of a hammer in all cases. Just handle it explicitly so the right thing happens. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-09-18 15:44:22 -07:00
Kenneth Graunke	caa0aebd01	iris: Flag IRIS_DIRTY_BINDINGS_XS on constant buffer rebinds We upload a new SURFACE_STATE for the UBO/SSBO in question, which means that we need new binding tables as well. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-09-18 15:44:22 -07:00
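Commit dae33052db switches util/rb_tree to the comparator convention used by qsort(). As a minimal sketch of that convention, assuming plain uint64_t keys (struct node and tree_search() below are hypothetical and are not Mesa's util/rb_tree API), a comparator returns a negative value when its first argument orders first, zero when the two are equal, and a positive value otherwise, so one function can drive both qsort() and a tree walk:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort() convention: negative if *a orders before *b, zero if equal,
 * positive if *a orders after *b. */
static int
cmp_u64(const void *a, const void *b)
{
   uint64_t x = *(const uint64_t *)a;
   uint64_t y = *(const uint64_t *)b;
   /* Not x - y: unsigned subtraction wraps and would give the wrong sign. */
   return (x > y) - (x < y);
}

/* Hypothetical minimal binary search tree node (not Mesa's rb_node). */
struct node {
   uint64_t key;
   struct node *left, *right;
};

/* Walks the tree with the same comparator qsort() accepts: a negative
 * result means the search key belongs in the left subtree. */
static struct node *
tree_search(struct node *root, uint64_t key)
{
   while (root) {
      int c = cmp_u64(&key, &root->key);
      if (c == 0)
         return root;
      root = c < 0 ? root->left : root->right;
   }
   return NULL;
}
```

Keeping one ordering rule for both sorting and searching is the consistency the commit message is after: a developer who already knows qsort() can read the comparator without checking which argument comes first.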
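Commit b4fe0b3ffd stops using atof() for the server's GLX version because atof() honors LC_NUMERIC, so "1.3" parses as 1.0 in a locale whose decimal separator is a comma. A small sketch of a locale-independent check, assuming the version is already available as two integers as it is in the protocol reply (version_at_least() is a hypothetical helper, not a GLX function):

```c
#include <stdbool.h>

/* Hypothetical helper: compares integer (major, minor) pairs, so the result
 * cannot depend on LC_NUMERIC the way atof("1.3") does. */
static bool
version_at_least(int major, int minor, int want_major, int want_minor)
{
   return major > want_major ||
          (major == want_major && minor >= want_minor);
}
```

For example, version_at_least(1, 4, 1, 3) is true and version_at_least(1, 2, 1, 3) is false in every locale.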
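Commit 896885025f bounds how far a resizable util_queue may grow by tracking the memory held by queued jobs against a 256 MB limit. The sketch below shows that idea in single-threaded form. struct job_queue, queue_init(), queue_add_job() and queue_pop_job() are hypothetical names, not Mesa's util_queue API, and where a threaded queue would wait on a condition variable when it may not grow, this sketch simply returns false:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical single-threaded sketch; not Mesa's util_queue API. */
#define QUEUE_SIZE_LIMIT ((size_t)256 * 1024 * 1024) /* 256 MB, per the commit message */

struct job {
   void *data;
   size_t size; /* bytes this job keeps alive while it sits in the queue */
};

struct job_queue {
   struct job *jobs;       /* ring buffer with max_jobs slots */
   unsigned head;          /* index of the oldest queued job */
   unsigned num_queued;
   unsigned max_jobs;
   size_t total_jobs_size; /* sum of job.size over all queued jobs */
};

static bool
queue_init(struct job_queue *q, unsigned max_jobs)
{
   q->jobs = max_jobs ? calloc(max_jobs, sizeof(*q->jobs)) : NULL;
   q->head = 0;
   q->num_queued = 0;
   q->max_jobs = max_jobs;
   q->total_jobs_size = 0;
   return q->jobs != NULL; /* false for zero slots or allocation failure */
}

/* Returns false if the queue is full and growing it would push the queued
 * total past the limit; a threaded queue would wait for a worker instead. */
static bool
queue_add_job(struct job_queue *q, struct job j)
{
   if (q->num_queued == q->max_jobs) {
      if (q->total_jobs_size + j.size >= QUEUE_SIZE_LIMIT)
         return false;

      unsigned new_max = q->max_jobs * 2;
      struct job *jobs = calloc(new_max, sizeof(*jobs));
      if (!jobs)
         return false;
      /* Copy the ring buffer out in order, oldest job first. */
      for (unsigned i = 0; i < q->num_queued; i++)
         jobs[i] = q->jobs[(q->head + i) % q->max_jobs];
      free(q->jobs);
      q->jobs = jobs;
      q->head = 0;
      q->max_jobs = new_max;
   }

   q->jobs[(q->head + q->num_queued) % q->max_jobs] = j;
   q->num_queued++;
   q->total_jobs_size += j.size;
   return true;
}

/* Removes the oldest job and subtracts its size from the running total. */
static bool
queue_pop_job(struct job_queue *q, struct job *out)
{
   if (q->num_queued == 0)
      return false;
   *out = q->jobs[q->head];
   q->head = (q->head + 1) % q->max_jobs;
   q->num_queued--;
   q->total_jobs_size -= out->size;
   return true;
}
```

In this sketch the limit only gates growth: a job is still accepted whenever a free slot exists, and once the queue is both full and over budget, callers must let workers drain it before adding more.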
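Commit d38e0a6326 fixes a switch statement where, as its message puts it, newly added cases "stole" the previous break, so the earlier group fell through into the new checks. The sketch below uses a hypothetical capability enum (not the SPIR-V headers or spirv_to_nir code) to show the corrected shape, in which every group of cases ends in its own break:

```c
#include <stdbool.h>

/* Hypothetical capability IDs for illustration; not the SPIR-V enum values. */
enum capability {
   CAP_SHADER,
   CAP_GEOMETRY,
   CAP_DENORM_PRESERVE,
   CAP_ROUNDING_MODE_RTE,
};

/* Returns whether a capability is supported. If the break after the
 * CAP_GEOMETRY group were missing, CAP_SHADER and CAP_GEOMETRY would fall
 * through and be reported as supported only when float_controls is true. */
static bool
capability_supported(enum capability cap, bool float_controls)
{
   bool supported;

   switch (cap) {
   case CAP_SHADER:
   case CAP_GEOMETRY:
      supported = true;
      break;

   /* Cases added later bring their own break instead of reusing the one
    * that closes the group above. */
   case CAP_DENORM_PRESERVE:
   case CAP_ROUNDING_MODE_RTE:
      supported = float_controls;
      break;

   default:
      supported = false;
      break;
   }

   return supported;
}
```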


115751 Commits