KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Caio Marcelo de Oliveira Filho	cb26d9c311	intel/fs: Add helper to get prog_offset and simd_size This indirection will be used by the variable group size case in a later change. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5142>	2020-05-27 18:16:31 -07:00
Caio Marcelo de Oliveira Filho	4b000b491a	intel/fs: Add an option to lower variable group size in backend Adding this since Iris will handle variable group size parameters by itself. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4794>	2020-05-01 12:50:28 -07:00
Caio Marcelo de Oliveira Filho	0edb58a84e	intel/fs: Clean up variable group size handling in backend Just use the information from NIR shader_info. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4794>	2020-05-01 12:50:28 -07:00
Jason Ekstrand	d0d039a4d3	anv: Emit pushed UBO bounds checking code in the back-end compiler This commit fixes performance regressions introduced by `e03f965280` in which we started bounds checking our push constants. This added a LOT of shader code to shaders which use the robustBufferAccess feature and led to substantial spilling. The checking we just added to the FS back-end is far more efficient for two reasons: 1. It can be done at a whole register granularity rather than per- scalar and so we emit one SIMD8 SEL per 32B GRF rather than one SIMD16 SEL (executed as two SELs) for each component loaded. 2. Because we do it with NoMask instructions, we can do it on whole pushed GRFs without splatting them out to SIMD8 or SIME16 values. This means that robust buffer access no longer explodes our register pressure for no good reason. As a tiny side-benefit, we're now using can use AND instead of SEL which means no need for the flag and better scheduling. Vulkan pipeline database results on ICL: Instructions in all programs: 293586059 -> 238009118 (-18.9%) SENDs in all programs: 13568515 -> 13568515 (+0.0%) Loops in all programs: 149720 -> 149720 (+0.0%) Cycles in all programs: 88499234498 -> 84348917496 (-4.7%) Spills in all programs: 1229018 -> 184339 (-85.0%) Fills in all programs: 1348397 -> 246061 (-81.8%) This also improves the performance of a few apps: - Shadow of the Tomb Raider: +4% - Witcher 3: +3.5% - UE4 Shooter demo: +2% Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4447>	2020-04-17 14:48:06 +00:00
Jason Ekstrand	e003104605	intel: Add _const versions of prog_data cast helpers Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4597>	2020-04-16 17:26:16 +00:00
Jason Ekstrand	b2e4157143	anv: Advertise SEND count through VK_EXT_pipeline_executable_properties Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4578>	2020-04-15 21:51:55 +00:00
Caio Marcelo de Oliveira Filho	db74ad0696	intel/compiler: Remove cs_prog_data->threads At this point all drivers are doing this math on their own -- since most of them need to cover the variable group size case, in which at compile time the group size (and number of threads) is not defined. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4504>	2020-04-09 19:23:20 -07:00
Plamena Manolova	c77dc51203	intel/compiler: Add support for variable workgroup size Add new builtin parameters that are used to keep track of the group size. This will be used to implement ARB_compute_variable_group_size. The compiler will use the maximum group size supported to pick a suitable SIMD variant. A later improvement will be to keep all SIMD variants (like FS) so the driver can select the best one at dispatch time. When variable workgroup size is used, the small workgroup optimization is disabled as it we can't prove at compile time that the barriers won't be needed. Extracted from original i965 patch with additional changes by Caio Marcelo de Oliveira Filho. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4504>	2020-04-09 19:23:12 -07:00
Caio Marcelo de Oliveira Filho	c54fc0d07b	intel/compiler: Replace cs_prog_data->push.total with a helper The push.total field had three values but only one was directly used (size). Replace it with a helper function that explicitly takes the cs_prog_data and the number of threads -- and use that in the drivers. This is a preparation for ARB_compute_variable_group_size where the number of threads (hence the total size for push constants) is not defined at compile time (not cs_prog_data->threads). Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4504>	2020-04-09 19:23:12 -07:00
Caio Marcelo de Oliveira Filho	395de69b1f	intel/fs: Allow multiple slots for position Change brw_compute_vue_map() to also take the number of pos slots. If more than one slot is used, the VARYING_SLOT_POS is treated as an array. When using Primitive Replication, instead of a single position, the VUE must contain an array of positions. Padding might be necessary (after clip distance) to ensure rest of attributes start aligned. v2: Add note about array in the commit message and assert that pos_slots >= 1 to make clear 0 is invalid. (Jason) Move padding to be after the clip distance. v3: Apply the correct offset when gathering the sources from outputs. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> [v2] Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2313>	2020-04-07 17:16:09 +00:00
Juan A. Suarez Romero	460de2159e	intel/compiler: store the FS inputs in WM prog data Store the fragment shader inputs in the program data so we can use them later when required without needing the NIR shader. Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2010>	2020-04-01 23:36:28 +00:00
Sagar Ghuge	1a5ac646ce	intel/compiler: Track patch count threshold Return the number of patches to accumulate before an 8_PATCH mode thread is launched. v2: (Kenneth Graunke) - Track patch count threshold instead of input control points. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3563>	2020-03-23 17:57:57 +00:00
Tapani Pälli	e8f0483ec4	intel/compiler: detect if atomic load store operations are used Patch adds a new arg and modifies existing calls from i965, anv pass NULL but iris stores this information for later use. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4080>	2020-03-16 10:34:21 +00:00
Mathias Fröhlich	630154e77b	i965: Move down genX_upload_sbe in profiles. Avoid looping over all VARYING_SLOT_MAX urb_setup array entries from genX_upload_sbe. Prepare an array indirection to the active entries of urb_setup already in the compile step. On upload only walk the active arrays. v2: Use uint8_t to store the attribute numbers. v3: Change loop to build up the array indirection. v4: Rebase. v5: Style fix. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/308>	2020-03-10 14:28:36 +00:00
Francisco Jerez	57dee58c82	intel/fs: Set src0 alpha present bit in header when provided in message payload. Currently the "Source0 Alpha Present to RenderTarget" bit of the RT write message header is derived from brw_wm_prog_data::replicate_alpha. However the src0_alpha payload is provided anytime it's specified to the logical message. This could theoretically lead to an inconsistency if somebody provided a src0_alpha value while brw_wm_prog_data::replicate_alpha was false, as I'm planning to do in a future commit in order to implement a hardware workaround. Instead calculate the header bit based on whether a src0_alpha value was provided to the logical message, which guarantees the same behavior on pre-ICL and ICL+ (the latter used an extended descriptor bit for this which didn't suffer from the same issue). Remove the brw_wm_prog_data::replicate_alpha flag. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-02-14 14:31:48 -08:00
Francisco Jerez	0dd18d70ae	intel/fs/gen6: Generalize aligned_pairs_class to SIMD16 aligned barycentrics. This is mainly meant to avoid shader-db regressions on SNB as we start using VGRFs for barycentrics more frequently. Currently the aligned_pairs_class is only useful in SIMD8 mode, because in SIMD16 mode barycentric vectors are typically 4 GRFs. This is not a problem on Gen4-5, because on those platforms all VGRF allocations are pair-aligned in SIMD16 mode. However on Gen6 we end up using either the fast or the slow path of LINTERP rather non-deterministically based on the behavior of the register allocator. Fix it by repurposing aligned_pairs_class to hold PLN-aligned registers of whatever the natural size of a barycentric vector is in the current dispatch width. On SNB this prevents the following shader-db regressions (including SIMD32 programs) in combination with the interpolation rework part of this series: total instructions in shared programs: 13983257 -> 14527274 (3.89%) instructions in affected programs: 1766255 -> 2310272 (30.80%) helped: 0 HURT: 11608 LOST: 26 GAINED: 13 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-01-17 13:22:34 -08:00
Jason Ekstrand	d1c4e64a69	intel/compiler: Add a flag to avoid compacting push constants In vec4, we can just not run the pass. In fs, things are a bit more deeply intertwined. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-11-18 18:35:14 +00:00
Francisco Jerez	b0e69d115e	intel/ir/gen12: Update assert in brw_stage_has_packed_dispatch(). Confirmed no regressions after a full Piglit run on TGL with the brw_fs_test_dispatch_packing() test enabled. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-10-11 12:24:16 -07:00
Kenneth Graunke	0e4a75f917	intel/compiler: Record whether any pull constant loads occur I would like for iris to be able to avoid setting up SURFACE_STATE for UBOs in the common case where all constants are pushed. Unfortunately, we don't know up front whether everything will be pushed: the backend is allowed to demote pushed UBOs to pull loads fairly late in the process. This is probably desirable though, as we'd like the backend to be able to re-pull pushed data to break up long live ranges in response to register pressure. Here we simply add a "are there any pull loads at all" boolean to prog_data, which is a bit crude but at least allows us to skip work in the common "everything pushed" case. We could skip more work by tracking exactly which UBO surfaces are pulled in a bitmask, but I wanted to avoid bringing back the old mark_surface_used() mechanism. Finer-grained tracking could allow us to skip a bit more work when multiple UBOs are in use and /some/ are 100% pushed, but others are accessed via pulls. However, I'm not sure how common this is and it would save at most 4 pull descriptors, so we defer that for now. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-09-18 15:44:22 -07:00
Jason Ekstrand	f58e0405b6	intel/fs: Drop the gl_program from fs_visitor It's not used by anything anymore now that so much lowering has been moved into NIR. Sadly, we still need on in brw_compile_gs() for geometry shaders on Sandy Bridge. Short of a lot of pointless work, that one's probably not going away. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-25 01:02:52 -05:00
Jason Ekstrand	8d3cbd0393	intel/fs: Add SLM size to brw_cs_prog_data We don't need it for state setup but it's a useful statistic we want to pass on to developers. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-08-12 22:56:07 +00:00
Jason Ekstrand	134607760a	intel/compiler: Fill a compiler statistics struct This commit is all annoying plumbing work which just adds support for a new brw_compile_stats struct. This struct provides a binary driver readable form of the same statistics we dump out to stderr when we INTEL_DEBUG is set with a shader stage. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-08-12 22:56:07 +00:00
Timothy Arceri	2afedfaf9a	iris: add support for gl_ClipVertex in tess eval shaders Required for OpenGL compat support. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-01 16:12:37 -07:00
Timothy Arceri	00b5bf2d72	iris: add support for gl_ClipVertex in geometry shaders This will enable us to support the OpenGL compat profile. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-01 16:12:27 -07:00
Eric Engestrom	abc226cf41	tree-wide: replace MAYBE_UNUSED with ASSERTED Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-31 09:41:05 +01:00
Jason Ekstrand	2a236c76f8	intel/compiler: Allow for required subgroup sizes Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-24 12:55:40 -05:00
Jason Ekstrand	4397eb91c1	intel/compiler: Allow for varying subgroup sizes Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-24 12:55:40 -05:00
Jason Ekstrand	c84b8eeeac	intel/compiler: Be more conservative about subgroup sizes in GL The rules for gl_SubgroupSize in Vulkan require that it be a constant that can be queried through the API. However, all GL requires is that it's a uniform. Instead of always claiming that the subgroup size in the shader is 32 in GL like we have to do for Vulkan, claim 8 for geometry stages, the maximum for fragment shaders, and the actual size for compute. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-24 12:55:40 -05:00
Jason Ekstrand	14781e2122	intel/compiler: Add a "base class" for program keys Right now, all keys have two things in common: a program string ID and a sampler_prog_key_data. I'd like to add another thing or two and need a place to put it. This commit adds a new brw_base_prog_key struct which contains those two common bits. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-10 19:35:55 +00:00
Kenneth Graunke	419d9b21e1	intel: Move brw_prog_key_set_id from i965 to the compiler. I want to use it in iris. Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-05-21 15:05:38 -07:00
Kenneth Graunke	646924cfa1	intel/compiler: Implement TCS 8_PATCH mode and INTEL_DEBUG=tcs8 Our tessellation control shaders can be dispatched in several modes. - SINGLE_PATCH (Gen7+) processes a single patch per thread, with each channel corresponding to a different patch vertex. PATCHLIST_N will launch (N / 8) threads. If N is less than 8, some channels will be disabled, leaving some untapped hardware capabilities. Conditionals based on gl_InvocationID are non-uniform, which means that they'll often have to execute both paths. However, if there are fewer than 8 vertices, all invocations will happen within a single thread, so barriers can become no-ops, which is nice. We also burn a maximum of 4 registers for ICP handles, so we can compile without regard for the value of N. It also works in all cases. - DUAL_PATCH mode processes up to two patches at a time, where the first four channels come from patch 1, and the second group of four come from patch 2. This tries to provide better EU utilization for small patches (N <= 4). It cannot be used in all cases. - 8_PATCH mode processes 8 patches at a time, with a thread launched per vertex in the patch. Each channel corresponds to the same vertex, but in each of the 8 patches. This utilizes all channels even for small patches. It also makes conditions on gl_InvocationID uniform, leading to proper jumps. Barriers, unfortunately, become real. Worse, for PATCHLIST_N, the thread payload burns N registers for ICP handles. This can burn up to 32 registers, or 1/4 of our register file, for URB handles. For Vulkan (and DX), we know the number of vertices at compile time, so we can limit the amount of waste. In GL, the patch dimension is dynamic state, so we either would have to waste all 32 (not reasonable) or guess (badly) and recompile. This is unfortunate. Because we can only spawn 16 thread instances, we can only use this mode for PATCHLIST_16 and smaller. The rest must use SINGLE_PATCH. This patch implements the new 8_PATCH TCS mode, but leaves us using SINGLE_PATCH by default. A new INTEL_DEBUG=tcs8 flag will switch to using 8_PATCH mode for testing and benchmarking purposes. We may want to consider using 8_PATCH mode in Vulkan in some cases. The data I've seen shows that 8_PATCH mode can be more efficient in some cases, but SINGLE_PATCH mode (the one we use today) is faster in other cases. Ultimately, the TES matters much more than the TCS for performance, so the decision may not matter much. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-14 13:16:30 -07:00
Kenneth Graunke	fad7801afd	i965: Move program key debugging to the compiler. The i965 driver has a bunch of code to compare two sets of program keys and print out the differences. This can be useful for debugging why a shader needed to be recompiled on the fly due to non-orthogonal state dependencies. anv doesn't do recompiles, so we didn't need to share this in the past - but I'd like to use it in iris. This moves the bulk of the code to the compiler where it can be reused. To make that possible, we need to decouple it from i965 - we can't get at the brw program cache directly, nor use brw_context to print things. Instead, we use compiler->shader_perf_log(), and simply pass in keys. We put all of this debugging code in brw_debug_recompile.c, and only export a single function, for simplicity. I also tidied the code a bit while moving it, now that it all lives in one file. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2019-04-16 09:01:15 -07:00
Danylo Piliaiev	c8abe03f3b	i965,iris,anv: Make alpha to coverage work with sample mask From "Alpha Coverage" section of SKL PRM Volume 7: "If Pixel Shader outputs oMask, AlphaToCoverage is disabled in hardware, regardless of the state setting for this feature." From OpenGL spec 4.6, "15.2 Shader Execution": "The built-in integer array gl_SampleMask can be used to change the sample coverage for a fragment from within the shader." From OpenGL spec 4.6, "17.3.1 Alpha To Coverage": "If SAMPLE_ALPHA_TO_COVERAGE is enabled, a temporary coverage value is generated where each bit is determined by the alpha value at the corresponding sample location. The temporary coverage value is then ANDed with the fragment coverage value to generate a new fragment coverage value." Similar wording could be found in Vulkan spec 1.1.100 "25.6. Multisample Coverage" Thus we need to compute alpha to coverage dithering manually in shader and replace sample mask store with the bitwise-AND of sample mask and alpha to coverage dithering. The following formula is used to compute final sample mask: m = int(16.0 * clamp(src0_alpha, 0.0, 1.0)) dither_mask = 0x1111 * ((0xfea80 >> (m & ~3)) & 0xf) \| 0x0808 * (m & 2) \| 0x0100 * (m & 1) sample_mask = sample_mask & dither_mask Credits to Francisco Jerez <currojerez@riseup.net> for creating it. It gives a number of ones proportional to the alpha for 2, 4, 8 or 16 least significant bits of the result. GEN6 hardware does not have issue with simultaneous usage of sample mask and alpha to coverage however due to the wrong sending order of oMask and src0_alpha it is still affected by it. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109743 Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2019-03-25 13:54:55 -07:00
Ian Romanick	bbf20a1ca3	intel/compiler: Silence unused parameter warning in brw_interpolation_map.c The parameter is never used, and it's not part of a common interface idiom. Remove it. src/intel/compiler/brw_interpolation_map.c: In function ‘brw_setup_vue_interpolation’: src/intel/compiler/brw_interpolation_map.c:62:59: warning: unused parameter ‘devinfo’ [-Wunused-parameter] const struct gen_device_info *devinfo) ^~~~~~~ Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-06 08:35:36 -08:00
Kasireddy, Vivek	7cab8d3661	i965: Add support for sampling from XYUV images Add support to the i965 DRI driver to sample from XYUV8888 buffers. Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-26 13:08:52 +00:00
Kenneth Graunke	286b8b8f99	iris: handle PatchVerticesIn as a system value.	2019-02-21 10:26:10 -08:00
Tapani Pälli	3da858a6b9	intel/compiler: add scale_factors to sampler_prog_key_data Patch propagates given scale_factors to lowering options. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-12 08:42:25 +02:00
Kenneth Graunke	04c2f12ab2	i965: Drop mark_surface_used mechanism. The original idea was that the backend compiler could eliminate surfaces, so we would have it mark which ones are actually used, then shrink the binding table accordingly. Unfortunately, it's a pretty blunt mechanism - it can only prune things from the end, not the middle - since we decide the layout before we even start the backend compiler, and only limit the size. It also basically gives up if it sees indirect array access. Besides, we do the vast majority of our surface elimination in NIR anyway, not the backend - and I don't see that trend changing any time soon. Vulkan abandoned this plan a long time ago, and I don't use it in Iris, but it's still been kicking around in i965. I hacked shader-db to print the binding table size in bytes, and observed no changes with this patch. So, this code appears to do nothing useful. Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-13 09:35:32 -08:00
Kenneth Graunke	562448b75a	i965: Do NIR shader cloning in the caller. This moves nir_shader_clone() to the driver-specific compile function, rather than the shared src/intel/compiler code. This allows i965 to do key-specific passes before calling brw_compile_*. Vulkan should not need this cloning as it doesn't compile multiple variants. We do need to continue cloning in the compute shader code because we lower various things in NIR based on the SIMD width. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2018-11-20 15:53:46 -08:00
Lionel Landwerlin	89785e2d56	i965: add support for sampling from AYUV Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-11-12 13:22:54 +00:00
Jason Ekstrand	d8033d4083	intel/compiler: Remove surface_idx from brw_image_param Now that the drivers are lowering to surface indices themselves, we no longer need to push the surface index into the shader. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:03 -05:00
Jordan Justen	8fcdb71d8c	intel/compiler: Add brw_get_compiler_config_value for disk cache During code review, Jason pointed out that: `2b3064c073` "i965, anv: Use INTEL_DEBUG for disk_cache driver flags" Didn't account for INTEL_SCALER_* environment variables. To fix this, let the compiler return the disk_cache driver flags. Another possible fix would be to pull the INTEL_SCALER_* into INTEL_DEBUG bits, but as we are currently using 41 of 64 bits, I didn't think it was a good use of 4 more of these bits. (5 since INTEL_PRECISE_TRIG needs to be accounted for as well.) Cc: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-01 23:49:16 -07:00
Jason Ekstrand	b2e0b0dad6	anv/pipeline: More aggressively optimize away color attachments Instead of just looking at the number of color attachments, look at which ones are actually used by the subpass. This lets us potentially throw away chunks of the fragment shader. In DXVK, for example, all subpasses have 8 attachments and most are VK_ATTACHMENT_UNUSED so this is very helpful in that case. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-08-01 18:02:28 -07:00
Jason Ekstrand	06412bfc98	anv,intel: Enable nir_opt_large_constants for Vulkan According to RenderDoc, this shaves 99.6% of the run time off of the ambient occlusion pass in Skyrim Special Edition when running under DXVK and shaves 92% off the runtime for a reasonably representative frame. When running the actual game, Skyrim goes from being a slide-show to a very stable and playable framerate on my SKL GT4e machine. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-07-02 12:09:50 -07:00
Jason Ekstrand	d5e028a57b	intel/fs: Add fields to wm_prog_data for SIMD32 dispatch Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Francisco Jerez	244a0ff3a8	i965: Add plumbing for shader time in 32-wide FS dispatch mode. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Jason Ekstrand	0b830081f0	intel/fs: Rework KSP data to be SIMD width-based Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Jason Ekstrand	9d78abbef8	intel/compiler: Add and use helpers for working with KSP indices The pixel shader dispatch table is kind-of a confusing mess. This adds some helpers for dealing with it and for easily extracting the correct data from wm_prog_data. Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Antia Puentes	3a1df14a7b	intel: activate the gl_BaseVertex lowering Surplus code related to the basevertex is removed. The Vertex Elements contain now: * VE 1: <firstvertex, BaseInstance, VertexID, InstanceID> * VE 2: <DrawID, is_indexed_draw, 0, 0> Also fixes unreachable message. Fixes OpenGL CTS tests: * KHR-GL46.shader_draw_parameters_tests.ShaderDrawArraysInstancedParameters * KHR-GL46.shader_draw_parameters_tests.ShaderMultiDrawArraysParameters * KHR-GL46.shader_draw_parameters_tests.MultiDrawArraysIndirectCountParameters * KHR-GL46.shader_draw_parameters_tests.ShaderDrawArraysParameters * KHR-GL46.shader_draw_parameters_tests.ShaderMultiDrawArraysIndirectParameters Fixes Piglit tests: * arb_shader_draw_parameters-drawid-indirect baseinstance * arb_shader_draw_parameters-basevertex Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102678	2018-05-02 11:24:46 +02:00
Antia Puentes	6ba9088d9c	intel/compiler: Add uses_is_indexed_draw flag Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-05-02 11:20:48 +02:00

1 2

87 Commits