mirrors/mesa - Frog Git

Commit Graph

Author	SHA1	Message	Date
Neil Roberts	ee61790daf	freedreno: Remove the Emacs mode lines These are not necessary because the corresponding settings are set via the .dir-locals.el file anyway. Most of them were missing a ‘:’ after “tab-width” which was making Emacs display an annoying warning whenever you open the file. This patch was made with: sed -ri '/-\*- mode:/,/^$/d' \ $(find src/gallium/{drivers,winsys} -name \*.\[ch\] \ -exec grep -l -- '-\*- mode:' {} \+) Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-17 12:44:48 -04:00
Neil Roberts	afe640b360	freedreno: Fix the Emacs indentation configuration file The .dir-locals.el had the wrong name for the truthy value so it wasn’t setting indent-tabs-mode. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-17 12:44:48 -04:00
Hyunjun Ko	8e798e28f7	freedreno: allocate batches from the cache in launch_grid Needs to allocate batches from the cache so that it can get a valid index and make resource dependency tracking right. In addition this fixes an assertion on debug builds since the commit `1a40faa8` landed. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-17 12:44:48 -04:00
Hyunjun Ko	2385d7b066	freedreno: adds nondraw param to fd_bc_alloc_batch Needs to specify nondraw when creating a batch through fd_bc_alloc_batch since it'd better create a batch through it rather than fd_batch_create. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-17 12:44:48 -04:00
Rob Clark	9e6019bd46	freedreno/a6xx: remove fd6_emit_render_cntl() It was dead code carried over from a5xx Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-17 12:44:48 -04:00
Rob Clark	835cb06965	freedreno/ir3: fix broken texcoord inputs TODO not sure if this is best solution, but current logic is broken for texcoord inputs. It is definitely the simplest solution. Fixes: `1a24f51966` freedreno/ir3: ignore unused inputs Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-17 12:44:48 -04:00
Rob Clark	cbf9fe50b5	freedreno: fix off-by-one error in BEGIN_RING() Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-17 12:44:48 -04:00
Marek Olšák	669dd22983	util: document a limitation of util_fast_udiv32 trivial	2018-10-17 12:27:58 -04:00
Matt Turner	58a51d0a67	i965/fs: Add 64-bit int immediate support to dump_instructions() Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2018-10-16 17:48:17 -07:00
Marek Olšák	fcc70e4855	radeonsi: track context rolls better for the Vega scissor bug workaround We should get fewer context rolls with the SET_CONTEXT_REG optimization, but it would have been for nothing if the scissor state rolled the context anyway. Don't emit the scissor state if there is no context roll.	2018-10-16 17:23:25 -04:00
Marek Olšák	25ddb15cfe	radeonsi: emit sample locations for 1xAA only when the hw bug is present	2018-10-16 17:23:25 -04:00
Marek Olšák	9b331e462e	radeonsi: use compute shaders for clear_buffer & copy_buffer Fast color clears should be much faster. Also, fast color clears on evicted buffers should be 200x faster on GFX8 and older.	2018-10-16 17:23:25 -04:00
Marek Olšák	5030adcbe0	radeonsi: use copy_buffer in buffer_do_flush_region directly	2018-10-16 17:23:25 -04:00
Marek Olšák	0b40fbc879	radeonsi: use faster integer division for instance divisors We know the divisors when we upload them, so instead we can precompute and upload division factors derived from each divisor. This fast division consists of add, mul_hi, and two shifts, and we have to load 4 dwords instead of 1. This probably won't affect any apps.	2018-10-16 17:23:25 -04:00
Marek Olšák	bfc795670e	ac: add helpers for fast integer division by a constant	2018-10-16 17:23:25 -04:00
Marek Olšák	ea039f789d	radeonsi: use higher subpixel precision (QUANT_MODE) for smaller viewports	2018-10-16 15:28:22 -04:00
Marek Olšák	4fd8d2df9c	radeonsi: move emission of PA_SU_VTX_CNTL into emit_guardband We'll modify the quant mode there, which also affects the guardband computation.	2018-10-16 15:28:22 -04:00
Marek Olšák	41a6c3de1f	radeonsi: don't re-upload the sample position constant buffer repeatedly	2018-10-16 15:28:22 -04:00
Marek Olšák	b94824c787	radeonsi: set PA_SU_PRIM_FILTER_CNTL optimally	2018-10-16 15:28:22 -04:00
Marek Olšák	9e182b8313	radeonsi: center viewport to improve guardband clipping for high resolutions This will be more useful when we change the quant mode to increase subpixel precision and decrease the viewport range (which might not be possible if the viewport is not centered in the viewport range).	2018-10-16 15:28:22 -04:00
Marek Olšák	fedc1fda30	radeonsi: save raster config in screen, add se_tile_repeat	2018-10-16 15:28:22 -04:00
Marek Olšák	ac76aeef20	radeonsi: switch back to standard DX sample positions Apps may rely on them.	2018-10-16 15:28:22 -04:00
Marek Olšák	67f02cf810	radeonsi: add GDS support to CP DMA	2018-10-16 15:28:22 -04:00
Marek Olšák	0d05581578	radeonsi: rename si_gfx_* functions to si_cp_* and write_event_eop -> release_mem	2018-10-16 15:28:22 -04:00
Marek Olšák	6e1cf6532d	radeonsi: make si_gfx_write_event_eop more configurable	2018-10-16 15:28:22 -04:00
Sergii Romantsov	0fa9e6d7b3	anv/skylake: disable ForceThreadDispatchEnable On Skylake enabling of ForceThreadDispatchEnable causes gpu-hang. -v2: enabling of ForceThreadDispatchEnable is only for gen8, for gen9 and higher reverted enabling of PixelShaderHasUAV. -v3 (Jason Ekstrand): Rework the comments a bit. CC: Jason Ekstrand <jason.ekstrand@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107941 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107760 Fixes: `79270d2140` (anv: Stop setting 3DSTATE_PS_EXTRA::PixelShaderHasUAV) Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-16 13:20:51 -05:00
Lionel Landwerlin	322a919a41	anv: Implement VK_EXT_pci_bus_info Even though Intel GPUs are always at the same PCI location, all the info we need is already provided by libdrm. Let's be future proof. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-16 12:47:55 +01:00
Jose Fonseca	8550be7a2f	appveyor: Cache pip's cache files. It should speed up the Python packages installation. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2018-10-16 09:41:14 +01:00
Jose Fonseca	bfb8afb14d	appveyor: Update to newer Mako/winflexbison versions. As that's what most people are bound to use. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2018-10-16 09:41:12 +01:00
Jose Fonseca	b94f9cd8f9	appveyor: Update to MSVC 2017. That's what we (and I suppose most people out there) are using now. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2018-10-16 09:41:07 +01:00
Samuel Pitoiset	647c2b90e9	radv: disable VK_SUBGROUP_FEATURE_VOTE_BIT This feature isn't used for now, so disable it until wwm is fixed in LLVM. Fixes dEQP-VK.subgroups.vote.graphics.subgroupallequal* https://bugs.freedesktop.org/show_bug.cgi?id=108115 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-16 10:24:19 +02:00
Samuel Pitoiset	593996bc02	radv: implement buffer to image operations for R32G32B32 This should fix rendering issues with Batman Arkham City. We will probably need to implement itob and itoi at some point, but currently nothing hits these paths. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107765 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-16 09:22:38 +02:00
Alex Smith	ca83d51cfb	ac/nir: Use context-specific LLVM types LLVMInt*Type() return types from the global context and therefore are not safe for use in other contexts. Use types from our own context instead. Fixes frequent crashes seen when doing multithreaded pipeline creation. Fixes: `4d0b02bb5a` "ac: add support for 16bit load_push_constant" Fixes: `7e7ee82698` "ac: add support for 16bit buffer loads" Cc: "18.2" <mesa-stable@lists.freedesktop.org> Signed-off-by: Alex Smith <asmith@feralinteractive.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-10-16 08:18:24 +01:00
Vadym Shovkoplias	ad558408ff	glsl: Check the subroutine associated functions names Adding compile time check for subroutine functions with the same names. Similar check for intrastage linking was already landed in commit `5f0567a4f6`. From Section 6.1.2 (Subroutines) of the GLSL 4.00 specification "A program will fail to compile or link if any shader or stage contains two or more functions with the same name if the name is associated with a subroutine type." Fixes: * no-overloads.vert Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108109 Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-10-16 08:15:21 +03:00
Vadym Shovkoplias	d2ea3d4a76	glsl/linker: Change the format of spec quotation Also there is no "OpenGL ES Shading Language 4.00" spec, so change it to GLSL 4.00 spec. Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-10-16 08:15:21 +03:00
Dave Airlie	ff281e6204	nir: fix clip cull lowering to not assert if GLSL already lowered. If GLSL has already done the lowering, we'd rather not crash in this pass. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-10-15 18:53:48 -07:00
Kenneth Graunke	5bd8369681	i965: Add PCI IDs for new Amberlake parts that are Coffeelake based See commit c0c46ca461f136a0ae1ed69da6c874e850aeeb53 in the Linux kernel, where José Roberto de Souza added this new PCI ID there. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2018-10-15 18:10:27 -07:00
Kenneth Graunke	8f8111646c	intel: disable FS IR validation in release mode. We probably don't need to iterate, fprintf, and abort in release mode. Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-10-15 18:10:27 -07:00
Caio Marcelo de Oliveira Filho	b3c6146925	nir: Copy propagation between blocks Extend the pass to propagate the copies information along the control flow graph. It performs two walks: first it collects the vars that were written inside each node. Then it walks applying the copy propagation using a list of copies previously available. At each node the list is invalidated according to results from the first walk. This approach is simpler than a full data-flow analysis, but covers various cases. If derefs are used for operating on more memory resources (e.g. SSBOs), the difference from a regular pass is expected to be more visible -- as the SSA copy propagation pass won't apply to those. A full data-flow analysis would handle more scenarios: conditional breaks in the control flow and merge equivalent effects from multiple branches (e.g. using a phi node to merge the source for writes to the same deref). However, as previous commentary in the code stated, its complexity 'rapidly get out of hand'. The current patch is a good intermediate step towards more complex analysis. The 'copies' linked list was modified to use util_dynarray to make it more convenient to clone it (to handle ifs/loops). Annotated shader-db results for Skylake: total instructions in shared programs: 15105796 -> 15105451 (<.01%) instructions in affected programs: 152293 -> 151948 (-0.23%) helped: 96 HURT: 17 All the HURTs and many HELPs are one instruction. Looking at pass by pass outputs, the copy prop kicks in removing a bunch of loads correctly, which ends up altering what other optimizations kick in. In those cases the copies would be propagated after lowering to SSA. In a few HELPs we are actually doing more than was possible previously, e.g. consolidating load_uniforms from different blocks. Most of those are from shaders/dolphin/ubershaders/. total cycles in shared programs: 566048861 -> 565954876 (-0.02%) cycles in affected programs: 151461830 -> 151367845 (-0.06%) helped: 2933 HURT: 2950 A lot of noise on both sides. total loops in shared programs: 4603 -> 4603 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total spills in shared programs: 11085 -> 11073 (-0.11%) spills in affected programs: 23 -> 11 (-52.17%) helped: 1 HURT: 0 The shaders/dolphin/ubershaders/12.shader_test was able to pull a couple of loads from inside if statements and reuse them. total fills in shared programs: 23143 -> 23089 (-0.23%) fills in affected programs: 2718 -> 2664 (-1.99%) helped: 27 HURT: 0 All from shaders/dolphin/ubershaders/. LOST: 0 GAINED: 0 The other generations follow the same overall shape. The spills and fills HURTs are all from the same game. shader-db results for Broadwell. total instructions in shared programs: 15402037 -> 15401841 (<.01%) instructions in affected programs: 144386 -> 144190 (-0.14%) helped: 86 HURT: 9 total cycles in shared programs: 600912755 -> 600902486 (<.01%) cycles in affected programs: 185662820 -> 185652551 (<.01%) helped: 2598 HURT: 3053 total loops in shared programs: 4579 -> 4579 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total spills in shared programs: 80929 -> 80924 (<.01%) spills in affected programs: 720 -> 715 (-0.69%) helped: 1 HURT: 5 total fills in shared programs: 93057 -> 93013 (-0.05%) fills in affected programs: 3398 -> 3354 (-1.29%) helped: 27 HURT: 5 LOST: 0 GAINED: 2 shader-db results for Haswell: total instructions in shared programs: 9231975 -> 9230357 (-0.02%) instructions in affected programs: 44992 -> 43374 (-3.60%) helped: 27 HURT: 69 total cycles in shared programs: 87760587 -> 87727502 (-0.04%) cycles in affected programs: 7720673 -> 7687588 (-0.43%) helped: 1609 HURT: 1416 total loops in shared programs: 1830 -> 1830 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total spills in shared programs: 1988 -> 1692 (-14.89%) spills in affected programs: 296 -> 0 helped: 1 HURT: 0 total fills in shared programs: 2103 -> 1668 (-20.68%) fills in affected programs: 438 -> 3 (-99.32%) helped: 4 HURT: 0 LOST: 0 GAINED: 1 v2: Remove the DISABLE prefix from tests we now pass. v3: Add comments about missing write_mask handling. (Caio) Add unreachable when switching on cf_node type. (Jason) Properly merge the component information in written map instead of replacing. (Jason) Explain how removal from written arrays works. (Jason) Use mode directly from deref instead of getting the var. (Jason) v4: Register the local written mode for calls. (Jason) Prefer cf_node instead of node. (Jason) Clarify that remove inside iteration only works in backward iterations. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-15 17:29:46 -07:00
Caio Marcelo de Oliveira Filho	dc349f07b5	nir: Take call instruction into account in copy_prop_vars Calls are not used yet (functions are inlined), but since new code is already taking them into account, do it here too. The convention here and in other places is that no writable memory is assumed to remain unchanged, as well as global variables. Also, explicitly state the modes affected (instead of using the reverse logic) in one of the apply_for_barrier_modes calls. Suggested by Jason. v2: Consider local vars used by a call to be conservative, SPIR-V has such cases. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-15 17:29:46 -07:00
Caio Marcelo de Oliveira Filho	797f01c220	nir: Add tests for copy propagation of derefs Also tests for removal of redundant loads, that we currently handle as part of the copy propagation. Note some tests involve multiple blocks and are currently DISABLED because they (expectedly) fail. v2: Add missing DISABLED prefix to "multi block" tests. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-15 17:29:46 -07:00
Caio Marcelo de Oliveira Filho	4dfa7adc10	nir: Remove handling of dead writes from copy_prop_vars These are covered by another pass now. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-15 17:29:46 -07:00
Caio Marcelo de Oliveira Filho	c20dd1f77c	intel/nir, freedreno/ir3: Use the separated dead write vars pass No changes to shader-db for intel. No changes to shader-db expected for freedreno. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-15 17:29:46 -07:00
Caio Marcelo de Oliveira Filho	cb126cf67a	nir: Separate dead write removal into its own pass Instead of doing this as part of the existing copy_prop_vars pass. Separation makes it easier to expand the scope of both passes to be more than per-block. For copy propagation, the information about valid copies comes from previous instructions; while the dead write removal depends on information from later instructions ("has any instruction used this deref before it is overwritten?"). Also change the tests to use this pass (instead of copy prop vars). Note that the disabled tests continue to fail, since the standalone pass is still per-block. v2: Remove entries from dynarray instead of marking items as deleted. Use foreach_reverse. (Caio) (all from Jason) Do not cache nir_deref_path. Not worth it for this patch. Clear unused writes when hitting a call instruction. Clean up enumeration of modes for barriers. Move metadata calls to the inner function. v3: For copies, use the vector length to calculate the mask. (all from Jason) Use nir_component_mask_t when applicable. Rename functions for clarity. Consider local vars used by a call to be conservative (SPIR-V has such cases). Comment and assert the assumption that stores and copies are always to a deref that ends with a vector or scalar. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-15 17:29:46 -07:00
Caio Marcelo de Oliveira Filho	a02fd7000d	nir: Add tests for dead write elimination Note at the moment the pass called is nir_opt_copy_prop_vars, because dead write elimination is implemented there. Also added tests that involve identifying dead writes in multiple blocks (e.g. the overwrite happens in another block). Those currently fail as expected, so are marked to be skipped. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-15 17:29:46 -07:00
Caio Marcelo de Oliveira Filho	bbda2a17f7	nir: Add test file for vars related passes Add basic helpers for doing tests on the vars related optimization passes. The main goal is to lower the barrier to create tests during development and debugging of the passes. Full coverage is not a requirement. v2: Make find_next_intrinsic() skip blocks before 'after'. (Jason) Move nir_imm_ivec2() to nir_builder.h. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-15 17:29:46 -07:00
Caio Marcelo de Oliveira Filho	c869646b7d	nir: Add nir_imm_ivec2 helper Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-15 17:29:46 -07:00
Caio Marcelo de Oliveira Filho	3966f053a1	util: Add foreach_reverse for dynarray Useful to walk the array removing elements by swapping them with the last element. v2: Change iteration to make sure we never underflow. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-15 17:29:46 -07:00
Eric Anholt	8ec83dc51e	v3d: Add support for hardware pack/unpack of half floats. Cuts the formerly 7-minute simulation time of fs-packHalf2x16.shader_test in half.	2018-10-15 17:16:44 -07:00
Eric Anholt	7d77fe1bcc	nir: Expose nir_remove_unused_io_vars(). For gallium drivers where you want to do some linking at variant compile time, you don't have the other producer/consumer shader on hand to modify. By exposing the inner function, the driver can have the used varyings in the compiled shader cache key and still do linking. This is also useful for V3D, where the binning shader wants to only output position and TF varyings. We've been removing those after nir_lower_io, but this will be less driver-specific code and let more of the shader get DCEed early in NIR. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-10-15 17:16:44 -07:00
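
The radeonsi commit `0b40fbc879` above describes precomputing division factors from each instance divisor so that the per-element division becomes a multiply plus shifts. The sketch below illustrates that general idea only, using a simplified round-up multiplier. It is not the code from `util_fast_udiv32` or the `ac` helpers, and the names `compute_magic` and `fast_udiv` are hypothetical. It assumes unsigned 32-bit dividends and relies on Python's arbitrary-precision integers, so the multiplier is allowed to exceed 32 bits.

```python
# Hypothetical sketch: unsigned division by a known divisor via multiply + shift.
# This is NOT Mesa's implementation; names and structure are illustrative only.

N = 32  # assumed width of the dividend in bits


def compute_magic(d):
    """Precompute (multiplier, total_shift) for a divisor d with 1 <= d < 2**N."""
    if not 1 <= d < 2**N:
        raise ValueError("divisor out of range")
    shift = (d - 1).bit_length()        # ceil(log2(d)); 0 when d == 1
    total_shift = N + shift
    # Round-up multiplier: ceil(2**(N + shift) / d). It may need N + 1 bits.
    multiplier = -(-(1 << total_shift) // d)
    return multiplier, total_shift


def fast_udiv(n, magic):
    """Return n // d for 0 <= n < 2**N using only a multiply and a shift."""
    multiplier, total_shift = magic
    return (n * multiplier) >> total_shift


if __name__ == "__main__":
    magic = compute_magic(7)            # done once, when the divisor is known
    print(fast_udiv(100, magic))        # same result as 100 // 7
```

The precomputation runs once per divisor, which matches the commit's point that the divisors are known at upload time. Because the multiplier here can be one bit wider than the dividend, a fixed-width implementation needs additional steps to stay within 32-bit registers; this sketch sidesteps that by using big integers.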


105437 Commits
