mirrors/mesa - Frog Git

Commit Graph

Author	SHA1	Message	Date
Neha Bhende	2cff6f4512	svga: Allow DXPredCopyRegion for depth_and_stencil formats. DXPredCopyRegion supports copy between src and dst for depth_and_stencil formats if src and dst have the same format. Tested with piglit. v2: As per Brian's comment, allow DXPredCopyRegion for depth+stencil buffers if the blit mask is PIPE_MASK_ZS. Tested with piglit tests and added new piglit test arb_framebuffer_object-depth-stencil-blit to test this particular testcase. Reviewed-by: Brian Paul <brianp@vmware.com>	2016-11-03 14:29:22 -06:00
Neha Bhende	9a9627a791	svga: fix memory leak in svga_clear_texture() Piglit tests which use the arb_clear_texture extension have a memory leak. The pipe_surface created in svga_clear_texture() was not deleted, which is the cause of the leak. Tested all arb_clear_texture-* piglit tests with valgrind. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2016-11-03 14:29:22 -06:00
Thomas Hellstrom	d787ce7288	svga: Implement the pipe clear_render_target functionality v2 v2: Accounted for the fact that svga_try_clear_render_target also honors conditional rendering. Testing done: Exercised all functions in a separate feature branch. Forced emission of conditional rendering commands when necessary. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2016-11-03 14:29:22 -06:00
Charmaine Lee	76f5f76468	svga: add SVGA_3D_CMD_INVALIDATE_GB_SURFACE support This command will be used in a subsequent patch to invalidate a surface. Reviewed-by: Brian Paul <brianp@vmware.com>	2016-11-03 14:29:22 -06:00
Nicolai Hähnle	27bd9c0f0a	pipe-loader: add libamd_common for radeonsi This fixes a build regression of commit `7115e56c21`. Sorry for the breakage, this second location for link dependencies escaped my build tests. Bugzilla: https://patchwork.freedesktop.org/patch/119816/ Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2016-11-03 16:54:55 +01:00
Nicolai Hähnle	908f92ad1f	radeonsi: generate GS prolog to (partially) fix triangle strip adjacency rotation Fixes GL45-CTS.geometry_shader.adjacency.adjacency_indiced_triangle_strip and others. This leaves the case of triangle strips with adjacency and primitive restarts open. It seems that the only thing that cares about that is a piglit test. Fixing this efficiently would be really involved, and I don't want to use the hammer of degrading to software handling of indices because there may well be software that uses this draw mode (without caring about the precise rotation of triangles). v2: - skip the GS prolog entirely if workaround is not needed - only check for TES (TES is always non-null when tessellation is used) Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-03 10:11:24 +01:00
Nicolai Hähnle	ffe4e829b0	radeonsi: remove si_shader_context::is_gs_copy_shader It has become redundant. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-03 10:07:53 +01:00
Nicolai Hähnle	3b2516721b	radeonsi: make the GS copy shader owned by the GS selector The copy shader only depends on the selector. This change avoids creating separate code paths for monolithic vs. non-monolithic geometry shaders. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-03 10:07:50 +01:00
Nicolai Hähnle	9c6f7d66dc	radeonsi: si_shader_vs only depends on the GS selector Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-03 10:07:48 +01:00
Nicolai Hähnle	693435d846	radeonsi: si_vgt_gs_mode only depends on the selector Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-03 10:07:45 +01:00
Nicolai Hähnle	2e1fb7e7fc	radeonsi: make si_generate_gs_copy_shader usable as a standalone function It really only depends on the shader selector. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-03 10:07:42 +01:00
Nicolai Hähnle	ba5de0d034	radeonsi: unify the si_compile_* functions for prologs and epilogs Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-03 10:07:37 +01:00
Nicolai Hähnle	aa9583b0da	radeonsi: get rid of no_{prolog,epilog} Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-03 10:07:34 +01:00
Nicolai Hähnle	75503b1904	radeonsi: get rid of si_llvm_emit_fs_epilogue It is no longer used. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-03 10:07:31 +01:00
Nicolai Hähnle	611510038a	radeonsi: get rid of get_interp_param Replace by a simple LLVMGetParam, since ctx->no_prolog is always false. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-03 10:07:29 +01:00
Nicolai Hähnle	3f4439b6ba	radeonsi: get rid of select_interp_param The condition !ctx->no_prolog is now always true. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-03 10:07:26 +01:00
Nicolai Hähnle	858ac2228f	radeonsi: use TCS epilog for monolithic shaders For fixed function TCS, we keep the copying of VS outputs to TES inputs inside the main function; the call to si_copy_tcs_inputs is moved accordingly. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-03 10:07:23 +01:00
Nicolai Hähnle	3f1be54e53	radeonsi: extract si_build_tcs_epilog_function Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-03 10:07:20 +01:00
Nicolai Hähnle	be6e31c6a0	radeonsi: use VS epilog for monolithic TES Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-03 10:07:17 +01:00
Nicolai Hähnle	06dcb2d2a9	radeonsi: use VS prolog and epilog for monolithic shaders Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-03 10:07:14 +01:00
Nicolai Hähnle	f9daa2f470	radeonsi: extract si_build_vs_{prolog,epilog}_function Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-03 10:07:12 +01:00
Nicolai Hähnle	6f37e992a3	radeonsi: use PS prolog for monolithic shaders Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-03 10:07:09 +01:00
Nicolai Hähnle	15dd332e6a	radeonsi: set num_input_vgprs for fragment shaders in create_function So that the prolog generated for monolithic fragment shaders will have the right signature. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-03 10:07:05 +01:00
Nicolai Hähnle	fec7ced211	radeonsi: extract si_build_ps_prolog_function Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-03 10:07:02 +01:00
Nicolai Hähnle	7115e56c21	radeonsi: use PS epilog for monolithic shaders Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-03 10:07:00 +01:00
Nicolai Hähnle	bf86c56594	radeonsi: extract si_build_ps_epilog_function Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-03 10:06:57 +01:00
Nicolai Hähnle	0b9bba7f6c	radeonsi: pass the function name to si_llvm_create_func We will use multiple functions in one module, so they should have different names. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-03 10:06:54 +01:00
Nicolai Hähnle	96d60dd9ee	radeonsi: split is_monolithic into no_prolog and no_epilog This helps to achieve a gradual transition towards building monolithic shaders via inlining. no_prolog and no_epilog will be removed by the end of the series, separate_prolog remains in use to control the PS input mapping. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-03 10:06:50 +01:00
Nicolai Hähnle	8db9d915cd	radeonsi: free data structures when shader compiles fail Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-03 10:06:47 +01:00
Nicolai Hähnle	4c1504af6a	radeonsi: move main TGSI translation into its own function The idea is that adding prolog and epilog code will be pulled out into the caller. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-03 10:06:44 +01:00
Nicolai Hähnle	23dfb688ba	radeonsi: add always-inline pass to si_llvm_finalize_module Change the pass manager as well, since this is a module-level pass. No noticeable run-time difference on shader-db. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-03 10:06:42 +01:00
Nicolai Hähnle	4ada1dabc4	radeonsi: fix signature of export intrinsic in VS epilog The incompatible signature becomes an issue when the VS epilog gets merged with the main vertex shader at the IR level. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-03 10:06:33 +01:00
Nicolai Hähnle	899b2f24a4	radeonsi: link against amd_common Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-03 10:06:30 +01:00
Samuel Pitoiset	548b5fee6b	nv50,nvc0: stop limiting the number of active queries to 1 This limitation was initially here because AMD_performance_monitor doesn't allow exposing the real number of hardware counters. But this is actually really annoying when profiling with qapitrace. Anyway, performance counters are mostly for developers and failures are expected if you try to monitor more queries than supported. This breaks amd_performance_monitor_measure but it's expected. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2016-11-02 23:42:09 +01:00
Samuel Pitoiset	b6137f226c	nvc0: add new warp_nonpred_execution_efficiency metric on SM35 Event not_predicated_off_thread_inst_executed is SM35+. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-11-02 23:35:49 +01:00
Samuel Pitoiset	98a382d013	nvc0: add missing metric-issue_slot on SM35 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-11-02 23:35:46 +01:00
Samuel Pitoiset	c32d7175aa	nvc0: do not expose metric-inst_issued twice on SM35 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-11-02 23:35:44 +01:00
Samuel Pitoiset	524703da58	nvc0: add new warp_execution_efficiency metric on SM30+ Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-11-02 23:35:42 +01:00
Samuel Pitoiset	51fe48660a	nvc0: respect 80-chars for perf metrics descriptions Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-11-02 23:35:39 +01:00
Samuel Pitoiset	b58d85bac8	nvc0: sort performance metrics alphabetically Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-11-02 23:35:28 +01:00
Samuel Pitoiset	1d75d681d3	nv50: add missing draw_calls_indexed driver stat Spotted when glancing at the VBO push code. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-11-02 21:11:57 +01:00
Nicolai Hähnle	5aef14932a	radeonsi: fix BFE/BFI lowering for GLSL semantics Fixes spec/arb_gpu_shader5/execution/built-in-functions/*-bitfield{Extract,Insert} Cc: 13.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-02 12:30:11 +01:00
Nicolai Hähnle	6526977306	tgsi: align the definition of BFI & [UI]BFE with GLSL As previously written, these opcodes use the SM5 semantics which is incompatible with GLSL when bits == 0, offset == 32. At some point we may want to add BFI_SM5 etc. opcodes, but all users currently either want (and expect!) the GLSL semantics or don't care. Bitfield inserts are generated by the GLSL lower_instructions and lower_packing_builtins passes with constant bits and offset arguments, so any workaround code that drivers may have to emit to follow GLSL semantics should be optimized away easily for those uses. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-02 12:30:07 +01:00
Marek Olšák	7786f8c635	gallium/radeon: add enum radeon_micro_mode Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-01 22:33:13 +01:00
Marek Olšák	1a4e0162fc	gallium/radeon: make it clear that DRM 2.x.x fast clear constraint is CIK-only Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-01 22:33:13 +01:00
Marek Olšák	e3697b4be6	gallium/radeon: remove r600_surface::level_info Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-01 22:33:13 +01:00
Marek Olšák	bf4d102ea3	gallium/radeon: add radeon_surf::is_linear Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-01 22:33:13 +01:00
Marek Olšák	e9c76eeeaa	gallium/radeon: remove radeon_surf_level::pitch_bytes Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-01 22:33:13 +01:00
Marek Olšák	c66a550385	gallium/radeon: don't call u_format helpers if we have that info already Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-01 22:33:13 +01:00
Marek Olšák	692f2640ab	gallium/radeon: replace radeon_surf_info::dcc_enabled with num_dcc_levels Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-01 22:33:13 +01:00
Marek Olšák	315eb0acb4	radeonsi: add a driver query for counting CP DMA calls CP DMA calls are synchronous with regard to shaders, but can be made asynchronous if needed. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-01 22:33:13 +01:00
Marek Olšák	d268b7f95e	radeonsi: add a driver query for shader cache hits This is an 8-month old patch. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-01 22:33:13 +01:00
Samuel Pitoiset	8bfd65395e	nvc0: do not duplicate similar performance metrics Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>	2016-11-01 19:03:26 +01:00
Leo Liu	06e3cd6a45	st/omx/dec: disable tunnel for size different case When the video coded size is different from the frame size, the result buffers need to be the same as the coded size, which is not size-compatible with the size the encoder requires, so simply use no tunnel for this case instead of converting frame by frame. Signed-off-by: Leo Liu <leo.liu@amd.com> Cc: 13.0 <mesa-stable@lists.freedesktop.org>	2016-10-31 11:45:29 -04:00
Leo Liu	d9b2c4048d	st/omx/dec: result buffers size should match codec decoder size Otherwise the check in the kernel that the decoder size matches the buffers size fails. Signed-off-by: Leo Liu <leo.liu@amd.com> Cc: 13.0 <mesa-stable@lists.freedesktop.org>	2016-10-31 11:45:14 -04:00
George Kyriazis	55fb874376	swr: [rasterizer] added EventHandlerFile constructor Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2016-10-31 09:06:29 -05:00
George Kyriazis	0a5811b0f3	swr: [rasterizer core] Frontend dependency work Add frontend dependency concept in the DRAW_CONTEXT, which allows serialization of frontend work if necessary. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2016-10-31 09:06:21 -05:00
George Kyriazis	06f93d0329	swr: [rasterizer core] Refactor/cleanup backends Used for common code reuse and simplification Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2016-10-31 09:06:15 -05:00
George Kyriazis	78a0a09e48	swr: [rasterizer core] Remove deprecated simd intrinsics Used in abandoned all-or-nothing approach to converting to AVX512 Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2016-10-31 09:06:08 -05:00
George Kyriazis	1a3ed86348	swr: [rasterizer archrast] Add thread tags to event files. This allows the post-processor to easily detect the API thread and to process frame information. The frame information is needed to optimize how data is processed from worker threads. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2016-10-31 09:05:25 -05:00
Marek Olšák	52d2b28f7f	ralloc: use rzalloc where it's necessary No change in behavior. ralloc_size is equivalent to rzalloc_size. That will change though. Calls not switched to rzalloc_size: - ralloc_vasprintf - glsl_type::name allocation (it's filled with snprintf) - C++ classes where valgrind didn't show uninitialized values I switched most of non-glsl stuff to rzalloc without checking whether it's really needed. Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-31 11:53:38 +01:00
Serge Martin	cb0879985a	clover: Implement clGetExtensionFunctionAddressForPlatform. Add clGetExtensionFunctionAddressForPlatform (CL 1.2). Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2016-10-30 12:53:03 -07:00
Vedran Miletić	2fba72046d	clover: Introduce CLOVER_EXTRA_*_OPTIONS environment variables The options specified in the CLOVER_EXTRA_BUILD_OPTIONS shell variable are appended to the options specified by the OpenCL program in the clBuildProgram function call, if any. Analogously, the options specified in the CLOVER_EXTRA_COMPILE_OPTIONS and CLOVER_EXTRA_LINK_OPTIONS variables are appended to the options specified in clCompileProgram and clLinkProgram function calls, respectively. v2: rename to CLOVER_EXTRA_COMPILER_OPTIONS * use debug_get_option * append to linker options as well v3: code cleanups v4: separate CLOVER_EXTRA_LINKER_OPTIONS options v5: * fix documentation typo * use CLOVER_EXTRA_COMPILER_OPTIONS in link stage v6: * separate in CLOVER_EXTRA_{BUILD,COMPILE,LINK}_OPTIONS * append options in cl{Build,Compile,Link}Program Signed-off-by: Vedran Miletić <vedran@miletic.net> Reviewed-by[v1]: Edward O'Callaghan <funfunctor@folklore1984.net> v7 [Francisco Jerez]: Slight simplification. Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2016-10-30 12:45:26 -07:00
Vedran Miletić	e3272865c2	clover: Pass unquoted compiler arguments to Clang OpenCL apps can quote arguments they pass to the OpenCL compiler, most commonly include paths containing spaces. If the Clang OpenCL compiler was called via a shell, the shell would split the arguments with respect to quotes and then remove quotes before passing the arguments to the compiler. Since we call Clang as a library, we have to split the argument with respect to quotes and then remove quotes before passing the arguments. v2: move to tokenize(), remove throwing of CL_INVALID_COMPILER_OPTIONS v3: simplify parsing logic, use more C++11 v4: restore error throwing, clarify a comment Signed-off-by: Vedran Miletić <vedran@miletic.net> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2016-10-30 12:14:59 -07:00
Marek Olšák	4bf45a6079	radeonsi: fix behavior of GLSL findLSB(0) 12.0 and older need the same fix but elsewhere. Cc: 13.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-29 01:17:36 +02:00
Marek Olšák	e24dc43164	radeonsi: set VGT_GS_ONCHIP_CNTL on CIK and later Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Cc: 11.2 12.0 13.0 <mesa-stable@lists.freedesktop.org>	2016-10-29 01:17:36 +02:00
Samuel Pitoiset	84e946380b	nvc0/ir: fix emission of IMAD with NEG modifiers The emitter tried to emit sub instead of subr when src0 has actually a NEG modifier. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "11.0 12.0 13.0" <mesa-stable@lists.freedesktop.org>	2016-10-27 19:29:56 +02:00
Samuel Pitoiset	1ec7227d44	nvc0/ir: fix emission of SHLADD with NEG modifiers This affects GF100:GK110 chipsets, but not GM107+ where the logic is a bit different. The emitters tried to emit sub instead of subr when src0 has a NEG modifier. This fixes the following piglit tests glsl-fs-loop-nested and glsl-vs-loop-nested. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "13.0" <mesa-stable@lists.freedesktop.org>	2016-10-26 22:18:04 +02:00
Marek Olšák	ad45dce4a2	radeonsi: remove si_resource_create_custom Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-26 13:02:58 +02:00
Marek Olšák	29144d0f34	gallium/radeon: stop using PIPE_BIND_CUSTOM it has no effect whatsoever Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-26 13:02:58 +02:00
Marek Olšák	e3c3a7fada	r600g: remove a redundant buffer_create helper Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-26 13:02:58 +02:00
Marek Olšák	3dc78c33a9	gallium/radeon: remove unused r600_cmask_info members Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-26 13:02:58 +02:00
Marek Olšák	d18bf0b944	gallium/radeon: don't force the same tiling parameters for FMASK GCN can use a completely different tile mode for FMASK. FMASK allocation now skips one unrelated amdgpu_surface_init codepath as hinted by the assertion. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-26 13:02:58 +02:00
Marek Olšák	ecf045b4f7	winsys/amdgpu: allocate FMASK properly I expect no change in behavior, because r600_texture.c forces the same tile mode as the base texture has. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-26 13:02:58 +02:00
Marek Olšák	24faeb94be	gallium/radeon: print tiling index when printing texture info Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-26 13:02:58 +02:00
Marek Olšák	37659071b8	gallium/radeon: don't do (fmask.size && cmask.size) fmask implies that cmask is present too. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-26 13:02:58 +02:00
Marek Olšák	2664351dfe	gallium/radeon: re-order radeon_surf::dcc and htile members Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-26 13:02:58 +02:00
Marek Olšák	2a2e537577	gallium/radeon: rename bo_size -> surf_size, bo_alignment -> surf_alignment these names were misleading. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-26 13:02:58 +02:00
Marek Olšák	67a44c97af	gallium/radeon: remove flags specific to libdrm_radeon from winsys interface These just say whether libdrm can assume that the latest radeon_surface definition is used by Mesa. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-26 13:02:58 +02:00
Marek Olšák	7a706ad25c	gallium/radeon: remove r600_htile_info Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-26 13:02:58 +02:00
Marek Olšák	7e73ff87c0	gallium/radeon: remove unnecessary fields from radeon_surf_level Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-26 13:02:58 +02:00
Marek Olšák	d5c7ea3b83	gallium/radeon: decrease the size of radeon_surf Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-26 13:02:58 +02:00
Marek Olšák	e9590d9092	gallium/radeon: pass pipe_resource and other params to surface_init directly This removes input-only parameters from the radeon_surf structure. Some of the translation logic from pipe_resource to radeon_surf is moved to winsys/radeon. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-26 13:02:58 +02:00
Marek Olšák	8b94976df9	radeon/vce: use nblk_y instead of npix_y npix_y will be removed. level[0].npix_y will be removed too. nblk_y should be the same as npix_y if the block height == 1. However, nblk_y is aligned to the tile size, so it can be greater than npix_y. If that's a problem, we'll have to save the input height of surface_init and use that. Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-26 13:02:58 +02:00
Marek Olšák	ba174b8dff	gallium/radeon: define RADEON_SURF_MODE_* as enums Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-26 13:02:58 +02:00
Marek Olšák	b5118fe054	gallium/radeon: stop using some input fields from radeon_surface Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-26 13:02:58 +02:00
Marek Olšák	28d237d63d	gallium/radeon: fold r600_setup_surface into r600_init_surface Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-26 13:02:58 +02:00
Marek Olšák	b0d8a717a7	winsys/amdgpu: remove unused definitions Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-26 13:02:58 +02:00
Marek Olšák	81a95946da	gallium/radeon: fold radeon_winsys::surface_best into radeon/winsys Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-26 13:02:58 +02:00
Marek Olšák	dc6bbe2dd0	gallium/radeon: use r600_gfx_write_event_eop everywhere Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-26 13:02:58 +02:00
Marek Olšák	462e3cdf3b	gallium/radeon: make r600_gfx_write_fence more generic Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-26 13:02:58 +02:00
Marek Olšák	edf56fb428	gallium/radeon: fix a ZPASS comment, EVENT_WRITE_EOP fixups Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-26 13:02:58 +02:00
Marek Olšák	d883c83ba9	radeonsi: enable SDMA on Carrizo and all CIK chips again SDMA might be fixed by: "winsys/amdgpu: fix radeon_surf::macro_tile_index for imported textures" Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-26 13:02:58 +02:00
Marek Olšák	6ec3b2a4b1	winsys/amdgpu: fix radeon_surf::macro_tile_index for imported textures Maybe this is why SDMA has been broken for many amdgpu users? SDMA is the only block which is used with imported textures and relies on this variable. DB also uses it, but it doesn't get imported textures, so it's unaffected. I do get SDMA failures on Tonga before this patch if R600_DEBUG=testdma is changed to use imported textures. Cc: 11.2 12.0 13.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-26 13:02:58 +02:00
Marek Olšák	dce05b3423	gallium/radeon: make sure the address of separate CMASK is aligned properly This should fix random GPU hangs on Hawaii and Fiji. Cc: 11.2 12.0 13.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-26 13:02:58 +02:00
Marek Olšák	8a21f52d73	gallium/radeon: fix incorrect bpe use in si_set_optimal_micro_tile_mode Oh my god, I wonder what catastrophic issues this was causing on SI. Cc: 13.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-26 13:02:58 +02:00
Timothy Arceri	e1af20f18a	nir/i965/anv/radv/gallium: make shader info a pointer When restoring something from shader cache we won't have and don't want to create a nir_shader. This change detaches the two. There are other advantages such as being able to reuse the shader info populated by GLSL IR. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-10-26 14:29:36 +11:00
Karol Herbst	0404678c5f	nv50/ir: start LocalCSE with getFirst to merge PHI instructions total instructions in shared programs : 3499888 -> 3499445 (-0.01%) total gprs used in shared programs : 453866 -> 453803 (-0.01%) total local used in shared programs : 21621 -> 21621 (0.00%) total bytes used in shared programs : 32078952 -> 32074936 (-0.01%) local gpr inst bytes helped 0 39 119 119 hurt 0 0 0 0 Signed-off-by: Karol Herbst <karolherbst@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2016-10-25 20:55:07 +02:00
Samuel Pitoiset	7b2712c367	nvc0: use correct bufctx when invalidating CP textures Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "12.0 13.0" <mesa-stable@lists.freedesktop.org>	2016-10-25 20:22:05 +02:00
Brian Paul	76c3f1bbbe	gallium/stapi: fix comment for st_visual::buffer_mask Trivial.	2016-10-24 17:22:00 -07:00
Brian Paul	88a618ce86	tgsi: trivial build fix for MSVC Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-10-24 14:16:07 -07:00
Samuel Pitoiset	6dbb8d12a8	nv50/ir: do not perform global membar for shared memory Shared memory is local to CTA, thus we should only wait for prior memory writes which are visible to other threads in the same CTA, and not at global level. This should speedup compute shaders which use shared memory. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-10-24 22:51:54 +02:00
Axel Davy	eed605a473	st/nine: Fix locking CubeTexture surfaces. Only one face of Cubetextures was locked when in DEFAULT Pool. Fixes: https://github.com/iXit/Mesa-3D/issues/129 CC: "12.0 13.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-24 21:56:44 +02:00
Axel Davy	fe7bb46134	st/nine: Fix mistake in Volume9 UnlockBox In the format fallback path, the height was used instead of the depth. CC: "12.0 13.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-24 21:56:44 +02:00
Axel Davy	942778099e	st/nine: Use align_calloc instead of align_malloc We are not sure exactly what needs to be 0 initialized, but we are missing some cases. Zero-initialize all our current aligned allocations. Fixes Tree of Savior visual issues. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-24 21:56:44 +02:00
Axel Davy	54010cf8b6	gallium/util: Add align_calloc Add implementation for align_calloc, which is align_malloc + memset. v2: add if (ptr) before memset. Fix indentation. Signed-off-by: Axel Davy <axel.davy@ens.fr> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-24 21:56:44 +02:00
Axel Davy	25beccb379	st/nine: Fix leak with integer and boolean constants Leak introduced by: `a83dce0128` The patch also moves the part to release changed.vs_const_i and changed.vs_const_b before the if (!cb.buffer_size) check, to avoid reuploading every draw call if integer or boolean constants are dirty, but the shaders use no constants. Signed-off-by: Axel Davy <axel.davy@ens.fr> CC: "13.0" <mesa-stable@lists.freedesktop.org>	2016-10-24 21:56:44 +02:00
Marek Olšák	f35b1d156b	tgsi/scan: scan texture offset operands This seems important considering how much we depend on some of the flags. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-24 21:41:38 +02:00
Marek Olšák	a2f98dff14	tgsi/scan: move src operand processing into a separate function the next commit will need this Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-24 21:41:36 +02:00
Marek Olšák	72267a25db	tgsi/scan: get information about shader buffer usage Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-24 21:41:35 +02:00
Marek Olšák	d89890d000	tgsi/scan: handle indirect image indexing correctly Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-24 21:41:33 +02:00
Marek Olšák	ac37720f51	tgsi/scan: don't treat RESQ etc. as memory instructions Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-24 21:41:30 +02:00
Marek Olšák	f095a4eb17	tgsi/scan: get information about indirect 2D file access Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-24 21:41:28 +02:00
Marek Olšák	965a5f1810	tgsi/scan: get information about indirect CONST access Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-24 21:41:26 +02:00
Samuel Pitoiset	d588e4f192	nv50/ir: display OP_BAR subops in debug mode Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-10-24 18:53:45 +02:00
Ilia Mirkin	7b7eb7170d	nv50/ir: it appears that OP_DISCARD can't take a join modifier nvdisasm does not print a .S even though the bit is set. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2016-10-22 12:02:35 -04:00
Ilia Mirkin	adad576bfc	nv50/ir: use levelZero for non-frag tex/txp ops radeonsi also does the same thing. I suspect that this is likely to be a no-op in reality, but it brings nouveau code closer to what the blob produces. Plus it makes sense to not try to do auto-derivatives on this. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2016-10-22 12:02:35 -04:00
Ilia Mirkin	3fdeb7c983	gallium: add PIPE_CAP_STREAM_OUTPUT_INTERLEAVE_BUFFERS This allows the driver to signal that it can't handle random interleaving of attributes across buffers. This is required for ARB_transform_feedback3, and it's initialized to whatever the previous value of PIPE_CAP_STREAM_OUTPUT_PAUSE_RESUME was except for nv50 where it is disabled. Note that the proprietary drivers never expose ARB_transform_feedback3 on any GT21x's (where nouveau previously did), and after some effort I was unable to get it to work. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-10-22 12:02:35 -04:00
Samuel Pitoiset	6e08f3e96c	nvc0/ir: remove outdated comment about SHLADD Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-10-22 14:50:17 +02:00
Eric Anholt	8ff4182876	vc4: Avoid making temporaries for assignments to NIR registers. Getting stores to NIR regs to not generate new MOVs is tricky, since the result we're trying to store into the NIR reg may have been from a conditional update of a temp, or a series of packed writes. The easiest solution seems to be to require that nir_store_dest()'s arg comes from an SSA temp. This causes us to put in a few more temporary MOVs in the NIR SSA dest case, but copy propagation successfully cleans those up. The shader-db change is modest: total instructions in shared programs: 93774 -> 93598 (-0.19%) instructions in affected programs: 14760 -> 14584 (-1.19%) total estimated cycles in shared programs: 212135 -> 211946 (-0.09%) estimated cycles in affected programs: 27005 -> 26816 (-0.70%) but I was seeing patterns in some register-allocation failures in DEQP tests that looked like the extra MOVs would increase maximum register pressure in loops. Some debug code indicates that that's not the case, though I'm still a bit confused by that result.	2016-10-21 14:12:22 -07:00
Eric Anholt	a689b8b9df	vc4: Add a comment with discussion of how simulation works.	2016-10-21 14:12:22 -07:00
Eric Anholt	83ffb607b7	vc4: Move simulator winsys mapping and tracking to the simulator. One tiny hack is left in vc4_bufmgr.c for what kind of mapping we got so that we can free it.	2016-10-21 14:12:22 -07:00
Eric Anholt	1c38ee380d	vc4: Move simulator memory management to a u_mm.h heap. Now we aren't limited to 256MB total allocated across a driver instance, just 256MB at one time. We're still copying in and out, which should get fixed.	2016-10-21 14:12:22 -07:00
Eric Anholt	9f75522382	vc4: Move simulator globals into a struct. I would like to put a couple more things in here, so it's time to package it up.	2016-10-21 14:12:22 -07:00
Eric Anholt	78087676c9	vc4: Restructure the simulator mode. Rather than having simulator mode changes scattered around vc4_bufmgr.c and vc4_screen.c, make vc4_bufmgr.c just call a vc4_simulator_ioctl, which then dispatches to a corresponding implementation. This will give the simulator support a centralized place to do tricks like storing most BOs directly in simulator memory rather than copying in and out. This leaves special casing of mmaping BOs and execution, because of the winsys mapping.	2016-10-21 14:12:22 -07:00
Eric Anholt	1d7874fa7b	vc4: Fix termination of the initial scan for branch targets. The loop is scanning until the original max_ip (size of the BO), but we want to not examine any code after the PROG_END's delay slots. There was a block trying to do that, except that we had some early continue statements if the signal wasn't a PROG_END or a BRANCH. The failure mode would be that a valid shader is rejected because some undefined memory after the PROG_END slots is parsed as a branch and the rest of its setup is illegal. I haven't seen this in the wild, but valgrind was complaining and the new userland simulator code started triggering it.	2016-10-21 14:12:06 -07:00
Nicolai Hähnle	17353ef043	radeonsi: fix a regression in si_eliminate_const_output A constant value of float type is not necessarily a ConstantFP: it could also be a constant expression that for some reason hasn't been folded. This fixes a regression in GL45-CTS.arrays_of_arrays_gl.InteractionFunctionCalls2 that was introduced by commit `3ec9975555`. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-10-21 09:59:26 +02:00
Ilia Mirkin	8cf0f05713	nv50,nvc0: don't keep track of whether fb rt0 is integer-only This reverts commits `1af0641db3` and `a6ad49cbbd`. st/mesa adjusts the rasterizer state for us now. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2016-10-21 02:28:26 -04:00
Samuel Pitoiset	42273edf79	nvc0: do not break 3D state by pushing MS coordinates on Fermi Long story short, 3D and CP are aliased on Fermi and initializing compute after pushing the MS sample coordinate offsets seems to corrupt 3D state for weird reasons. I still don't have the faintest clue what is going on, but this seems to only affect Fermi generation. A possible fix could be to use two different channels, one for 3D and one for CP. This fixes a bunch of regressions pinpointed by piglit. Fixes: "nvc0: fix up image support for allowing multiple samples" Cc: "13.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-10-20 19:59:08 +02:00
Samuel Pitoiset	24e15aa198	nvc0: translate compute shaders at program creation This makes shader-db reports results for compute shaders. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-10-20 19:46:18 +02:00
Marek Olšák	c2a602d21a	gallivm: try to fix build with LLVM <= 3.4 due to missing CallSite.h Reviewed-by: Brian Paul <brianp@vmware.com> Tested-by: Brian Paul <brianp@vmware.com>	2016-10-20 17:45:23 +02:00
Marek Olšák	f19f71830b	radeonsi: fix build of si_eliminate_const_vs_outputs on LLVM <= 3.8 Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-20 11:07:50 +02:00
Marek Olšák	2db56434d4	gallivm: add wrappers for missing functions in LLVM <= 3.8 radeonsi needs these. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-20 11:07:50 +02:00
Nicolai Hähnle	4a2dbfff05	radeonsi: fix 64-bit loads from LDS Fixes spec/arb_tessellation_shader/execution/dvec[23]-vs-tcs-tes, among others. Cc: "12.0 13.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-10-20 10:37:07 +02:00
Ilia Mirkin	cd45d758ff	nv50/ir: process texture offset sources as regular sources With ARB_gpu_shader5, texture offsets can be any source, including TEMPs and IN's. Make sure to process them as regular sources so that we pick up masks, etc. This should fix some CTS tests that feed offsets directly to textureGatherOffset, and we were not picking up the input use, thus not advertising it in the shader header. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Tested-by: Dave Airlie <airlied@redhat.com> Cc: 12.0 13.0 <mesa-stable@lists.freedesktop.org>	2016-10-19 21:02:01 -04:00
Ilia Mirkin	313fba5ee1	nv50,nvc0: avoid reading out of bounds when getting bogus so info The state tracker tries to attach the info to the wrong shader. This is easy enough to protect against. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Cc: 12.0 13.0 <mesa-stable@lists.freedesktop.org>	2016-10-19 21:02:01 -04:00
Samuel Pitoiset	2b6e04e91f	nvc0/ir: simplify predicate logic for GK104 atomic operations The predicate is always CC_NOT_P as defined in processSurfaceCoordsNVE4(), so we only want to emit OR. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-10-19 23:53:57 +02:00
Samuel Pitoiset	974ab614d3	nvc0/ir: remove useless NVC0LoweringPass::gMemBase Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2016-10-19 23:53:48 +02:00
Samuel Pitoiset	03dc87caab	nv50/ir: print CCTL subops in debug mode Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-10-19 23:53:39 +02:00
Marek Olšák	3ec9975555	radeonsi: eliminate trivial constant VS outputs These constant value VS PARAM exports: - 0,0,0,0 - 0,0,0,1 - 1,1,1,0 - 1,1,1,1 can be loaded into PS inputs using the DEFAULT_VAL field, and the VS exports can be removed from the IR to save export & parameter memory. After LLVM optimizations, analyze the IR to see which exports are equal to the ones listed above (or undef) and remove them if they are. Targeted use cases: - All DX9 eON ports always clear 10 VS outputs to 0.0 even if most of them are unused by PS (such as Witcher 2 below). - VS output arrays with unused elements that the GLSL compiler can't eliminate (such as Batman below). The shader-db deltas are quite interesting: (not from upstream si-report.py, it won't be upstreamed) PERCENTAGE DELTAS Shaders PARAM exports (affected only) batman_arkham_origins 589 -67.17 % bioshock-infinite 1769 -0.47 % dirt-showdown 548 -2.68 % dota2 1747 -3.36 % f1-2015 776 -4.94 % left_4_dead_2 1762 -0.07 % metro_2033_redux 2670 -0.43 % portal 474 -0.22 % talos_principle 324 -3.63 % warsow 176 -2.20 % witcher2 1040 -73.78 % ---------------------------------------- All affected 991 -65.37 % ... 9681 -> 3353 ---------------------------------------- Total 26725 -10.82 % ... 58490 -> 52162 v2: treat Undef as both 0 and 1 Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1) Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> (v1)	2016-10-19 22:21:46 +02:00
Samuel Pitoiset	041da0ae81	nv50/ir: silent TGSI_PROPERTY_FS_DEPTH_LAYOUT Found that information message while replaying a trace from Metro 2033 Redux. Mark that property as useless for now. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-10-19 21:02:50 +02:00
Marek Olšák	a2ea653a49	radeonsi: remove cb0_is_integer handling st/mesa does this for us. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-19 19:26:30 +02:00
Roland Scheidegger	aeceec54a8	draw: improve vertex fetch (v2) The per-element fetch has quite some calculations which are constant, these can be moved outside both the per-element as well as the main shader loop (llvm can figure out it's constant mostly on its own, however this can have a significant compile time cost). Similarly, it looks easier swapping the fetch loops (outer loop per attrib, inner loop filling up the per vertex elements - this way the aos->soa conversion also can be done per attrib and not just at the end though again this doesn't really make much of a difference in the generated code). (This would also make it possible to vectorize the calculations leading to the fetches.) There's also some minimal change simplifying the overflow math slightly. All in all, the generated code seems to look slightly simpler (depending on the actual vs), but more importantly I've seen a significant reduction in compile times for some vs (albeit with old (3.3) llvm version, and the time reduction is only really for the optimizations run on the IR). v2: adapt to other draw change. No changes with piglit. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2016-10-19 01:44:59 +02:00
Roland Scheidegger	0942fe548e	draw: improved handling of undefined inputs Previous attempts to zero initialize all inputs were not really optimal (though no performance impact was measurable). In fact this is not really necessary, since we know the max number of inputs used. Instead, just generate fetch for up to max inputs used by the shader, directly replacing inputs for which there was no vertex element by zero. This also cleans up key generation, which previously would have stored some garbage for these elements. And also drop the assertion which indicates such bogus usage by a debug_printf (the whole point of initializing the undefined inputs was to make this case safe to handle). Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2016-10-19 01:44:59 +02:00
Roland Scheidegger	d1b4a3451e	gallivm: print out time for jitting functions with GALLIVM_DEBUG=perf Compilation to actual machine code can easily take as much time as the optimization passes on the IR if not more, so print this out too. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2016-10-19 01:44:59 +02:00
Roland Scheidegger	6f2f0daeb4	gallivm: Use native packs and unpacks for the lerps For the texturing packs, things looked pretty terrible. For every lerp, we were repacking the values, and while those look sort of cheap with 128bit, with 256bit we end up with 2 of them instead of just 1 but worse, plus 2 extracts too (the unpack, however, works fine with a single instruction, albeit only with llvm 3.8 - the vpmovzxbw). Ideally we'd use more clever pack for llvmpipe backend conversion too since we actually use the "wrong" shuffle (which is more work) when doing the fs twiddle just so we end up with the wrong order for being able to do native pack when converting from 2x8f -> 1x16b. But this requires some refactoring, since the untwiddle is separate from conversion. This is only used for avx2 256bit pack/unpack for now. Improves openarena scores by 8% or so, though overall it's still pretty disappointing how much faster 256bit vectors are even with avx2 (or rather, aren't...). And, of course, eliminating the needless packs/unpacks in the first place would eliminate most of that advantage (not quite all) from this patch. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2016-10-19 01:44:59 +02:00
Brian Paul	8b731b8b03	svga: minor code improvements in svga_validate_pipe_sampler_view() Use the 'texture' local var in more places. Rename 'pFormat' to 'viewFormat'. Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2016-10-18 16:16:26 -06:00
Boyuan Zhang	5567145d59	st/va: force to flush the last p frame in idr period During dual instance encoding submission, if the second encode task and first encode task have no reference dependency, e.g. p following with idr-frame, there is a chance the second task will use for its reconstructed picture buffer the same buffer used by first task for its reference/reconstructed picture. In this case, buffer corruption may occur depending on encoding speed. Fix is to force flush these two tasks separately to avoid race condition Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=98005 Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>	2016-10-18 15:16:34 -04:00
Marek Olšák	21af69e753	radeonsi: rename prefixes from radeon to si Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>	2016-10-18 18:41:08 +02:00
Marek Olšák	6e475fefa1	radeonsi: merge radeon_llvm_context and si_shader_context Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>	2016-10-18 18:41:06 +02:00

29279 Commits