mirrors/mesa - Frog Git

Commit Graph

Author	SHA1	Message	Date
Brian Paul	88a618ce86	tgsi: trivial build fix for MSVC Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-10-24 14:16:07 -07:00
Samuel Pitoiset	6dbb8d12a8	nv50/ir: do not perform global membar for shared memory Shared memory is local to CTA, thus we should only wait for prior memory writes which are visible to other threads in the same CTA, and not at global level. This should speedup compute shaders which use shared memory. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-10-24 22:51:54 +02:00
Axel Davy	eed605a473	st/nine: Fix locking CubeTexture surfaces. Only one face of Cubetextures was locked when in DEFAULT Pool. Fixes: https://github.com/iXit/Mesa-3D/issues/129 CC: "12.0 13.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-24 21:56:44 +02:00
Axel Davy	fe7bb46134	st/nine: Fix mistake in Volume9 UnlockBox In the format fallback path, the height was used instead of the depth. CC: "12.0 13.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-24 21:56:44 +02:00
Axel Davy	942778099e	st/nine: Use align_calloc instead of align_malloc We are not sure exactly what needs to be 0 initialized, but we are missing some cases. 0 initialize all our current aligned allocation. Fixes Tree of Savior visual issues. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-24 21:56:44 +02:00
Axel Davy	54010cf8b6	gallium/util: Add align_calloc Add implementation for align_calloc, which is align_malloc + memset. v2: add if (ptr) before memset. Fix indentation. Signed-off-by: Axel Davy <axel.davy@ens.fr> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-24 21:56:44 +02:00
Axel Davy	25beccb379	st/nine: Fix leak with integer and boolean constants Leak introduced by: `a83dce0128` The patch also moves the part to release changed.vs_const_i and changed.vs_const_b before the if (!cb.buffer_size) check, to avoid reuploading every draw call if integer or boolean constants are dirty, but the shaders use no constants. Signed-off-by: Axel Davy <axel.davy@ens.fr> CC: "13.0" <mesa-stable@lists.freedesktop.org>	2016-10-24 21:56:44 +02:00
Marek Olšák	f35b1d156b	tgsi/scan: scan texture offset operands This seems important considering how much we depend on some of the flags. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-24 21:41:38 +02:00
Marek Olšák	a2f98dff14	tgsi/scan: move src operand processing into a separate function the next commit will need this Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-24 21:41:36 +02:00
Marek Olšák	72267a25db	tgsi/scan: get information about shader buffer usage Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-24 21:41:35 +02:00
Marek Olšák	d89890d000	tgsi/scan: handle indirect image indexing correctly Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-24 21:41:33 +02:00
Marek Olšák	ac37720f51	tgsi/scan: don't treat RESQ etc. as memory instructions Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-24 21:41:30 +02:00
Marek Olšák	f095a4eb17	tgsi/scan: get information about indirect 2D file access Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-24 21:41:28 +02:00
Marek Olšák	965a5f1810	tgsi/scan: get information about indirect CONST access Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-24 21:41:26 +02:00
Samuel Pitoiset	d588e4f192	nv50/ir: display OP_BAR subops in debug mode Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-10-24 18:53:45 +02:00
Ilia Mirkin	7b7eb7170d	nv50/ir: it appears that OP_DISCARD can't take a join modifier nvdisasm does not print a .S even though the bit is set. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2016-10-22 12:02:35 -04:00
Ilia Mirkin	adad576bfc	nv50/ir: use levelZero for non-frag tex/txp ops radeonsi also does the same thing. I suspect that this is likely to be a no-op in reality, but it brings nouveau code closer to what the blob produces. Plus it makes sense to not try to do auto-derivatives on this. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2016-10-22 12:02:35 -04:00
Ilia Mirkin	3fdeb7c983	gallium: add PIPE_CAP_STREAM_OUTPUT_INTERLEAVE_BUFFERS This allows the driver to signal that it can't handle random interleaving of attributes across buffers. This is required for ARB_transform_feedback3, and it's initialized to whatever the previous value of PIPE_CAP_STREAM_OUTPUT_PAUSE_RESUME was except for nv50 where it is disabled. Note that the proprietary drivers never expose ARB_transform_feedback3 on any GT21x's (where nouveau previously did), and after some effort I was unable to get it to work. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-10-22 12:02:35 -04:00
Samuel Pitoiset	6e08f3e96c	nvc0/ir: remove outdated comment about SHLADD Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-10-22 14:50:17 +02:00
Eric Anholt	8ff4182876	vc4: Avoid making temporaries for assignments to NIR registers. Getting stores to NIR regs to not generate new MOVs is tricky, since the result we're trying to store into the NIR reg may have been from a conditional update of a temp, or a series of packed writes. The easiest solution seems to be to require that nir_store_dest()'s arg comes from an SSA temp. This causes us to put in a few more temporary MOVs in the NIR SSA dest case, but copy propagation successfully cleans those up. The shader-db change is modest: total instructions in shared programs: 93774 -> 93598 (-0.19%) instructions in affected programs: 14760 -> 14584 (-1.19%) total estimated cycles in shared programs: 212135 -> 211946 (-0.09%) estimated cycles in affected programs: 27005 -> 26816 (-0.70%) but I was seeing patterns in some register-allocation failures in DEQP tests that looked like the extra MOVs would increase maximum register pressure in loops. Some debug code indicates that that's not the case, though I'm still a bit confused by that result.	2016-10-21 14:12:22 -07:00
Eric Anholt	a689b8b9df	vc4: Add a comment with discussion of how simulation works.	2016-10-21 14:12:22 -07:00
Eric Anholt	83ffb607b7	vc4: Move simulator winsys mapping and tracking to the simulator. One tiny hack is left in vc4_bufmgr.c for what kind of mapping we got so that we can free it.	2016-10-21 14:12:22 -07:00
Eric Anholt	1c38ee380d	vc4: Move simulator memory management to a u_mm.h heap. Now we aren't limited to 256MB total allocated across a driver instance, just 256MB at one time. We're still copying in and out, which should get fixed.	2016-10-21 14:12:22 -07:00
Eric Anholt	9f75522382	vc4: Move simulator globals into a struct. I would like to put a couple more things in here, so it's time to package it up.	2016-10-21 14:12:22 -07:00
Eric Anholt	78087676c9	vc4: Restructure the simulator mode. Rather than having simulator mode changes scattered around vc4_bufmgr.c and vc4_screen.c, make vc4_bufmgr.c just call a vc4_simulator_ioctl, which then dispatches to a corresponding implementation. This will give the simulator support a centralized place to do tricks like storing most BOs directly in simulator memory rather than copying in and out. This leaves special casing of mmaping BOs and execution, because of the winsys mapping.	2016-10-21 14:12:22 -07:00
Eric Anholt	1d7874fa7b	vc4: Fix termination of the initial scan for branch targets. The loop is scanning until the original max_ip (size of the BO), but we want to not examine any code after the PROG_END's delay slots. There was a block trying to do that, except that we had some early continue statements if the signal wasn't a PROG_END or a BRANCH. The failure mode would be that a valid shader is rejected because some undefined memory after the PROG_END slots is parsed as a branch and the rest of its setup is illegal. I haven't seen this in the wild, but valgrind was complaining and the new userland simulator code started triggering it.	2016-10-21 14:12:06 -07:00
Nicolai Hähnle	17353ef043	radeonsi: fix a regression in si_eliminate_const_output A constant value of float type is not necessarily a ConstantFP: it could also be a constant expression that for some reason hasn't been folded. This fixes a regression in GL45-CTS.arrays_of_arrays_gl.InteractionFunctionCalls2 that was introduced by commit `3ec9975555`. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-10-21 09:59:26 +02:00
Ilia Mirkin	8cf0f05713	nv50,nvc0: don't keep track of whether fb rt0 is integer-only This reverts commits `1af0641db3` and `a6ad49cbbd`. st/mesa adjusts the rasterizer state for us now. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2016-10-21 02:28:26 -04:00
Samuel Pitoiset	42273edf79	nvc0: do not break 3D state by pushing MS coordinates on Fermi Long story short, 3D and CP are aliased on Fermi and initializing compute after pushing the MS sample coordinate offsets seems to corrupt 3D state for weird reasons. I still don't have the faintest clue what is going on, but this seems to only affect Fermi generation. A possible fix could be to use two different channels, one for 3D and one for CP. This fixes a bunch of regressions pinpointed by piglit. Fixes: "nvc0: fix up image support for allowing multiple samples" Cc: "13.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-10-20 19:59:08 +02:00
Samuel Pitoiset	24e15aa198	nvc0: translate compute shaders at program creation This makes shader-db report results for compute shaders. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-10-20 19:46:18 +02:00
Marek Olšák	c2a602d21a	gallivm: try to fix build with LLVM <= 3.4 due to missing CallSite.h Reviewed-by: Brian Paul <brianp@vmware.com> Tested-by: Brian Paul <brianp@vmware.com>	2016-10-20 17:45:23 +02:00
Marek Olšák	f19f71830b	radeonsi: fix build of si_eliminate_const_vs_outputs on LLVM <= 3.8 Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-20 11:07:50 +02:00
Marek Olšák	2db56434d4	gallivm: add wrappers for missing functions in LLVM <= 3.8 radeonsi needs these. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-20 11:07:50 +02:00
Nicolai Hähnle	4a2dbfff05	radeonsi: fix 64-bit loads from LDS Fixes spec/arb_tessellation_shader/execution/dvec[23]-vs-tcs-tes, among others. Cc: "12.0 13.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-10-20 10:37:07 +02:00
Ilia Mirkin	cd45d758ff	nv50/ir: process texture offset sources as regular sources With ARB_gpu_shader5, texture offsets can be any source, including TEMPs and IN's. Make sure to process them as regular sources so that we pick up masks, etc. This should fix some CTS tests that feed offsets directly to textureGatherOffset, and we were not picking up the input use, thus not advertising it in the shader header. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Tested-by: Dave Airlie <airlied@redhat.com> Cc: 12.0 13.0 <mesa-stable@lists.freedesktop.org>	2016-10-19 21:02:01 -04:00
Ilia Mirkin	313fba5ee1	nv50,nvc0: avoid reading out of bounds when getting bogus so info The state tracker tries to attach the info to the wrong shader. This is easy enough to protect against. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Cc: 12.0 13.0 <mesa-stable@lists.freedesktop.org>	2016-10-19 21:02:01 -04:00
Samuel Pitoiset	2b6e04e91f	nvc0/ir: simplify predicate logic for GK104 atomic operations The predicate is always CC_NOT_P as defined in processSurfaceCoordsNVE4(), so we only want to emit OR. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-10-19 23:53:57 +02:00
Samuel Pitoiset	974ab614d3	nvc0/ir: remove useless NVC0LoweringPass::gMemBase Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2016-10-19 23:53:48 +02:00
Samuel Pitoiset	03dc87caab	nv50/ir: print CCTL subops in debug mode Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-10-19 23:53:39 +02:00
Marek Olšák	3ec9975555	radeonsi: eliminate trivial constant VS outputs These constant value VS PARAM exports: - 0,0,0,0 - 0,0,0,1 - 1,1,1,0 - 1,1,1,1 can be loaded into PS inputs using the DEFAULT_VAL field, and the VS exports can be removed from the IR to save export & parameter memory. After LLVM optimizations, analyze the IR to see which exports are equal to the ones listed above (or undef) and remove them if they are. Targeted use cases: - All DX9 eON ports always clear 10 VS outputs to 0.0 even if most of them are unused by PS (such as Witcher 2 below). - VS output arrays with unused elements that the GLSL compiler can't eliminate (such as Batman below). The shader-db deltas are quite interesting: (not from upstream si-report.py, it won't be upstreamed) PERCENTAGE DELTAS Shaders PARAM exports (affected only) batman_arkham_origins 589 -67.17 % bioshock-infinite 1769 -0.47 % dirt-showdown 548 -2.68 % dota2 1747 -3.36 % f1-2015 776 -4.94 % left_4_dead_2 1762 -0.07 % metro_2033_redux 2670 -0.43 % portal 474 -0.22 % talos_principle 324 -3.63 % warsow 176 -2.20 % witcher2 1040 -73.78 % ---------------------------------------- All affected 991 -65.37 % ... 9681 -> 3353 ---------------------------------------- Total 26725 -10.82 % ... 58490 -> 52162 v2: treat Undef as both 0 and 1 Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1) Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> (v1)	2016-10-19 22:21:46 +02:00
Samuel Pitoiset	041da0ae81	nv50/ir: silent TGSI_PROPERTY_FS_DEPTH_LAYOUT Found that information message while replaying a trace from Metro 2033 Redux. Mark that property as useless for now. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-10-19 21:02:50 +02:00
Marek Olšák	a2ea653a49	radeonsi: remove cb0_is_integer handling st/mesa does this for us. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-19 19:26:30 +02:00
Roland Scheidegger	aeceec54a8	draw: improve vertex fetch (v2) The per-element fetch has quite some calculations which are constant, these can be moved outside both the per-element as well as the main shader loop (llvm can figure out it's constant mostly on its own, however this can have a significant compile time cost). Similarly, it looks easier swapping the fetch loops (outer loop per attrib, inner loop filling up the per vertex elements - this way the aos->soa conversion also can be done per attrib and not just at the end though again this doesn't really make much of a difference in the generated code). (This would also make it possible to vectorize the calculations leading to the fetches.) There's also some minimal change simplifying the overflow math slightly. All in all, the generated code seems to look slightly simpler (depending on the actual vs), but more importantly I've seen a significant reduction in compile times for some vs (albeit with old (3.3) llvm version, and the time reduction is only really for the optimizations run on the IR). v2: adapt to other draw change. No changes with piglit. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2016-10-19 01:44:59 +02:00
Roland Scheidegger	0942fe548e	draw: improved handling of undefined inputs Previous attempts to zero initialize all inputs were not really optimal (though no performance impact was measurable). In fact this is not really necessary, since we know the max number of inputs used. Instead, just generate fetch for up to max inputs used by the shader, directly replacing inputs for which there was no vertex element by zero. This also cleans up key generation, which previously would have stored some garbage for these elements. And also drop the assertion which indicates such bogus usage by a debug_printf (the whole point of initializing the undefined inputs was to make this case safe to handle). Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2016-10-19 01:44:59 +02:00
Roland Scheidegger	d1b4a3451e	gallivm: print out time for jitting functions with GALLIVM_DEBUG=perf Compilation to actual machine code can easily take as much time as the optimization passes on the IR if not more, so print this out too. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2016-10-19 01:44:59 +02:00
Roland Scheidegger	6f2f0daeb4	gallivm: Use native packs and unpacks for the lerps For the texturing packs, things looked pretty terrible. For every lerp, we were repacking the values, and while those look sort of cheap with 128bit, with 256bit we end up with 2 of them instead of just 1 but worse, plus 2 extracts too (the unpack, however, works fine with a single instruction, albeit only with llvm 3.8 - the vpmovzxbw). Ideally we'd use more clever pack for llvmpipe backend conversion too since we actually use the "wrong" shuffle (which is more work) when doing the fs twiddle just so we end up with the wrong order for being able to do native pack when converting from 2x8f -> 1x16b. But this requires some refactoring, since the untwiddle is separate from conversion. This is only used for avx2 256bit pack/unpack for now. Improves openarena scores by 8% or so, though overall it's still pretty disappointing how much faster 256bit vectors are even with avx2 (or rather, aren't...). And, of course, eliminating the needless packs/unpacks in the first place would eliminate most of that advantage (not quite all) from this patch. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2016-10-19 01:44:59 +02:00
Brian Paul	8b731b8b03	svga: minor code improvements in svga_validate_pipe_sampler_view() Use the 'texture' local var in more places. Rename 'pFormat' to 'viewFormat'. Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2016-10-18 16:16:26 -06:00
Boyuan Zhang	5567145d59	st/va: force to flush the last p frame in idr period During dual instance encoding submission, if the second encode task and first encode task have no reference dependency, e.g. p following with idr-frame, there is a chance the second task will use for its reconstructed picture buffer the same buffer used by first task for its reference/reconstructed picture. In this case, buffer corruption may occur depending on encoding speed. Fix is to force flush these two tasks separately to avoid race condition Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=98005 Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>	2016-10-18 15:16:34 -04:00
Marek Olšák	21af69e753	radeonsi: rename prefixes from radeon to si Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>	2016-10-18 18:41:08 +02:00
Marek Olšák	6e475fefa1	radeonsi: merge radeon_llvm_context and si_shader_context Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>	2016-10-18 18:41:06 +02:00
Marek Olšák	5ab25bb4ba	radeonsi: import all TGSI->LLVM code from gallium/radeon Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>	2016-10-18 18:41:04 +02:00
Marek Olšák	4967cacdfa	gallium/radeon: simplify initialization of 64-bit gallivm builders Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>	2016-10-18 18:41:03 +02:00
Marek Olšák	502dad4dca	gallium/radeon: remove unused radeon_llvm_reg_index_soa Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>	2016-10-18 18:41:01 +02:00
Marek Olšák	4e5d076fcf	radeonsi: move LLVM ALU codegen into radeonsi Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>	2016-10-18 18:40:59 +02:00
Emil Velikov	af7abc512c	loader: remove loader_get_driver_for_fd() driver_type A remnant of the pre-loader days, when we had multiple instances of the loader logic in separate places and one could build a "GALLIUM_ONLY" version. Since that is no longer the case and the loaders (glx/egl/gbm) do not (and should not) need to know any classic/gallium specifics, we can drop the argument and the related code. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Axel Davy <axel.davy@ens.fr> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-18 17:06:29 +01:00
Ilia Mirkin	8c78fdb328	gm107/ir: fix bit offset of tex lod setting for indirect texturing Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Cc: mesa-stable@lists.freedesktop.org	2016-10-18 09:56:14 -04:00
Ilia Mirkin	ecea2f69ef	gm107/ir: fix texturing with indirect samplers The indirect handle has to come right after the coordinates, so if there was a sample/bias/depth compare/offset, everything would end up being shifted by one argument position. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Cc: mesa-stable@lists.freedesktop.org	2016-10-18 09:56:14 -04:00
Marek Olšák	34099894c3	gallium/tgsi: add missing #include Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-18 11:20:57 +02:00
Julien Isorce	dbc8e18116	st/va: set default rt formats when calling vaCreateConfig As specified in va.h, default value should be set on attributes not present in the input list. Signed-off-by: Julien Isorce <j.isorce@samsung.com> Reviewed-by: Mark Thompson <sw@jkqxz.net>	2016-10-18 08:44:14 +01:00
Nicolai Hähnle	9160b4d981	radeonsi: unify the constant load paths Remove the split between direct and indirect. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-10-17 19:08:45 +02:00
Nicolai Hähnle	51f9b38ce8	radeonsi: fix indirect loads of 64 bit constants This fixes GL45-CTS.compute_shader.fp64-case3. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-10-17 19:08:36 +02:00
Marek Olšák	74d145f4a8	radeonsi: shorten "shader->selector" to "sel" in si_shader_create Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-17 12:13:00 +02:00
Marek Olšák	2e74e8ead9	radeonsi: clear DB_RENDER_OVERRIDE Vulkan doesn't set these fields even though it doesn't use HiS. HiS is disabled by programming DB_SRESULTS_COMPARE_STATEn to 0. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-17 12:13:00 +02:00
Axel Davy	9baf4505fb	st/nine: Fix multisample limit check Fixes regression introduced by `b560305687`. The regression prevents some apps from starting. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-17 00:02:52 +02:00
Eric Anholt	c61eb3c91c	vc4: Fix fast clear color packing for 565. Piglit didn't manage to cover this because fbo-clear-formats uses scissors, so we don't get fast clearing.	2016-10-16 11:22:50 -07:00
Tobias Klausmann	b7d9677de8	nv50/ir: constant fold OP_SPLIT Split the source immediate value into new values and move them into the original defs set by the split. Since we can only have up to 64-bit immediates, this is largely beneficial for F64 (and, in the future, U64) operations. Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> [imirkin: always use U32, set newi for foldCount tracking] Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-10-14 23:23:57 -04:00
Jose Fonseca	c6d17701c8	pipe_loader_sw: Don't invoke Unix close() on Windows. Trivial.	2016-10-14 16:29:04 +01:00
Emil Velikov	48267b730c	gallium: annotate sw_driver_descriptor instance as const data Already treated and handled as such. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-10-14 11:09:00 +01:00
Emil Velikov	792148f16a	gallium: annotate drm_driver_descriptor instance as const data Already treated and handled as such. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-10-14 11:09:00 +01:00
Emil Velikov	c079a206ad	gallium: rename drm_driver_descriptor::{, driver_}name Historically we use "device name" for the name of the kernel module and "driver name" for the dri/other driver. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-10-14 11:09:00 +01:00
Emil Velikov	9837cf13b1	gallium: remove unused drm_driver_descriptor::driver_name Likely unused since day 1, although I've only checked back until the st/dri unification with commit `29ca7d2c94` ("st/dri: merge dri/drm and dri/sw backends") Based on the comment, referencing drmOpenByName it's not something we want to bring back. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-10-14 11:09:00 +01:00
Emil Velikov	0f031dcf11	gallium: fix drm_driver_descriptor::name comment Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-10-14 11:09:00 +01:00
Mark Thompson	0b241b7717	st/va: Fix H.264 PicOrderCnt value TopFieldPicOrderCnt is exactly the PicOrderCnt value for a frame - see H.264 section 8.2.1. Reviewed-by: Christian König <christian.koenig@amd.com>	2016-10-14 11:57:52 +02:00
Mark Thompson	1edaa33135	st/va: Baseline profile is not supported Constrained baseline profile is supported, so use that instead. This matches what the encoder already does (constraint_set1_flag is always set in the output bitstream). Reviewed-by: Christian König <christian.koenig@amd.com>	2016-10-14 11:57:48 +02:00
Mark Thompson	e0604eed9f	st/va: Return surface formats depending on config chroma format This makes the supported format actually match the configuration, and allows the user to observe that NV12 is supported for video processing where previously they couldn't (though it did always work if they blindly tried to use it anyway). Reviewed-by: Christian König <christian.koenig@amd.com>	2016-10-14 11:57:44 +02:00
Mark Thompson	e7c7ef3625	st/va: Save surface chroma format in config Both YUV420 and RGB32 configurations are supported, so we need to be able to distinguish which is being used. Reviewed-by: Christian König <christian.koenig@amd.com>	2016-10-14 11:57:40 +02:00
Mark Thompson	8a931c83ba	st/va: Return more useful config attributes The encoder attributes are needed for a user of the encoder to be able to configure it sensibly without internal knowledge. Reviewed-by: Christian König <christian.koenig@amd.com>	2016-10-14 11:57:25 +02:00
Tim Rowley	a42c22fdbf	swr: [rasterizer core] don't construct pArContext on non-ar builds Stops debug directory being created on non-ar builds. Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-10-13 23:39:14 -05:00
Tim Rowley	29d07480b8	swr: [rasterizer core] remove WorkerWaitForThreadEvent bucket Cause of bucket stop capture hang, as threads get stuck in level 1. Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-10-13 23:39:14 -05:00
Tim Rowley	ada27b503e	swr: [rasterizer core] move binner functionality to separate file Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-10-13 23:39:14 -05:00
Tim Rowley	f0a66c1da2	swr: [rasterizer scripts] add DEBUG_OUTPUT_DIR knob Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-10-13 23:39:14 -05:00
Tim Rowley	ffd0224303	swr: [rasterizer core] fix comment typo Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-10-13 23:39:14 -05:00
Tim Rowley	4889922210	swr: [rasterizer core/sim] 8x2 backend + 16-wide tile clear/load/store Work in progress (disabled). USE_8x2_TILE_BACKEND define in knobs.h enables AVX512 code paths (emulated on non-AVX512 HW). Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-10-13 23:39:14 -05:00
Tim Rowley	bf1f46216c	swr: [rasterizer archrast] fix event file issue with saving data Also, tagging stats with draw id to correlate these events with draw/dispatch events. Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-10-13 23:39:13 -05:00
Eric Engestrom	827e038062	swr: [rasterizer common] fix assert index Fixes: `b3bd8bb611` ("swr: [rasterizer core] add support for "RAW" surface format") CovID: 1373647 Signed-off-by: Eric Engestrom <eric@engestrom.ch> Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-10-13 21:37:20 -05:00
Ilia Mirkin	afb6dc53bf	nv50: enable ARB_enhanced_layouts Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-10-13 21:45:21 -04:00
Ilia Mirkin	a6d6eff2e6	nvc0/ir: be more careful about preserving modifiers in SHLADD creation src2 was being given the wrong modifier, and we were not properly managing the modifier on the SHL source either. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2016-10-13 21:44:03 -04:00
Brian Paul	b81546d43c	tgsi: fix comment typo in tgsi_ureg.c Trivial.	2016-10-13 17:38:49 -06:00
Eric Anholt	99d790538d	vc4: Avoid loading from the texture during non-utile-aligned glTexImage(). Previously, the plan was "if the width/height we have to load/store isn't the size the user is planning on writing, then we need to load the old contents out beforehand to prevent writing back undefined". However, when we're doing glTexImage() we often end up aligning the width/height into the padding of the texture, and we don't actually need to read out that padding. Improves x11perf -aatrapezoid100 performance from ~460/sec to ~700/sec.	2016-10-13 14:27:30 -07:00
Axel Davy	0717cd975d	st/nine: Fix possible segfault in surface ctor Regression introduced by `ba0274c7d6`. Check the resource exists before assigning it a flag (and use This->base.resource instead of pResource, since the former may have a newly allocated resource, while the latter would be NULL). This should reintroduce the behaviour of previous code. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-13 21:16:35 +02:00
Axel Davy	98b8ad61c6	st/nine: Remove useless code in nine_shader Since `1604efa6fd`, lconsti and lconstb don't need to be initialized. Remove some leftovers from the previous code (which has now invalid use of ARRAY_SIZE on a pointer instead of an array). Reported by Coverity. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-13 21:16:35 +02:00
Axel Davy	197cdd1bbd	gallium/os: Use unsigned integers for size computation Use uint64_t instead of int64_t in the calculation, as the result is uint64_t. Signed-off-by: Axel Davy <axel.davy@ens.fr> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-13 21:16:35 +02:00
Samuel Pitoiset	4527222169	nvc0: enable ARB_enhanced_layouts All ARB_enhanced_layouts piglit tests pass without any changes in our compiler. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-10-13 21:13:34 +02:00
Marek Olšák	7dddf0b7ab	radeonsi: adjust and clean up Z_ORDER and EXEC_ON_x settings The table was copied from the Vulkan driver. The comment lines are as long as the table for cosmetic reasons. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-13 19:00:51 +02:00
Marek Olšák	e12c1cab5d	radeonsi: disable ReZ This is a serious performance fix. Discovered by luck. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94354 Cc: 12.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-13 19:00:51 +02:00
Marek Olšák	d4d9ec55c5	radeonsi: implement TC-compatible HTILE so that decompress blits aren't needed and depth texturing needs less memory bandwidth. Z16 and Z24 are promoted to Z32_FLOAT by the driver, because TC-compatible HTILE only supports Z32_FLOAT. This doubles memory footprint for Z16. The format promotion is not visible to state trackers. This is part of TC-compatible renderbuffer compression, which has 3 parts: DCC, HTILE, FMASK. Only TC-compatible FMASK compression is missing now. I don't see a measurable increase in performance though. (I tested Talos Principle and DiRT: Showdown, the latter is improved by 0.5%, which is almost noise, and it originally used layered Z16, so at least we know that Z16 promoted to Z32F isn't slower now) Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-13 19:00:51 +02:00
Marek Olšák	a077185ea9	gallium: add PIPE_RESOURCE_FLAG_TEXTURING_MORE_LIKELY For performance tuning in drivers. It filters out window system framebuffers and OpenGL renderbuffers. radeonsi will use this to guess whether a depth buffer will be read by a shader. There is no guarantee about what will actually happen. This is a departure from PIPE_BIND flags which are defined to be strict but they are useless in practice. Acked-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-13 19:00:51 +02:00
Nicolai Hähnle	761388a0eb	radeonsi: fix regression in image atomics Caused by a bad rebase when pushing commit `76a940893`.	2016-10-13 16:04:16 +02:00
Nicolai Hähnle	76a940893d	radeonsi: fix the coordinate overloading of llvm.amdgcn.image.atomic.cmpswap.* Fixes GL45-CTS.shader_image_load_store.basic-allTargets-atomic* Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-10-13 10:17:42 +02:00
Emil Velikov	a4622305e6	swr: automake: add ar_eventhandlerfile_h.template to the tarball Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2016-10-12 18:55:22 +01:00
Ilia Mirkin	a48a343c29	nvc0/ir: fix textureGather with a single offset Recent fix for non-const offsets broke the case of a single offset (vs 4 offsets). The later code relies on the offs array to contain null values to tell whether they should be added onto the srcs list. Fixes: `5239bd592` ("nvc0/ir: fix overwriting of value backing non-constant gather offset") Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Cc: mesa-stable@lists.freedesktop.org	2016-10-12 13:18:14 -04:00
Ilia Mirkin	300b5ad023	nv50/ir: copy over value's register id when resolving merge of a phi The offset needs to be properly copied over to the phi value, otherwise it will get assigned to the base of the merge instead of the proper location. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Cc: mesa-stable@lists.freedesktop.org	2016-10-12 13:18:14 -04:00
Nicolai Hähnle	789119d212	st/mesa: enable ARB_enhanced_layouts and turn the cap on v2: mark llvmpipe & softpipe properly as well (Jason Wood) Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Dave Airlie <airlied@redhat.com>	2016-10-12 18:50:10 +02:00
Nicolai Hähnle	2b460c750a	tgsi/ureg: add ureg_DECL_output_layout For specifying an exact location/component. v2: change the order of parameters (Dave) Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> (v1) Reviewed-by: Dave Airlie <airlied@redhat.com> (v1)	2016-10-12 18:50:10 +02:00
Nicolai Hähnle	047a7c7a0b	tgsi/ureg: add layout/component input declarations v2: change the order of parameters (Dave) Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> (v1) Reviewed-by: Dave Airlie <airlied@redhat.com> (v1)	2016-10-12 18:50:10 +02:00
Nicolai Hähnle	f9a01f3872	tgsi/scan: fix num_inputs/num_outputs for shaders with overlapping arrays v2: remove a tautological left-over assert (Marek) Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> (v1) Reviewed-by: Dave Airlie <airlied@redhat.com> (v1)	2016-10-12 18:50:10 +02:00
Nicolai Hähnle	700a571f89	gallium: add PIPE_CAP_TGSI_ARRAY_COMPONENTS This is a screen cap because drivers are expected to support it either for all shader types or for none of them. Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Dave Airlie <airlied@redhat.com>	2016-10-12 18:50:10 +02:00
Tom Stellard	b33cb709fd	radeonsi: Use the new image load/store intrinsic signatures This patch requires LLVM r284024 or newer. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-12 16:42:43 +00:00
Tom Stellard	ff0df66e10	radeonsi: Add function for converting LLVM type to intrinsic string The existing function only worked for integer types. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-12 16:42:07 +00:00
Tom Stellard	a96a7eae04	radeonsi: Refactor image store/load intrinsic name creation Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-12 16:42:07 +00:00
Marek Olšák	d7e74b52bb	winsys/amdgpu: fix infinite loop w/ RADEON_NOOP=1 caused by unsubmitted fences Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-12 18:29:40 +02:00
Marek Olšák	e4bbab9022	radeonsi: fix R600_DEBUG=precompile for shader-db radeonsi no longer supports pixel shaders without interpolation optimizations, which led to assertion failures in si_shader_ps when running shader-db. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-12 18:29:40 +02:00
Marek Olšák	40e1f7e09b	radeonsi: use TC write-back instead of full cache invalidation Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-12 18:29:40 +02:00
Marek Olšák	8cdce30cc2	radeonsi: implement TC L2 write-back (flush) without cache invalidation Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-12 18:29:40 +02:00
Marek Olšák	65a4d55a9f	radeonsi: don't invalidate VMEM L1 for memory barriers for index buffers Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-12 18:29:40 +02:00
Samuel Pitoiset	87b06cab14	nv50/ir: optimize ADD(SHL(a, b), c) to SHLADD(a, b, c) total instructions in shared programs: 2286901 -> 2284473 (-0.11%); total gprs used in shared programs: 335256 -> 335273 (0.01%); total local used in shared programs: 31968 -> 31968 (0.00%); helped: local 0, gpr 41, inst 852, bytes 852; hurt: local 0, gpr 44, inst 23, bytes 23. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-10-12 17:46:03 +02:00
Roland Scheidegger	7e86b2ddae	draw: initialize shader inputs This should make the code more robust if a shader tries to use inputs which aren't defined by the vertex element layout (which usually shouldn't happen). No piglit change. Reviewed-by: Brian Paul <brianp@vmware.com>	2016-10-12 15:05:44 +02:00
Ilia Mirkin	389d6dedbe	trace: add invalidate_resource callback Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-10-11 20:47:54 -04:00
Marek Olšák	b425b57d1e	radeonsi: emit TA_CS_BC_BASE_ADDR on SI only if the kernel allows it Reviewed-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-11 20:04:57 +02:00
Tim Rowley	9db9c61d26	swr: [rasterizer archrast] update proto file Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-10-11 11:48:23 -05:00
Tim Rowley	3805e40f32	swr: [rasterizer archrast] add support for stats files Only stat and counter events are saved to the event files. Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-10-11 11:48:23 -05:00
Tim Rowley	f4684cdb5f	swr: [rasterizer jitter] remove architecture override Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-10-11 11:48:23 -05:00
Tim Rowley	185a531206	swr: [rasterizer jitter] adjust jitmanager assert Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-10-11 11:48:17 -05:00
Tim Rowley	eaec263427	swr: [rasterizer] eliminate unused label warnings on gcc Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-10-11 11:22:04 -05:00
Tim Rowley	12e6f4c879	swr: [rasterizer core] implement depth bounds test Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-10-11 11:22:04 -05:00
Tim Rowley	1b86c050ad	swr: [rasterizer core] update/add formats Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-10-11 11:22:04 -05:00
Tim Rowley	a907b7a5f7	swr: [rasterizer core] SwrStoreTiles api change SwrStoreTiles now takes a mask of surfaces to store. Reduces overhead when storing multiple render targets. Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-10-11 11:22:04 -05:00
Tim Rowley	5d5179a6c2	swr: [rasterizer scripts] add ENABLE_ASSERT_DIALOGS knob for windows Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-10-11 11:22:04 -05:00
Tim Rowley	07326d4006	swr: [rasterizer archrast] add mako template Add template for generating code to save events to a file. Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-10-11 11:22:04 -05:00
Tim Rowley	e845eeb0be	swr: [rasterizer core] disable cull for rect_list Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-10-11 11:22:04 -05:00
Tim Rowley	b3bd8bb611	swr: [rasterizer core] add support for "RAW" surface format Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-10-11 11:22:04 -05:00
Tim Rowley	2966d9c691	swr: [rasterizer core] align Macrotile FIFO memory to SIMD size Align and use streaming store instructions for BE fifo queues. Provides slightly faster enqueue and doesn't pollute the caches. Add appropriate memory fences to ensure streaming writes are globally visible. Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-10-11 11:22:04 -05:00
Tim Rowley	6b3691c876	swr: [rasterizer common] remove threadviz code Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-10-11 11:22:04 -05:00
Tim Rowley	2550b04179	swr: [rasterizer memory] split load/store for compile speed Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-10-11 11:22:04 -05:00
Nicholas Bishop	64435fd888	i915g: fix incorrect gl_FragCoord value On Intel Pineview M hardware, the i915 gallium driver doesn't output the correct gl_FragCoord. It seems to always have an X coord of 0.0 and a Y coord of the window's height in pixels, e.g. 600.0f or such. I believe this is a regression caused in part by this commit: `afa035031f` The old behavior used the output at index zero, while the new behavior uses actual zeroes. In the case of gl_FragCoord the output at index zero happened to be the correct one, so the behavior appeared correct although the code already had a bug. Fixed by checking for I915_SEMANTIC_POS when setting up texCoords. If the generic_mapping is I915_SEMANTIC_POS, look for the TGSI_SEMANTIC_POSITION instead of a TGSI_SEMANTIC_GENERIC output. https://bugs.freedesktop.org/show_bug.cgi?id=97477 Reviewed-by: Stéphane Marchesin <marcheu@chromium.org> Tested-by: Stéphane Marchesin <marcheu@chromium.org>	2016-10-10 18:32:36 -07:00
Axel Davy	eef0744d43	st/nine: More checks for GetRenderTargetData Fixes a wine test crash Signed-off-by: Axel Davy <axel.davy@ens.fr> Reviewed-by: Patrick Rudolph <siro@das-labor.org>	2016-10-10 23:43:51 +02:00
Patrick Rudolph	a52e700169	st/nine: Add debug output for lost devices Add debug output to ease debugging. Signed-off-by: Patrick Rudolph <siro@das-labor.org> Reviewed-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:51 +02:00
Patrick Rudolph	5d85253dc3	st/nine: Prevent crash in GetRenderTargetData Return error instead of crashing on source surfaces with format D3DFMT_NULL. Fix for issue #236. Tested on Windows 7. Signed-off-by: Patrick Rudolph <siro@das-labor.org> Reviewed-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:51 +02:00
Patrick Rudolph	09edc0555f	st/nine: Set CLAMP_TO_EDGE on cubetextures Wine tests show that cubetextures always use PIPE_TEX_WRAP_CLAMP_TO_EDGE regardless of set sampler states. Fixes failing d3d9 wine test test_cube_wrap. Signed-off-by: Patrick Rudolph <siro@das-labor.org> Reviewed-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:51 +02:00
Patrick Rudolph	fa2574497b	st/nine: handle possible failure of D3DWindowBuffer_create Check for errors and pass them to the callers. Signed-off-by: Patrick Rudolph <siro@das-labor.org> Reviewed-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:51 +02:00
Patrick Rudolph	f04fa0a62c	st/nine: Assert on buffer creation failure Add an assert to make sure buffer creation doesn't fail. Add error handling in calling functions. Signed-off-by: Patrick Rudolph <siro@das-labor.org> Reviewed-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:51 +02:00
Patrick Rudolph	f8c01e7a96	st/nine: Use NineDevice9_CreateDepthStencilSurface in swapchain9 Replace custom code with NineDevice9_CreateDepthStencilSurface. All functionality is given now.	2016-10-10 23:43:51 +02:00
Axel Davy	63367e6c95	st/nine: Fix check and remove useless code in swapchain9 The removed code was there for two reasons: 1) Allow DF16, DF24, INTZ to be used as depth buffer for swapchain, if the driver doesn't support PIPE_BIND_SAMPLER_VIEW for the underlying format 2) Set PIPE_BIND_SAMPLER_VIEW if possible, such that if StretchRect is called on the depth texture, it is happy. 1) The reason these formats needed a workaround is because the check flags for them in CheckDeviceFormat were incorrect, which led applications to think the formats were valid for swapchains, even if they weren't supported. 2) StretchRect limitations for depth buffers force the resource_copy_region path, which should be fine without PIPE_BIND_SAMPLER_VIEW. Thus fix the check for 1), and remove the code. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:51 +02:00
Patrick Rudolph	60624be203	st/nine: Implement MSAA quality levels Advertise quality levels: Each supported multisample count matches to one quality level. The application doesn't know how many samples each quality level has. For that reason it's not possible to set the multisample mask. Return errors on quality level mismatch. Fixes several old games not having multisample support until now. Fix for issue #73. Signed-off-by: Patrick Rudolph <siro@das-labor.org> Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:51 +02:00
Patrick Rudolph	8a50b1244f	st/nine: Prepare update_framebuffer for MS quality levels Compare resource's nr_samples instead of D3D multisample level. Required for multisample quality levels to work correctly. Signed-off-by: Patrick Rudolph <siro@das-labor.org> Reviewed-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:51 +02:00
Patrick Rudolph	b560305687	st/nine: Add additional error handling in CheckDeviceMultiSampleType Return one supported quality level in error cases. Return error on invalid multisample count. Fixes failing wine tests. Signed-off-by: Patrick Rudolph <siro@das-labor.org> Reviewed-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:51 +02:00
Patrick Rudolph	7afab4ad39	st/nine: Fix compiler warning Use strict aliasing in SetPrivateData and struct pheader. Casting char[1] to IUnknown** isn't allowed in strict aliasing. Compute pointer to body by adding size of header to header pointer. Signed-off-by: Patrick Rudolph <siro@das-labor.org> Reviewed-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:51 +02:00
Patrick Rudolph	b9f31111ac	st/nine: Remove resource9 {Set/Get/Free}PrivateData functions Remove {Set/Get/Free}PrivateData in resource9. Functionality has been implemented in the IUnknown interface. Signed-off-by: Patrick Rudolph <siro@das-labor.org> Reviewed-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:51 +02:00
Patrick Rudolph	03888e8a46	st/nine: Remove volume9 {Set/Get/Free}PrivateData functions Remove {Set/Get/Free}PrivateData in volume9. Functionality has been implemented in the IUnknown interface. Signed-off-by: Patrick Rudolph <siro@das-labor.org> Reviewed-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:51 +02:00
Patrick Rudolph	485cba7eb4	st/nine: Switch {Set/Get/Free}PrivateData functions Switch {Set/Get/Free}PrivateData function to introduced IUnknown functions. Signed-off-by: Patrick Rudolph <siro@das-labor.org> Reviewed-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:51 +02:00
Patrick Rudolph	4117f5e1ab	st/nine: Implement {Set/Get/Free}PrivateData in iunknown Implement {Set/Get/Free}PrivateData in iunknown to get rid of duplicated code in resource9 and volume9. Signed-off-by: Patrick Rudolph <siro@das-labor.org> Reviewed-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:51 +02:00
Patrick Rudolph	c1c8e852c1	st/nine: Return device in NineSurface9_GetContainer According to MSDN the device is returned for surfaces that do not have a regular container. Such surfaces are: OffscreenPlainSurface, DepthStencilSurface and RenderTarget Tested and verified on Windows. Signed-off-by: Patrick Rudolph <siro@das-labor.org> Reviewed-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:51 +02:00
Patrick Rudolph	ba0274c7d6	st/nine: Allocate surface resources in surface ctor Allocate resources in surface ctor. This allows using the state tracker's internal memory accounting. Fix for issue #231. Signed-off-by: Patrick Rudolph <siro@das-labor.org> Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:51 +02:00
Axel Davy	1f65f67b21	st/nine: Fix D3DFMT_NULL size D3DFMT_NULL is mapped to PIPE_FORMAT_NONE. Instead of relying on PIPE_FORMAT_NONE to return a size, pick one. The one picked is the same as Wine's. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:51 +02:00
Patrick Rudolph	9dc792b95b	st/nine: Add debugging output Add DBG calls to NineTexture9_GetLevelDesc and NineTexture9_GetSurfaceLevel to ease debugging. Signed-off-by: Patrick Rudolph <siro@das-labor.org> Reviewed-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:51 +02:00
Patrick Rudolph	8ceb2264c5	st/nine: Fix assert in NineUnknown_QueryInterface Tests showed that it is allowed to call this method on objects that have a zero refcount. Required for issue #230. Signed-off-by: Patrick Rudolph <siro@das-labor.org> Reviewed-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:51 +02:00
Patrick Rudolph	f2eacef33d	st/nine: Print interface id in NineVolume9_GetContainer To ease debugging print interface id. Signed-off-by: Patrick Rudolph <siro@das-labor.org> Reviewed-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:50 +02:00
Patrick Rudolph	489dbc51ae	st/nine: Print interface id in NineSurface9_GetContainer To ease debugging print interface id. Signed-off-by: Patrick Rudolph <siro@das-labor.org> Reviewed-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:50 +02:00
Patrick Rudolph	e63a38832b	st/nine: Print interface id in NineUnknown_QueryInterface To ease debugging print interface id. Signed-off-by: Patrick Rudolph <siro@das-labor.org> Reviewed-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:50 +02:00
Patrick Rudolph	6a1cce20b6	st/nine: Move assert in NineSurface9_ctor Move assert to function entry. Signed-off-by: Patrick Rudolph <siro@das-labor.org> Reviewed-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:50 +02:00
Axel Davy	851e4b8d8a	st/nine: Properly declare sampler states for ff Fixes a softpipe assertion failure with wine tests Signed-off-by: Axel Davy <axel.davy@ens.fr> Reviewed-by: Patrick Rudolph <siro@das-labor.org>	2016-10-10 23:43:50 +02:00
Axel Davy	5ce23c1689	st/nine: Handle user clipping planes properly for ff Found reading msdn and checking Wine. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:50 +02:00
Axel Davy	d2fd296648	st/nine: Fix the calculation of the number of vs inputs Fixes hangs on radeonsi, and assert on llvmpipe. Signed-off-by: Axel Davy <axel.davy@ens.fr> Cc: "12.0" <mesa-stable@lists.freedesktop.org>	2016-10-10 23:43:50 +02:00
Axel Davy	71e7292a85	st/nine: Fix specular w coordinate Found looking at Wine formulas. Fixes a few visual issues. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:50 +02:00
Axel Davy	732cea09cd	st/nine: Disable parts of lighting calculation if no normal provided Behaviour found in Wine sources, and checked with some test apps. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:50 +02:00
Axel Davy	fc9bb19dce	st/nine: Fix condition for specular lighting Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:50 +02:00
Axel Davy	c56c7c1fc8	st/nine: Do always accumulate diffuse According to spec. Signed-off-by: Axel Davy <axel.davy@ens.fr> Reviewed-by: Patrick Rudolph <siro@das-labor.org>	2016-10-10 23:43:50 +02:00
Axel Davy	c5bce80f50	st/nine: Initialize ps ff registers Found with wine tests for the rTmp register. Not sure for the other ones. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:50 +02:00
Axel Davy	4ed3d5ee57	st/nine: Do not pollute rTmp in ff ps Signed-off-by: Axel Davy <axel.davy@ens.fr> Reviewed-by: Patrick Rudolph <siro@das-labor.org>	2016-10-10 23:43:50 +02:00
Axel Davy	d9b8b3196e	st/nine: Allocate temporaries on demand for ps ff Same change as for vs ff. This makes it easier to not introduce mistakes reusing temporaries whose result shouldn't be erased. Signed-off-by: Axel Davy <axel.davy@ens.fr> Reviewed-by: Patrick Rudolph <siro@das-labor.org>	2016-10-10 23:43:50 +02:00
Axel Davy	f7dd27aed3	st/nine: Fix texbem Error found with wine tests. nine_shader was expecting another order than the one device9 was using. Signed-off-by: Axel Davy <axel.davy@ens.fr> Reviewed-by: Patrick Rudolph <siro@das-labor.org>	2016-10-10 23:43:50 +02:00
Axel Davy	7afcbb49ba	st/nine: Fix ff computation for inverse Thanks to wine tests. Apparently 4x4 inverse is to be used, and if the inverse can't be calculated, the input matrix is to be used. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:50 +02:00
Axel Davy	36399f9a7f	st/nine: Use normed Vtx for reflectionvector Fix deduced from the spec. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:50 +02:00
Axel Davy	eda1e6ece7	st/nine: Implement SPHEREMAP Behaviour checked with a test app. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:50 +02:00
Axel Davy	a3ddc80ec8	st/nine: Enable passthrough only if positiont is used Wine tests for the passthrough feature are for positiont. Nothing seems to indicate passthrough happens when positiont is not used. However having passthrough with positiont makes sense (to be used with ProcessVertices outputs). Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:50 +02:00
Axel Davy	0b5bed774b	st/nine: Fix wrong mask in ff vs Signed-off-by: Axel Davy <axel.davy@ens.fr> Reviewed-by: Patrick Rudolph <siro@das-labor.org>	2016-10-10 23:43:50 +02:00
Axel Davy	028dab95f6	st/nine: Fix tweening factor computation The computation was reversed. Deduced by tests on windows. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:50 +02:00
Axel Davy	1fe055338d	st/nine: Disable ff vertex blending if required inputs are missing This behaviour has been partially tested on windows. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:50 +02:00
Axel Davy	aa69bb6848	st/nine: Use materials if source is not given. Deduced by test on windows. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:50 +02:00
Axel Davy	ab068a78d3	st/nine: Fix ff SPECULARENABLE We were (wrongly) adding specular to diffuse in vertex shaders when SPECULARENABLE was set. However the spec says specular has to be added after texture processing (which is in ps). Besides SPECULARENABLE is flagged as a pixel state. There was unused support for SPECULARENABLE in the ps ff code. Remove the vs code, and use the ps code. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:50 +02:00
Axel Davy	1d7890a441	st/nine: Undefined specular should be full of zeros Signed-off-by: Axel Davy <axel.davy@ens.fr> Reviewed-by: Patrick Rudolph <siro@das-labor.org>	2016-10-10 23:43:50 +02:00
Axel Davy	d9330f9348	st/nine: Implement normal transformation with vertex blending The formula is different from the one of the spec, but otherwise nothing particular. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:50 +02:00
Axel Davy	305e8106ab	st/nine: Increase MaxVertexBlendMatrixIndex Modern cards do advertise 8. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:50 +02:00
Axel Davy	567be40de9	st/nine: Compact ff vs constants a bit There are several holes. This patch reduces the holes a bit, which reduces the size of the constant buffer uploaded. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:50 +02:00
Axel Davy	07d1f32e0f	st/nine: Fix vertex blending aVtx computation There was a multiplication by the world matrix 0 which had nothing to do there. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:50 +02:00
Axel Davy	d9d8cb9f19	st/nine: Reorganize ff vtx processing The new order simplified the code a bit for next patches. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:50 +02:00
Axel Davy	cde74cba71	st/nine: Small simplification for position_t and fog position_t disables fog computation. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:50 +02:00
Axel Davy	5d2a8e8a36	st/nine: Cleaning code for vs temporaries This has been a real mess up to now: the temporaries were allocated once, and shared after that between the different parts of the code. To help maintaining the code, the temporaries are now allocated and released on need. As surprising as it could be, this patch, which was supposed to introduce no behaviour change, actually solved a visual bug observed on a sample program. This was due to ureg_normalize3 polluting a temporary variable. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:50 +02:00
Axel Davy	1f18b6f351	st/nine: No need for the local flag for temporaries in ff Signed-off-by: Axel Davy <axel.davy@ens.fr> Reviewed-by: Patrick Rudolph <siro@das-labor.org>	2016-10-10 23:43:50 +02:00
Axel Davy	eb9ad8f969	st/nine: Handle D3DRS_NORMALIZENORMALS When this state is set, the normals computed in the vs ff shader should be normalized. Signed-off-by: Axel Davy <axel.davy@ens.fr> Reviewed-by: Patrick Rudolph <siro@das-labor.org>	2016-10-10 23:43:50 +02:00
Axel Davy	b9639c661f	st/nine: Initial ProcessVertices support For now only VS 3 support is implemented. This enables The Sims 2 to work. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:50 +02:00
Axel Davy	3bf02d383f	st/nine: Partial software vertex processing support Software Vertex Processing allows: . Fewer limitations for shaders (more loops, etc) . Fewer limitations for ff (more enabled lights, 255 matrices for VertexBlend) In particular shaders can get more constants. This patch implements support for this (not using software rendering, but hardware rendering, as llvmpipe and dx10+ hw have the same limits...) This is considered a second class path. Even apps asking for "Mixed Vertex processing" (ie the ability to switch to swvp on demand) do not use the feature much. Some just initialize more constants than the normal limit at the start of the application, but never use more than the normal limit. When the apps do not need the software vertex processing features, they do not seem to turn it on. This means it is ok if that path is slow. Thus no care has been taken to optimize the path. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:49 +02:00
Axel Davy	f8c8f44244	st/nine: Rework vs int and bool constants buffer This will help to support swvp constants. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:49 +02:00
Axel Davy	a83dce0128	st/nine: Change dirty tracking for vs int and bool constants This change makes it easier to introduce tracking for swvp constants. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:49 +02:00
Axel Davy	f78089b962	st/nine: Drop unused constant upload path This path has been disabled for some time because of some bugs with it. It hasn't been updated to the new features, and is not faster. Signed-off-by: Axel Davy <axel.davy@ens.fr> Reviewed-by: Patrick Rudolph <siro@das-labor.org>	2016-10-10 23:43:49 +02:00
Axel Davy	1604efa6fd	st/nine: Add support for swvp constants in shaders swvp has relaxed limits (more nested loops, etc). In particular it enables more constants. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:49 +02:00
Axel Davy	56ea3df7d4	st/nine: Initial mixed vertex processing support In mixed vertex processing, the user can enable or disable software vertex processing. It is on hardware by default. This feature is not a state, and thus the setting doesn't need to be recorded by stateblocks. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:49 +02:00
Axel Davy	747f1ef8b6	st/nine: Implement SetNPatchMode Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:49 +02:00
Axel Davy	ded7a73eb3	st/nine: Implement D3DUSAGE_SOFTWAREPROCESSING Buffers with this flag must be usable with both software and hardware vertex processing. Use Staging for fast cpu access. Signed-off-by: Axel Davy <axel.davy@ens.fr> Reviewed-by: Patrick Rudolph <siro@das-labor.org>	2016-10-10 23:43:49 +02:00
Patrick Rudolph	19703f2a36	st/nine: Allocate more space for ATI1 ATIx are "unknown" formats that do not follow block format conventions. Tests showed that pitch*height bytes are allocated. apitrace used to depend on this behaviour. It used to copy more bytes than it has to for the ATI1 block format, but it didn't crash on Windows. Increase buffersize for ATI1 to fix this crash. The same issue was present in WINE but a patch has been sent by me. Signed-off-by: Patrick Rudolph <siro@das-labor.org> Reviewed-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:49 +02:00
Patrick Rudolph	ec6c636722	st/nine: Add missing break Add missing break instruction. Signed-off-by: Patrick Rudolph <siro@das-labor.org> Reviewed-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:49 +02:00
Axel Davy	03f60a3357	st/nine: Implement relative addressing for ps inputs To implement the feature we copy the ps inputs to a temp array. This is not optimal for performance, but it is the simplest solution. This is a feature that is very very rarely used. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:49 +02:00
Axel Davy	a5d308e51a	st/nine: Wait for pending tasks to execute in swapchain Fixes crash after Reset() when using thread_submit=true Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:49 +02:00
Axel Davy	f090705075	st/nine: Use fixed size arrays for swapchain buffers Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:49 +02:00
Patrick Rudolph	a719800cb8	st/nine: Fix buffer count check for Ex devices Signed-off-by: Patrick Rudolph <siro@das-labor.org> Reviewed-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:49 +02:00
Axel Davy	9ff0dc3129	st/nine: Disable seamless cubemap for d3d d3d9 doesn't have seamless cubemap. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:49 +02:00
Axel Davy	f0ec54ee32	st/nine: Fix some check flags Uses the new defines introduced in previous commit. See comment in the commit for more explanation. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:49 +02:00
Axel Davy	39e98d351f	st/nine: Unify some check flags The new defines will be reused in a later patch. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:48 +02:00
Axel Davy	2290eac84e	gallium/util: Really allow aliasing of dst for u_box_union_* Gallium nine relies on aliasing to work with this function. Without this patch, dirty region tracking was incorrect, which could lead to incorrect textures or vertex buffers. Fixes several game bugs with nine. Fixes https://github.com/iXit/Mesa-3D/issues/234 Signed-off-by: Axel Davy <axel.davy@ens.fr> Reviewed-by: Patrick Rudolph <siro@das-labor.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Cc: "12.0" <mesa-stable@lists.freedesktop.org>	2016-10-10 23:43:48 +02:00
Axel Davy	5e7f0ebe29	softpipe: Cap to 2 GB on 32 bits On 32 bits system, application memory is quite limited. softpipe uses application memory. To help prevent memory exhaustion, limit reported memory availability to 2GB. Some gallium nine apps do check reported memory by allocating resources until memory is full. Gallium nine refuses allocations when 80% of the reported memory limit is used. This change helps some apps to start. Signed-off-by: Axel Davy <axel.davy@ens.fr> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2016-10-10 23:43:48 +02:00
Axel Davy	814ca96d0d	llvmpipe: Cap to 2 GB on 32 bits On 32 bits system, application memory is quite limited. llvmpipe uses application memory. To help prevent memory exhaustion, limit reported memory availability to 2GB. Some gallium nine apps do check reported memory by allocating resources until memory is full. Gallium nine refuses allocations when 80% of the reported memory limit is used. This change helps some apps to start. Signed-off-by: Axel Davy <axel.davy@ens.fr> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2016-10-10 23:43:48 +02:00
Axel Davy	218459771a	gallium/os: Fix overflow on 32 bits On systems with more than 4GB of ram, os_get_total_physical_memory was triggering an integer overflow for the linux and haiku path, when on 32 bits. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94561 Signed-off-by: Axel Davy <axel.davy@ens.fr> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-10-10 23:43:48 +02:00
Axel Davy	9904581dc6	st/nine: Memset pipe_resource templates Fixes regression introduced by `ecd6fce261` and is more future proof than just clearing the next field. Other nine usages did already zero out the templates. Signed-off-by: Axel Davy <axel.davy@ens.fr> Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>	2016-10-10 23:43:48 +02:00
Samuel Pitoiset	d43151318a	nvc0: fix valid range for shader buffers When offset != 0, the valid range was wrong because the second argument of util_range_add() is end, not size. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-10-10 21:32:16 +02:00
Ilia Mirkin	5239bd5920	nvc0/ir: fix overwriting of value backing non-constant gather offset Normally the value is an immediate, which is moved to some temporary, so there's no problem. In the case of a non-constant offset (as allowed by ARB_gpu_shader5), we have to take care to copy it first before using it to build up the bits. This fixes a compilation error observed in F1 2015. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Cc: mesa-stable@lists.freedesktop.org	2016-10-10 14:28:32 -04:00
Ilia Mirkin	ec05331a7b	nv50/ir: only stick one preret per function A function with multiple returns would have had multiple preret settings at the top of the function. While this is unlikely to have caused issues since we don't use functions in earnest, it could have in some cases overflowed the call stack, in case a function had a lot of early returns. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2016-10-10 10:45:06 -04:00
Nicolai Hähnle	1f95121626	radeonsi: make more use of si_have_tgsi_compute Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-10-10 10:38:33 +02:00
Nicolai Hähnle	38cfd5160a	gallium/radeon: assign a name to LLVM output variables in debug builds This can be helpful with R600_DEBUG=preoptir. Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-10-10 10:38:30 +02:00
Nicolai Hähnle	39a29c2431	gallium/radeon: avoid redundant work with overlapping in/out arrays Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-10-10 10:37:50 +02:00
Nicolai Hähnle	77c81164bc	radeonsi: support ARB_compute_variable_group_size Not sure if it's possible to avoid programming the block size twice (once for the userdata and once for the dispatch). Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-10-10 10:36:42 +02:00
Rob Clark	495ba8884a	gallium: add missing zero-init for resource templates Mostly test code, plus one spot I noticed in r600. Signed-off-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-07 15:50:46 -04:00
Rob Clark	3ebfc44b42	freedreno: don't try to shadow layered textures We will only hit this with multi-planar YUV external images, so we would probably never hit this code path in the first place. But if we did, it wouldn't do the right thing so just bail. Signed-off-by: Rob Clark <robdclark@gmail.com>	2016-10-07 15:50:46 -04:00
Rob Clark	f88f025e8c	freedreno/a3xx+a4xx: fix clip-plane lowering state If enabled clip-planes have changed, we need to mark program state dirty. Signed-off-by: Rob Clark <robdclark@gmail.com>	2016-10-07 15:50:46 -04:00
Eric Anholt	20d91e5ce9	vc4: Don't worry about partial Z/S clear if the other is already cleared. We have to be careful to not smash the value they're clearing to, but other than that we're fine. Avoids quad clears in Processing, which likes to do glClear(Z|S); glClear(Z). Improves performance of Processing's QuadRendering demo at 5000 quads by 5.46507% +/- 1.35576% (n=15 before, 32 after)	2016-10-06 18:29:16 -07:00
Eric Anholt	cb328123fe	vc4: Try to fix the HW-2116 workaround. We were incrementing the count at the end of vc4_start_draw(), except that that function returns immediately if we've already started drawing on this batch. It also failed to count the statechanges from the GFXH-515 workaround. This incidentally allows repeated glClear() to be coalesced, because the fast clears aren't counted in draw_calls_queued any more. Fixes most of the extra flushes in Processing, which emits glClear(Z|S); glClear(Z); glClear(C) during its frame setup. Improves performance of Processing's QuadRendering demo at 5000 quads by 3.33538% +/- 2.05846% (n=21 before, 15 after)	2016-10-06 18:29:12 -07:00
Eric Anholt	bca9a58d04	vc4: Drop dead argument from vc4_start_draw().	2016-10-06 18:09:24 -07:00
Eric Anholt	9421a6065c	vc4: Fix fallback to quad clears of depth in GLX. The fix in the vc4-jobs series ended up triggering the fallback path on GLX apps that use depth but not stencil.	2016-10-06 18:09:24 -07:00
Eric Anholt	8810270d06	vc4: Add the format name in miptree_debug. I was curious if my Z/S buffer was actually ZS or ZX, and the vc4 format of "0" didn't tell me much.	2016-10-06 18:09:24 -07:00
Eric Anholt	ee577e7fa7	vc4: Fix perf debug formatting on partial Z/S clear.	2016-10-06 18:09:24 -07:00
Eric Anholt	7c7bcbbc7d	vc4: Drop destination register when it's unused. This slightly reduces instructions on shader-db, but I think it's just perturbing register allocation -- the allocator should have always trivially colored these nodes, before. This commit is just to make QIR code failing more intelligible when register allocation fails.	2016-10-06 18:09:24 -07:00
Eric Anholt	d4ae5ca823	vc4: Fix live intervals analysis for screening defs in if statements. If a conditional assignment is only conditioned on the exec mask, that's still screening off the value in the executed channels (and, since we're not storing to the unexecuted channels, we don't care what's in there). Fixes a bunch of extra register pressure on Processing's Ribbons demo, which is failing to allocate.	2016-10-06 18:09:24 -07:00
Eric Anholt	06cc3dfda4	vc4: Fix simulator when more than one vc4_screen is opened. We would assertion fail in setting up the simulator the second time around. This at least postpones the assertion failure until we've closed all of the first set of screens and started opening a new set.	2016-10-06 18:09:24 -07:00
Eric Anholt	b30205b112	vc4: Fix assertion fails from trying to cast non-ALU instrs to ALU. Fixes 100 piglit tests since the assertions were added to nir.h. What's amazing is that these tests used to pass, even when casting garbage.	2016-10-06 18:09:24 -07:00
Samuel Pitoiset	28ecd3eac2	nv50/ir: fix wrong check when optimizing MAD to SHLADD Checking if MAD is supported is definitely wrong, and it's more likely a typo I introduced few days ago which breaks NV50 because SHLADD is not supported there. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-10-07 01:13:06 +02:00
Samuel Pitoiset	a198883bf7	nvc0: dump program binary only when NV50_PROG_DEBUG is set When the chipset is forced with NV50_PROG_CHIPSET, we actually only want to output the binary if NV50_PROG_DEBUG is also enabled. Otherwise, this pollutes the shader-db output. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2016-10-07 01:01:17 +02:00
Samuel Pitoiset	56a0bed2c1	nvc0: expose ARB_compute_variable_group_size Only expose 512 threads/block on Fermi to not be limited by 32 GPRs/thread. v4: - use 512 threads on Fermi, 1024 on Kepler+ Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2016-10-07 00:18:57 +02:00
Samuel Pitoiset	11e75fffeb	nv50/ir: set number of threads/block for variable local size When a variable local size is defined as specified by ARB_compute_variable_group_size, the fixed local size is set to 0 and a SIGFPE occurs when we compute the maximum number of regs. This allows to use 64 GPRs/thread. v4: - use 512 threads on Fermi, 1024 on Kepler+ Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2016-10-07 00:18:57 +02:00
Samuel Pitoiset	07bb4513c6	gallium: add PIPE_COMPUTE_CAP_MAX_VARIABLE_THREADS_PER_BLOCK v3: - use a new case statement in r600_pipe_common.c - fix compilation of softpipe... Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-07 00:18:57 +02:00
Karol Herbst	f96945c5b5	nv50/ir: optimize sub(a, 0) to a helped some ue4 demos and divinity OS shaders total instructions in shared programs : 2818674 -> 2818606 (-0.00%) total gprs used in shared programs : 379273 -> 379273 (0.00%) total local used in shared programs : 9505 -> 9505 (0.00%) total bytes used in shared programs : 25837792 -> 25837192 (-0.00%) local gpr inst bytes helped 0 0 33 33 hurt 0 0 0 0 Signed-off-by: Karol Herbst <karolherbst@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>	2016-10-06 19:39:51 +02:00
Jason Ekstrand	2ed17d46de	nir: Make nir_foo_first/last_cf_node return a block instead One of NIR's invariants is that control flow lists always start and end with blocks. There's no good reason why we should return a cf_node from these functions since we know that it's always a block. Making it a block lets us remove a bunch of code. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2016-10-06 09:16:37 -07:00
Steven Toth	e00fdd643b	gallium/hud: Remove superfluous debug No longer required. Signed-off-by: Steven Toth <stoth@kernellabs.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2016-10-06 16:37:06 +01:00
Emil Velikov	b634be0e69	svga: add svga_mksstats.h to the sources list Otherwise it won't be picked in the tarball and the build will fail. Fixes: `0035f7f136` ("svga: add guest statistic gathering interface") Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2016-10-06 16:17:09 +01:00
Emil Velikov	9b7fd4080a	st/xvmc/tests: force enable assertions Similar to the other 'tests', enable assertions in xvmc_bench. This silences the GCC warnings about unused-variable(s) and makes the program actually useful, as the XvMC API is called. Atm the function calls are omitted, since they're called within the assert. Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2016-10-06 15:03:46 +01:00
Samuel Pitoiset	a41cfbbf2b	nvc0: dump program binary when chipset has been forced Currently, program binaries are only dumped at upload time, but when the chipset has been forced via NV50_PROG_CHIPSET we might want to show the generated code, especially with shaderdb. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2016-10-05 21:15:44 +02:00
Marek Olšák	cc4a19c4ad	radeonsi: fix texture border colors for compute shaders There are VM faults without this. Cc: 12.0 <mesa-stable@lists.freedesktop.org> Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-05 21:03:54 +02:00
Marek Olšák	844f8268e1	gallium/radeon/winsyses: set reasonable max_alloc_size which is returned for GL_MAX_TEXTURE_BUFFER_SIZE. It doesn't have any other use at the moment. Bigger allocations are not rejected. This fixes GL45-CTS.texture_buffer.texture_buffer_max_size on Bonaire. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-05 21:03:54 +02:00
Marek Olšák	1b37e5541c	radeonsi: fix interpolateAt opcodes for .zw components Not returning garbage in .zw seems pretty important. This fixes: GL45-CTS.shader_multisample_interpolation.render.interpolate_at__check. Cc: 11.2 12.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-05 21:03:23 +02:00
Marek Olšák	300a8221e9	radeonsi: add assertions to validate interpolation flags Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-05 21:03:23 +02:00
Marek Olšák	d4a8bf89ce	radeonsi: interpolate colors after interpolation weight shuffling Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-05 21:03:23 +02:00
Marek Olšák	faee2d6dda	tgsi/scan: don't set interp flags for inputs only used by INTERP (v2) (v1 pushed, then reverted) This fixes 9 randomly failing tests on radeonsi: GL45-CTS.shader_multisample_interpolation.render.interpolate_at_centroid.* v2: use input_interpolate[input] (correct) instead of input_interpolate[index] (incorrect) Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-05 21:03:23 +02:00

29279 Commits