KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Niels Ole Salscheider	b336f51cc7	clover: Fix linkage of libOpenCL Clover needs the irreader component of llvm v2: Check for irreader component irreader is only available with LLVM 3.3 >= 177971 Reviewed-by: Tom Stellard <thomas.stellard@amd.com> Signed-off-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>	2013-04-08 07:08:10 -07:00
Vincent Lejeune	5019af2145	r600g/llvm: Add support for native isa for pre EG This fixes bug 62756 : https://bugs.freedesktop.org/show_bug.cgi?id=62756#c12	2013-04-08 15:11:59 +02:00
Marek Olšák	eff66bc9f8	gallium/util: add const to a parameter of util_max_layer	2013-04-06 23:57:15 +02:00
Marek Olšák	08275b25cc	st/mesa: don't expose ARB_color_buffer_float without driver support in GL core Reviewed-by: Brian Paul <brianp@vmware.com>	2013-04-06 23:57:12 +02:00
Marek Olšák	3264c3e997	mesa: allow drivers not to expose ARB_color_buffer_float in GL core profile Reviewed-by: Brian Paul <brianp@vmware.com>	2013-04-06 23:57:10 +02:00
Marek Olšák	9d4f67600b	mesa: move updating clamp control derived state out of mesa_update_state_locked It has 2 dependencies: glClampColor and the framebuffer, we might just as well do the update where those two are changed. v2: cosmetic changes from Brian's email Reviewed-by: Brian Paul <brianp@vmware.com>	2013-04-06 23:57:09 +02:00
Marek Olšák	755648c37f	mesa: don't set _ClampFragmentColor to TRUE if it has no effect This should reduce shader recompilations with drivers that emulate fragment color clamping, because we want the clamping to be enabled only if there is a signed normalized or floating-point colorbuffer. Reviewed-by: Brian Paul <brianp@vmware.com>	2013-04-06 23:57:06 +02:00
Marek Olšák	21d407c1b8	mesa: refactor clamping controls, get rid of _ClampReadColor v2: cosmetic changes from Brian's email Reviewed-by: Brian Paul <brianp@vmware.com>	2013-04-06 23:57:04 +02:00
Chris Forbes	c4629ad3f9	mesa: don't memcmp() off the end of a cache key. Reported-by: `per` in #intel-gfx The size of the cache key varies, so store the actual size as well as the key blob itself, rather than just assuming it's the same as the size passed in. NOTE: This is a candidate for stable branches. V2: Don't leave silly holes in structure; use unsigned instead of GLuint. V3: Fix missing case for `last` match. Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Paul Berry <stereotype441@gmail.com>	2013-04-06 18:30:08 +13:00
Tom Stellard	302f53dc20	radeonsi: Add compute support v3 v2: - Only dump shaders when env variable is set. v3: - Don't emit VGT registers Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2013-04-05 18:43:34 -04:00
Tom Stellard	4f7fe2cf2c	radeonsi: Set TCL1_ACTION_ENA when invalidating the texture cache Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2013-04-05 18:43:34 -04:00
Tom Stellard	0ccf82c557	radeonsi: Remove si_pm4_inval_vertex_cache() This function is a holdover from r600g and is identical to si_pm4_inval_texture_cache(), so it is not needed. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2013-04-05 18:43:34 -04:00
Tom Stellard	c5e5b3401c	gallium: PIPE_COMPUTE_CAP_IR_TARGET - allow drivers to specify a processor v2 This target string now contains four values instead of three. The old processor field (which was really being interpreted as arch) has been split into two fields: processor and arch. This allows drivers to pass a more detailed description of the hardware to compiler frontends. v2: - Adapt to libclc changes Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2013-04-05 18:43:34 -04:00
Wladimir	1a868acbec	util: add ETC as compressed format Add UTIL_FORMAT_LAYOUT_ETC to util_format_is_compressed. It was missing. Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2013-04-05 16:14:51 -06:00
Brian Paul	de99b6d117	gallium/u_blitter: fix is_blit_generic_supported() stencil checking Don't check if there's sampler support for stencil if we're not going to actually blit/copy stencil values. Fixes the case where we mistakenly said we can't support a blit of depth values from S8Z24 to X8Z24. Also, rename the is_stencil variable to dst_has_stencil to improve readability. NOTE: This is a candidate for the stable branches. Reviewed-by: Marek Olšák <maraeo@gmail.com> Reviewed-by: José Fonseca <jfonseca@vmware.com>	2013-04-05 16:14:51 -06:00
Alexander Monakov	9cda356004	Honor GLX_DONT_CARE in MATCH_MASK NOTE: This is a candidate for stable branches. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47478 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62999 Bugzilla: http://bugs.winehq.org/show_bug.cgi?id=26763	2013-04-05 14:32:45 -07:00
Rob Clark	aac7f06ad8	freedreno: use autogenerated register defs Switch to use the envytools generated headers for register/bitfield definitions. This is the first step in preparing to add a3xx support, since it avoids having conflicting names for a3xx and a2xx registers. And since I'm using envytools for a3xx it is simpler to just use it for everything. This shouldn't cause any functional change, it is really just a lot of renaming. Signed-off-by: Rob Clark <robdclark@gmail.com>	2013-04-05 14:33:16 -04:00
José Fonseca	1fefc65d20	st/wgl: Install our windows message hook to threads created before the ICD is loaded. Otherwise we will not receive destroy windows events, causing framebuffers to leak. This happens particularly with java and jogl. Tested with java + jogl, MATLAB. VMware Internal Bug Number: 1013086. Reviewed-by: Brian Paul <brianp@vmware.com>	2013-04-05 18:27:54 +01:00
Adam Jackson	ca70de9bd2	llvmpipe: Work without sse2 if llvm is new enough At least on llvm 3.2 this appears to work fine. Tested on an Athlon XP 2600+, which has sse and 3dnow but not sse2. Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Signed-off-by: Adam Jackson <ajax@redhat.com>	2013-04-05 11:32:53 -04:00
Jerome Glisse	b8998f976e	winsys/radeon: add command stream replay dump for faulty lockup v3 Build time option, set RADEON_CS_DUMP_ON_LOCKUP to 1 in radeon_drm_cs.h to enable it. When enabled, after each cs submission the code will try to detect a lockup by waiting on one of the buffers of the cs to become idle; after a timeout it will consider that the cs triggered a lockup and will write a radeon_lockup.c file in the current directory that has all the information for replaying the cs. To build this file : gcc -O0 -g radeon_lockup.c -ldrm -o radeon_lockup -I/usr/include/libdrm v2: Add radeon_ctx.h file to mesa git tree v3: Slightly improve dumped file for easier editing, only dump first faulty cs Signed-off-by: Jerome Glisse <jglisse@redhat.com>	2013-04-05 10:22:05 -04:00
Brian Paul	5192262833	st/xlib: add HUD support for xlib/GLX For the softpipe and llvmpipe drivers. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-04-04 17:00:42 -06:00
Brian Paul	f5071783c1	gallium/hud: add GALLIUM_HUD_PERIOD env var To set the graph update rate, in seconds. The default update rate has also been changed to 1/2 second. Reviewed-by: Marek Olšák <maraeo@gmail.com>	2013-04-04 17:00:42 -06:00
Brian Paul	6211c45186	gallium/hud: initialize sampler state The default wrap mode (PIPE_TEX_WRAP_REPEAT) is incompatible with unnormalized texcoords (at least for softpipe). v2: use PIPE_TEX_WRAP_CLAMP_TO_EDGE Reviewed-by: Marek Olšák <maraeo@gmail.com>	2013-04-04 17:00:42 -06:00
Kenneth Graunke	edc52a8f28	glsl: Add an optimization pass to flatten simple nested if blocks. GLBenchmark 2.7's shaders contain conditional blocks like: if (x) { if (y) { ... } } where the outer conditional's then clause contains exactly one statement (the nested if) and there are no else clauses. This can easily be optimized into: if (x && y) { ... } This saves a few instructions in GLBenchmark 2.7: total instructions in shared programs: 11833 -> 11649 (-1.55%) instructions in affected programs: 8234 -> 8050 (-2.23%) It also helps CS:GO slightly (-0.05%/-0.22%). More importantly, however, it simplifies the control flow graph, which could enable other optimizations. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2013-04-04 15:38:19 -07:00
Kenneth Graunke	967514ce68	i965: Use a variable for the push constant size in kB. This clarifies that the offset of 2 is actually 16 kB / 8kB units. It also keys both computations off of a single variable, which should make it easier to change in the future. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Paul Berry <stereotype441@gmail.com>	2013-04-04 15:38:19 -07:00
Kenneth Graunke	8cdb2d32ec	i965: Turn brw->urb.vs_size and gs_size into local variables. These variables are only used within a single function, so we may as well make them local variables. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Paul Berry <stereotype441@gmail.com>	2013-04-04 15:38:19 -07:00
Kenneth Graunke	b99ad7f02c	i965: Remove BRW_NEW_WM_INPUT_DIMENSIONS dirty bit. This was only produced by the brw_wm_input_dimensions atom, which was removed in the previous commit. So there's no need for the dirty bit. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2013-04-04 15:38:19 -07:00
Kenneth Graunke	d198546bac	i965: Delete brw_vs_constval.c and the brw_wm_input_sizes atom. This was only used to compute proj_attrib_mask, which was removed by the previous commit. That makes this dead code. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2013-04-04 15:38:19 -07:00
Kenneth Graunke	705c8247fa	i965: Remove now dead brw_wm_prog_key::proj_attrib_mask field. The previous commit removed the last user of this field, so there's no longer any point in setting it. Removing this should eliminate state-dependent recompiles, and make the precompile more reliable. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2013-04-04 15:38:19 -07:00
Kenneth Graunke	7183568869	i965: Remove fixed-function texture projection avoidance optimization. This optimization attempts to avoid extra attribute interpolation instructions for texture coordinates where the W-component is 1.0. Unfortunately, it requires a lot of complexity: the brw_wm_input_sizes state atom (all the brw_vs_constval.c code) needs to run on each draw. It computes the input_size_masks array, then uses that to compute proj_attrib_mask. Differences in proj_attrib_mask can cause state-dependent fragment shader recompiles. We also often fail to guess proj_attrib_mask for the fragment shader precompile, causing us to needlessly compile it twice. Furthermore, this optimization only applies to fixed-function programs; it does not help modern GLSL-based programs at all. Generally, older fixed-function programs run fine on modern hardware anyway. The optimization has existed in some form since the initial commit. When we rewrote the fragment shader backend, we dropped it for a while. Eric readded it in commit `eb30820f26` as part of an attempt to cure a ~1% performance regression caused by converting the fixed-function fragment shader generation code from Mesa IR to GLSL IR. However, no performance data was included in the commit message, so it's unclear whether or not it was successful. Time has passed, so I decided to re-measure this. Surprisingly, Eric's OpenArena timedemo actually runs /faster/ after removing this and the brw_wm_input_sizes atom. On Ivybridge at 1024x768, I measured a 1.39532% +/- 0.91833% increase in FPS (n = 55). On Ironlake, there was no statistically significant difference (n = 37). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2013-04-04 15:38:19 -07:00
Kenneth Graunke	32726b1af6	i965: Use ctx->Stencil._WriteEnabled in DEPTH_STENCIL_STATE. This is the same computation as the _WriteEnabled flag, so we may as well use it. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Paul Berry <stereotype441@gmail.com>	2013-04-04 15:38:19 -07:00
Kenneth Graunke	01bd29d681	i965: Fix stencil write enable flag in 3DSTATE_DEPTH_BUFFER on Gen7+. ctx->Stencil.WriteMask is a statically sized array of 3 elements. Checking it against 0 actually is a NULL check, and can never fail, which meant that we always said stencil writes were enabled. Use the new core Mesa derived state flag to fix this. NOTE: This is a candidate for stable branches. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Paul Berry <stereotype441@gmail.com>	2013-04-04 15:38:18 -07:00
Kenneth Graunke	1e3235d36e	mesa: Add new ctx->Stencil._WriteEnabled derived state flag. i965 needs to know whether stencil writes are enabled in several places, and gets the test wrong sometimes. While we could create a function to compute this, it seems generally useful enough to warrant a new piece of derived state. Also, all the plumbing is already in place. NOTE: This is a candidate for stable branches. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Paul Berry <stereotype441@gmail.com>	2013-04-04 15:38:18 -07:00
Roland Scheidegger	9eef86bb55	gallivm: some minor cube map cleanup The ar_ge_as_at variable was just very very confusing since the condition was actually the other way around (as_at_ge_ar). So change the condition (and the selects depending on it) to match the variable name. And also change the chosen major axis in case the coord values are the same. OpenGL doesn't care one bit which one is chosen in this case but it looks like dx10 would require z chosen over y, and y chosen over x (previously did x chosen over y, y chosen over z). Since it's all the same effort just honor dx10's wishes. (Though actually, for some preferred orderings, we could save one (or two with derivatives) selects since the tnewx and tnewz (and the corresponding dmax values) are the same.) Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-04-04 23:22:10 +02:00
Eric Anholt	b6e9b54d06	i965: Ask the register allocator to round-robin through registers. The way we were allocating registers before, packing into low register numbers for Ironlake, resulted in an overly-constrained dependency graph for instruction scheduling. Improves GLBenchmark 2.1 performance by 4.5% +/- 0.7% (n=26). No difference on my old GLSL demo (n=20). No difference on nexuiz (n=15). v2: Fix off-by-one bug that made the change only work for 16-wide on i965. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2013-04-04 12:51:06 -07:00
Zack Rusin	be9a42e980	llvmpipe: implement ucmp and add a test for it Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: José Fonseca <jfonseca@vmware.com>	2013-04-04 12:09:55 -07:00
Paul Berry	5db2249493	Avoid spurious GCC warnings in STATIC_ASSERT() macro. GCC 4.8 now warns about typedefs that are local to a scope and not used anywhere within that scope. This produced spurious warnings with the STATIC_ASSERT() macro (which used a typedef to provoke a compile error in the event of an assertion failure). This patch switches to a simpler technique that avoids the warning. v2: Avoid GCC-specific syntax. Also update p_compiler.h. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2013-04-04 09:52:18 -07:00
Erik Faye-Lund	456f40e18d	freedreno: document debug flag Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com> Signed-off-by: Brian Paul <brianp@vmware.com>	2013-04-04 10:41:50 -06:00
Brian Paul	e95514c0ea	st/wgl: add HUD support v2: fix a few minor issues spotted by Jose. Reviewed-by: José Fonseca <jfonseca@vmware.com>	2013-04-04 10:41:35 -06:00
Brian Paul	0c1dcf906d	st/wgl: make stw_current_context() non-static Reviewed-by: José Fonseca <jfonseca@vmware.com>	2013-04-04 08:50:16 -06:00
Brian Paul	92e5e45ff1	util: add debug_memory_check_block(), debug_memory_tag() The former just checks that the given block is valid by checking the header and footer. The latter sets the memory block's tag. With extra debug code, we can use that for monitoring/checking particular allocations. Reviewed-by: José Fonseca <jfonseca@vmware.com>	2013-04-04 08:50:15 -06:00
Brian Paul	a408ea9692	gallium/hud: replace malloc w/ MALLOC To match the FREE() call used later. Fixes things on Windows. Reviewed-by: Marek Olšák <maraeo@gmail.com>	2013-04-04 08:50:15 -06:00
Vincent Lejeune	9276961223	r600g/llvm: Workaround for wrong tex.offset_*	2013-04-04 16:03:04 +02:00
Roland Scheidegger	ce5096a0a9	gallivm: honor explicit derivatives values for cube maps. This is trivial now, though need to make sure we pass all the necessary derivative values (which is 3 each for ddx/ddy not 2). Passes piglit arb_shader_texture_lod-texgradcube test. v2: add the forgotten abs() for all incoming derivatives (discovered by new piglit arb_shader_texture_lod-texgradcube test, though more by luck as it was failing only for exactly one pixel...). Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-04-04 01:03:42 +02:00
Roland Scheidegger	f621015cb5	gallivm: do per-pixel cube face selection (finally!!!) This proved to be tricky, the problem is that after selection/mirroring we cannot calculate reasonable derivatives (if not all pixels in a quad end up on the same face the derivatives could get "randomly" exceedingly large). However, it is actually quite easy to simply calculate the derivatives before selection/mirroring and then transform them similar to the cube coordinates (they only need selection/projection, but not mirroring as we're not interested in the sign bit, of course). While there is a tiny bit more work to do (need to calculate derivs for 3 coords instead of 2, and additional selects) it also simplifies things somewhat for the coord selection itself (as we save some broadcast aos shuffles, and we don't need to calculate the average vector) - hence if derivatives aren't needed this should actually be faster. Also, this has the benefit that this will (trivially) work for explicit derivatives too, which we completely ignored before that (will be in a separate commit for better trackability). Note that while the way for getting rho looks very different, it should result in "nearly" the same values as before (the "nearly" is only because before the code would choose the face based on an "average" vector and hence the derivatives calculated according to this face, where now (for implicit derivatives) the derivatives are projected on the face selected for the first (top-left) pixel in a quad, so not necessarily the same face). The transformation done might not quite be state-of-the-art, calculating length(dx,dy) as max(dx,dy) certainly isn't either but this stays the same as before (that is I think a better transform would _somehow_ take the "derivative major axis" into account so that derivative changes in the major axis wouldn't get ignored). Should solve some accuracy problems with cubemaps (can easily be seen with the cubemap demo when switching wrapping/filtering), though we still don't do seamless filtering to fix it completely (so not per-sample but per-pixel is certainly better than per-quad and already sufficient for accurate results with nearest tex filter). As for performance, it seems to be a tiny bit faster too (maybe 3% or so with cubemap demo). Which I'd have expected with nearest/nearest filtering where this will be less instructions, but the difference seems to actually be larger with linear/linear_mipmap_linear where it is slightly more instructions, probably the code appears less serialized allowing better scheduling (on a sandy bridge cpu). It actually seems to be now at least as fast as the old path using a conditional when using 128bit vectors too (that is probably more a result of testing with a newer cpu though), for now that old path is still there but unused. No piglit regressions. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-04-04 01:03:42 +02:00
Roland Scheidegger	bdfbeb9633	gallivm: minor rho calculation optimization for 1 or 3 coords Using a different packing for the single coord case should save a shuffle. Plus some minor style fixes. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-04-04 01:03:42 +02:00
Roland Scheidegger	067a0ae420	gallivm: use f16c hw support for float->half and half->float conversion Should be way faster of course on cpus supporting this (includes AMD Bulldozer and Jaguar cores, Intel Ivy Bridge and up (except budget models)). Passes piglit fbo-blending-formats GL_ARB_texture_float -auto on Ivy Bridge. Reviewed-by: Brian Paul <brianp@vmware.com>	2013-04-04 01:03:42 +02:00
Zack Rusin	302df7cc85	draw/llvmpipe: allow independent so attachments to the vs When geometry shaders are present, one needs to be able to create an empty geometry shader with stream output that needs to be resolved later and attached to the currently bound vertex shader. Let's add support for it to llvmpipe and draw. draw allows attaching independent stream output info to any vertex shader and llvmpipe resolves at draw time which vertex shader the given empty geometry shader should be linked to. Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: José Fonseca <jfonseca@vmware.com>	2013-04-03 10:16:25 -07:00
Zack Rusin	246e68735f	llvmpipe: reset so buffers when not appending We need to reset the internal state of the so buffers or we'll keep appending even though we're not supposed to. Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: José Fonseca <jfonseca@vmware.com>	2013-04-03 10:16:25 -07:00
Zack Rusin	7ca65a68e1	draw: remove unused function we use draw_set_mapped_so_targets nowadays Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: José Fonseca <jfonseca@vmware.com>	2013-04-03 10:16:25 -07:00
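
Note on edc52a8f28 (flattening nested if blocks): the rewrite described in that message can be illustrated outside the GLSL compiler with plain C. The sketch below is not the optimization pass itself; the function names are made up for illustration. It only shows that the nested and flattened forms behave identically when the outer then-clause holds nothing but the inner if and neither has an else clause.

```c
#include <assert.h>
#include <stdbool.h>

/* Nested form, shaped like the shader code quoted in the commit message. */
int nested_form(bool x, bool y, int v)
{
    if (x) {
        if (y) {
            v += 1;
        }
    }
    return v;
}

/* Flattened form. Equivalent only because the outer "then" clause
 * contains exactly the inner if and there are no else clauses. */
int flattened_form(bool x, bool y, int v)
{
    if (x && y) {
        v += 1;
    }
    return v;
}

/* Returns 1 when both forms agree on all four input combinations. */
int forms_agree(void)
{
    for (int x = 0; x <= 1; x++) {
        for (int y = 0; y <= 1; y++) {
            if (nested_form(x, y, 10) != flattened_form(x, y, 10))
                return 0;
        }
    }
    return 1;
}
```

Because `&&` short-circuits in C, `y` is still only evaluated when `x` is true, which matches the nested version.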
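
Note on 01bd29d681 (stencil write enable flag): the bug class described there, comparing a fixed-size array member against 0, can be reproduced in a few lines of C. The struct and function names below are hypothetical stand-ins, not Mesa's real types, and the "fixed" test is a simplified assumption about what "writes enabled" means rather than the actual derived-state computation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the stencil state; not Mesa's real struct. */
struct stencil_state {
    bool enabled;
    unsigned write_mask[3];   /* statically sized array, as in the commit */
};

/* Buggy test: the array decays to a pointer to its first element, so the
 * comparison is a NULL check that is always true. The decay is written
 * out explicitly here to keep the example free of compiler warnings. */
bool stencil_writes_buggy(const struct stencil_state *s)
{
    const unsigned *mask = s->write_mask;
    return s->enabled && mask != 0;
}

/* Simplified fix (an assumption for illustration): look at the values
 * stored in the array instead of at its address. */
bool stencil_writes_fixed(const struct stencil_state *s)
{
    return s->enabled &&
           (s->write_mask[0] != 0 ||
            s->write_mask[1] != 0 ||
            s->write_mask[2] != 0);
}
```

With every mask set to zero, the buggy version still reports writes as enabled, which is the "always said stencil writes were enabled" behaviour the message describes.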
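
Note on c4629ad3f9 (memcmp() off the end of a cache key): the fix stores the actual key size next to the key blob. A minimal sketch of that idea follows; `cache_item`, `cache_item_init` and `cache_item_matches` are hypothetical names, not the Mesa program cache API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cache entry: the key blob plus its real size. */
struct cache_item {
    unsigned key_size;
    void *key;
};

/* Copies the key and records how many bytes were actually stored. */
bool cache_item_init(struct cache_item *item, const void *key, unsigned key_size)
{
    assert(key != NULL && key_size > 0);
    item->key = malloc(key_size);
    if (item->key == NULL)
        return false;
    memcpy(item->key, key, key_size);
    item->key_size = key_size;
    return true;
}

/* Compares sizes first, so memcmp() never reads past the stored blob
 * when the lookup key is longer than the stored one. */
bool cache_item_matches(const struct cache_item *item,
                        const void *key, unsigned key_size)
{
    return item->key_size == key_size &&
           memcmp(item->key, key, key_size) == 0;
}
```

Without the size comparison, a lookup with an 8-byte key against a stored 4-byte key would make memcmp() read four bytes beyond the allocation.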
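
Note on 5db2249493 (STATIC_ASSERT() warnings): the message does not spell out the replacement technique, so the macro below is only one common typedef-free approach in portable C, named `STATIC_ASSERT_SKETCH` to make clear it is not taken from the Mesa tree. It forces a negative array size when the condition is false, and uses `sizeof` so that no local typedef is left unused.

```c
#include <assert.h>

/* A typedef-based macro of roughly this shape is the kind that triggers
 * the unused-local-typedef warning described in the commit message:
 *
 *   #define OLD_STATIC_ASSERT(cond) \
 *       do { typedef int assertion_failed[(cond) ? 1 : -1]; } while (0)
 */

/* Sketch of a typedef-free alternative: char[1] when cond is true,
 * char[-1] (a compile error) when cond is false. */
#define STATIC_ASSERT_SKETCH(cond) \
    do { (void) sizeof(char[1 - 2 * !(cond)]); } while (0)

/* Uses the macro at block scope; returns 1 if the file compiled at all. */
int check_sizes(void)
{
    STATIC_ASSERT_SKETCH(sizeof(char) == 1);
    STATIC_ASSERT_SKETCH(sizeof(unsigned) == sizeof(int));
    return 1;
}
```

Changing either condition to something false, for example `sizeof(char) == 2`, should stop compilation at that line.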


55924 Commits
