KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Timothy Arceri	6e3b380387	docs: Mark AoA as done for i965 Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2015-11-04 13:41:16 +11:00
Timothy Arceri	5b75dbd7be	i965: enable ARB_arrays_of_arrays Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2015-11-04 13:39:08 +11:00
Timothy Arceri	fb77da89f5	i965: add support for image AoA V3: clamp array index to the correct size (the size of the current array rather than the inner array) Francisco Jerez. V2: avoid useless zero-initialization and addition for the first AoA level, avoid redundant temporary, make use of type_size_scalar(), rename aoa_size to element_size, assign the indirect indexing temporary directly to image.reladdr, and replace while loop with a for loop. All suggested by Francisco Jerez. Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2015-11-04 13:38:32 +11:00
Roland Scheidegger	9285ed98f7	llvmpipe: add cache for compressed textures compressed textures are very slow because decoding is rather complex (and because there's no jit code to decode them too for non-technical reasons). Thus, add some texture cache which holds a couple of decoded blocks. Right now this handles only s3tc format albeit it could be extended to work with other formats rather trivially as long as the result of decode fits into 32bit per texel (ideally, rgtc actually would decode to more than 8 bits per channel, but even then making it work for it shouldn't be too difficult). This can improve performance noticeably but don't expect wonders (uncompressed is unsurprisingly still faster). It's also possible it might be slower in some cases (using nearest filtering for example or if there's otherwise not many cache hits, the cache is only direct mapped which isn't great). Also, actual decode of a block relies on util code, thus even though always full blocks are decoded it is done texel by texel - this could obviously benefit greatly from simd-optimized code decoding full blocks at once... Note the cache is per (raster) thread, and currently only used for fragment shaders. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2015-11-04 02:51:02 +01:00
Oded Gabbay	39b4dfe6ab	llvmpipe: use simple coeffs calc for 128bit vectors There are currently two methods in llvmpipe code to calculate coeffs to be used as inputs for the fragment shader. The two methods use slightly different ways to do the floating point calculations and thus produce slightly different results. The decision which method to use is determined by the size of the vector that is used by the platform. For vectors with size of more than 128bit, a single-step method is used, in which coeffs_init_simple() + attribs_update_simple() are called. For vectors with size of 128bit or less, a two-step method is used, in which coeffs_init() + attribs_update() are called. This causes some piglit tests (clip-distance-bulk-copy, interface-vs-unnamed-to-fs-unnamed) to fail when using platforms with 128bit vectors (such as ppc64le or x86-64 without AVX). This patch makes platforms with 128bit vectors use the single-step method (aka "simple" method) instead of the two-step method. This would make the resulting coeffs identical between more platforms, make sure the piglit tests pass, and make debugging and maintainability a bit easier as the generated LLVM IR will be the same for more platforms. The performance impact is negligible for x86-64 without AVX, and basically non-existent for ppc64le, as it can be seen from the following benchmarking results: - glxspheres, on ppc64le: - original code: 4.892745317 frames/sec 5.460303857 Mpixels/sec - with the patch: 4.932083873 frames/sec 5.504205571 Mpixels/sec - Additional 0.8% performance boost - glxspheres, on x86-64 without AVX: - original code: 20.16418809 frames/sec 22.50323395 Mpixels/sec - with the patch: 20.31328989 frames/sec 22.66963152 Mpixels/sec - Additional 0.74% performance boost - glmark2, on ppc64le: - original code: score of 58 - with my change: score of 57 - glmark2, on x86-64 without AVX: - original code: score of 175 - with the patch: score of 167 - Impact of -4.5% on performance - OpenArena, on ppc64le: - original code: 3398 frames 1719.0 seconds 2.0 fps 255.0/505.9/2773.0/0.0 ms - with the patch: 3398 frames 1690.4 seconds 2.0 fps 241.0/497.5/2563.0/0.2 ms - 29 seconds faster with the patch, which is about 2% - OpenArena, on x86-64 without AVX: - original code: 3398 frames 239.6 seconds 14.2 fps 38.0/70.5/719.0/14.6 ms - with the patch: 3398 frames 244.4 seconds 13.9 fps 38.0/71.9/697.0/14.3 ms - 0.3 fps slower with the patch (about 2%) Additional details can be found at: http://lists.freedesktop.org/archives/mesa-dev/2015-October/098635.html Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2015-11-04 02:38:53 +01:00
Kenneth Graunke	59bbe2681b	nir: Properly invalidate metadata in nir_opt_remove_phis(). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Eduardo Lima Mitev <elima@igalia.com> Cc: mesa-stable@lists.freedesktop.org	2015-11-03 17:06:48 -08:00
Kenneth Graunke	bc3942e297	nir: Properly invalidate metadata in nir_lower_vec_to_movs(). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Eduardo Lima Mitev <elima@igalia.com> Cc: mesa-stable@lists.freedesktop.org	2015-11-03 17:06:48 -08:00
Kenneth Graunke	0f037bd71f	nir: Properly invalidate metadata in nir_opt_copy_prop(). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Eduardo Lima Mitev <elima@igalia.com> Cc: mesa-stable@lists.freedesktop.org	2015-11-03 17:06:48 -08:00
Kenneth Graunke	4cb7546066	nir: Properly invalidate metadata in nir_remove_dead_variables(). v2: Preserve live_variables too (Jason). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>	2015-11-03 17:06:48 -08:00
Kenneth Graunke	8bb44510fc	nir: Properly invalidate metadata in nir_split_var_copies(). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Eduardo Lima Mitev <elima@igalia.com> Cc: mesa-stable@lists.freedesktop.org	2015-11-03 17:06:48 -08:00
Kenneth Graunke	aea40091f0	nir: Properly invalidate metadata in nir_lower_global_vars_to_local(). v2: Preserve nir_metadata_live_variables as well (caught by Jason). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>	2015-11-03 17:06:48 -08:00
Jason Ekstrand	531be601d5	nir: Unexpose _impl versions of copy_prop and dce Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-11-03 17:06:48 -08:00
Jordan Justen	4bc16ad217	mesa: rename UniformBlockStageIndex to InterfaceBlockStageIndex Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Cc: Iago Toral <itoral@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>	2015-11-03 16:44:22 -08:00
Matt Turner	cf3121ed18	i965/vec4: Send from GRF in atomic operations. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-11-03 16:38:36 -08:00
Marek Olšák	3b37155a68	gallium/radeon: allow returning SDMA fences from pipe->flush pipe->flush never returned SDMA fences. This fixes it. This is only an issue on amdgpu where fences can signal out of order. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-11-04 00:43:14 +01:00
Marek Olšák	7f9122c968	gallium/radeon: always return the last SDMA fence on SDMA flush if needed Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-11-04 00:43:14 +01:00
Kenneth Graunke	36fd653817	i965: Add scalar geometry shader support. This is hidden behind INTEL_SCALAR_GS=1 for now, as we don't yet support instanced geometry shaders, and Orbital Explorer's shader spills like crazy. But the infrastructure is in place, and it's largely working. v2: Lots of rebasing. v3: (feedback from Kristian Høgsberg) - Handle stride and subreg_offset correctly for ATTRs; use a helper. - Fix missing emit_shader_time_end() call. - Delete dead code after early EOT in static vertex case to avoid tripping asserts in emit_shader_time_end(). - Use proper D/UD type in intexp2(). - Fix "EndPrimitve" and "to that" typos. - Assert that invocations == 1 so we know this is missing. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>	2015-11-03 15:08:49 -08:00
Kenneth Graunke	c9541a74e4	i965: Add scalar GS input lowering code. We really ought to compute the VUE map at link time and stash it, rather than recomputing it here, but with the mess of program structures I wasn't sure where to put it. We can improve that later. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>	2015-11-03 15:08:49 -08:00
Kenneth Graunke	4861835d1c	i965: Fix the fs_visitor GS constructor to take shader_time_index. Jason reworked this so it isn't simply ST_GS anymore...it's either -1 (not enabled) or an actual offset. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>	2015-11-03 15:08:49 -08:00
Ben Widawsky	5d4b019d2a	i965/gen8+: Extract color clear surface state On future generation platforms the color clear value is stored elsewhere in the surface state. By extracting this logic, we can cleanly implement the difference in an upcoming patch. Should have no functional impact. v2: Move hunk from the next patch into this patch (Matt) Whitespace fix (Ben) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Neil Roberts <neil@linux.intel.com>	2015-11-03 13:49:21 -08:00
Ben Widawsky	f3223ebd6c	i965/gen8+: Remove redundant zeroing of surface state The allocate_surface_state already zeroes out the surface state, and doing it later in the function is destructive for what we want to accomplish when we split out support for gen9 fast clears (next patch). NOTE: Only dword 12 actually needed to be fixed, but it seemed more consistent to remove the other instances as well. I can make an argument both ways (open coding it, vs. not). I can rework the next patch if required. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Chad Versace <chad.versace@intel.com> Reviewed-by: Neil Roberts <neil@linux.intel.com>	2015-11-03 13:49:21 -08:00
Samuel Pitoiset	e887407491	nvc0: add missing compute parameters required by clover This fixes crashes with some piglit OpenCL tests. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-11-03 22:17:00 +01:00
Samuel Pitoiset	e640ba41ed	nvc0: handle NULL pointer in nvc0_get_compute_param() To get the size (in bytes) of a compute parameter, clover first calls get_compute_param() with a NULL data pointer. The RET() macro is based on nv50. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-11-03 22:16:45 +01:00
Ben Widawsky	dde33fc23c	i965/skl: PCI ID cleanup and brand strings A few new PCI ids are added here, and one is removed (0x190B) because it no longer seems to exist anywhere. v2-4: Only use ascii characters (Ilia) 0x1921 is no longer marked as f Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>	2015-11-03 10:00:17 -08:00
Ben Widawsky	7cbd6608f5	i965/skl: Add GT4 PCI IDs Like other gen8+ hardware, the hardware automatically scales up thread counts. We must be careful about the URB sizes since GT4 adds another slice. One of the existing PCI IDs is actually mislabeled as GT3. Arguably this is a real bug since the URB size will be wrong. Because this patch is simply meant to add the missing IDs, that will be fixed in a later patch. v2: No longer relevant. v3: Update the wm thread count to support GT4. The WM thread count is used to determine the maximum scratch space required. Currently the code always allocates the maximum amount even though lower GT SKUs require less. The formula is threads_per_psd * subslices_per_slice * slices Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>	2015-11-03 09:45:04 -08:00
Jordan Justen	55365a7ad5	mesa: Add spec citations for DispatchCompute* Note: The OpenGL 4.3 - 4.5 specification language for DispatchCompute appears to have an error regarding the max allowed values. When adding the specification citation, we note why the code does not match the specification language. v2: * Updates based on review from Iago Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Cc: Iago Toral Quiroga <itoral@igalia.com> Cc: Marta Lofstedt <marta.lofstedt@intel.com> Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>	2015-11-02 15:25:37 -08:00
Jordan Justen	44c399f20a	mesa: Update DispatchComputeIndirect errors for indirect parameter There is some discrepancy between the return values for some error cases for the DispatchComputeIndirect call in the ARB_compute_shader specification. Regarding the indirect parameter, in one place the extension spec lists that the error returned for invalid values should be INVALID_OPERATION, while later it specifies INVALID_VALUE. The OpenGL 4.3 and OpenGLES 3.1 specifications appear to be consistent in requiring the INVALID_VALUE error return in this case. Here we update the code to match the main specifications, and update the citations to use the main specification rather than the extension specification. v2: * Updates based on review from Iago Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Cc: Iago Toral Quiroga <itoral@igalia.com> Cc: Marta Lofstedt <marta.lofstedt@intel.com> Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>	2015-11-02 15:25:37 -08:00
Matt Turner	0b19f65195	i965/fs: Clean up FBH code. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2015-11-02 09:33:31 -08:00
Matt Turner	c22d62f599	i965/vec4: Clean up FBH code. It did a bunch of unnecessary stuff, emitting an extra MOV included. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2015-11-02 09:33:31 -08:00
Matt Turner	7c81a6a647	i965: Replace default case with list of enum values. If we add a new file type, we'd like to get warnings if it's not handled. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2015-11-02 09:33:31 -08:00
Matt Turner	d9b09f8a30	i965/vec4: Don't disable channels in any/all comparisons. We've made a mistake in calling the Channel Enable bits "writemask", because they do more than control which channels of the destination are written -- they actually control which channels are enabled (surprise! surprise!) So, if we emit cmp.z.f0(8) null.xy<1>D g10<4,4,1>.xyzzD g2<0,4,1>.xyzzD mov(8) g12<1>.xUD 0x00000000UD (+f0.all4h) mov(8) g12<1>.xUD 0xffffffffUD where the CMP instruction has only .xy channel enables, it won't write the .zw channels of the flag register, which are of course read by the +f0.all4 predicate. We need to always emit CMP instructions whose flag result might be read by such a predicate with all channels enabled. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-11-02 09:33:31 -08:00
Tapani Pälli	f4466c856f	mesa: fix uniforms calculation in glGetProgramiv Since introduction of SSBO, UniformStorage contains not just uniforms but also buffer variables, this needs to be taken in to account when calculating active uniforms with GL_ACTIVE_UNIFORMS and GL_ACTIVE_UNIFORM_MAX_LENGTH. No Piglit regressions. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Eduardo Lima Mitev <elima@igalia.com> Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>	2015-11-02 11:22:10 +02:00
Tapani Pälli	efb333acb7	mesa: fix program resource queries for atomic counter buffers gl_active_atomic_buffer contains index to UniformStorage, we need to calculate resource index for that gl_uniform_storage. Fixes following CTS tests: ES31-CTS.program_interface_query.atomic-counters ES31-CTS.program_interface_query.atomic-counters-one-buffer No Piglit regressions. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>	2015-11-02 11:22:06 +02:00
Juha-Pekka Heikkila	c2c124f891	glsl: join calculate_array_size() and calculate_array_stride() These helpers are run for the same case in the same loop. Here their operation is joined so the loop is run just once. Also fixed an out-of-memory condition here. v2: Make the loop simpler to read as per Tapani's suggestion Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Tested-by: Tapani Pälli <tapani.palli@intel.com>	2015-11-02 10:03:32 +02:00
Ryan Houdek	af7c98a9c7	mesa: expose support for OES/EXT_draw_elements_base_vertex to OpenGL ES This has been tested with the piglits in the mailing list and on the Dolphin emulator. Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-11-01 23:02:06 -05:00
Ilia Mirkin	985b51551a	nouveau: set MaxDrawBuffers to the same value as MaxColorAttachments Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: mesa-stable@lists.freedesktop.org	2015-11-01 20:15:15 -05:00
Samuel Pitoiset	00bb524716	nv50: use correct heaps for FP and GP code segments This is just a cosmetic change. Trivial. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2015-11-01 23:29:20 +01:00
Jordan Justen	39bb59a566	mesa/sso: Add compute shader support Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com> [itoral@igalia.com: Reviewed-by for all except the ctx->_Shader change] Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2015-11-01 01:10:01 -07:00
Jordan Justen	6e11855050	mesa/sso: Add MESA_VERBOSE=api trace support v2: * Use %u for unsigned values (Iago) Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2015-11-01 01:09:20 -07:00
Jordan Justen	5bfe2835c2	i965: Setup pull constant state for compute programs Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2015-11-01 00:35:12 -07:00
Jordan Justen	a4a416f567	main/get: Add MAX_COMBINED_COMPUTE_UNIFORM_COMPONENTS Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>	2015-11-01 00:11:42 -07:00
Jordan Justen	218e94906d	glsl: OpenGLES GLSL 3.1 precision qualifiers ordering rules The OpenGLES GLSL 3.1 specification uses the precision qualifier ordering rules from ARB_shading_language_420pack. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>	2015-10-31 23:17:06 -07:00
Jordan Justen	b6e9b2b7a0	glsl: Add compute shader builtin variables for OpenGLES 3.1 Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>	2015-10-31 23:08:09 -07:00
Ilia Mirkin	67635a0a71	nouveau: get rid of tabs Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-10-31 19:58:14 -04:00
Connor Abbott	0ef8c5cb96	i965/sched: don't calculate live intervals for post-RA scheduling For some reason, this causes assertions on gm965 only. In any case, it's unnecessary since we don't need liveness information in the post-RA scheduler. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92744 Cc: Mark Janes <mark.a.janes@intel.com> Signed-off-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-10-31 08:05:52 -07:00
Dave Airlie	425d8c2578	virgl/vtest: fix extra malloc This somehow got added twice, drop the first one. Reported by Coverity. Signed-off-by: Dave Airlie <airlied@redhat.com>	2015-10-31 18:05:33 +10:00
Dave Airlie	8d731ebd33	virgl: free sampler view on failure path Reported by Coverity. Signed-off-by: Dave Airlie <airlied@redhat.com>	2015-10-31 16:16:44 +10:00
Dave Airlie	7153b12651	gallium/swrast: fixup build breakage and warnings The front buffer rendering changes broke an interface, I didn't fix up all of them. Signed-off-by: Dave Airlie <airlied@redhat.com>	2015-10-31 16:16:44 +10:00
Dave Airlie	2b67657096	gallium/swrast: fix front buffer blitting. (v2) So I've known this was broken before, cogl has a workaround for it from what I know, but with the gallium based swrast drivers BlitFramebuffer from back to front or vice-versa was pretty broken. The legacy swrast driver tracks when a front buffer is used and does the get/put images when it is mapped/unmapped, so this patch attempts to add the same functionality to the gallium drivers. It creates a new context interface to denote when a front buffer is being created, and passes a private pointer to it, this pointer is then used to decide on map/unmap if the contents should be updated from the real frontbuffer using get/put image. This is primarily to make gtk's gl code work, the only thing I've tested so far is the glarea test from https://github.com/ebassi/glarea-example.git v2: bump extension version, check extension version before calling get image. (Ian) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91930 Cc: <mesa-stable@lists.freedesktop.org> Signed-off-by: Dave Airlie <airlied@redhat.com>	2015-10-31 16:04:36 +10:00
Timothy Arceri	103de0225b	glsl: set image access qualifiers for AoA Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2015-10-31 08:37:08 +11:00

74027 Commits