KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Iago Toral Quiroga	8a81ac2eed	v3d: emit geometry shader state commands This is good enough to get basic GS workloads working, later patches will improve this by adding instancing support, proper SIMD configuration, etc. Notice that most of the TESSELLATION_GEOMETRY_SHADER_PARAMS fields are only relevant when tessellation shaders are present. We do not support tessellation yet, but we still need to fill in these tessellation state with default values since our packing functions require some of these to have non-zero values. v2: - Add a comment in the code explaining why we fill in tessellation fields (Alejandro) Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2019-12-16 08:42:37 +01:00
Iago Toral Quiroga	5d578c27ce	v3d: add initial compiler plumbing for geometry shaders Most of the relevant work happens in the v3d_nir_lower_io. Since geometry shaders can write any number of output vertices, this pass injects a few variables into the shader code to keep track of things like the number of vertices emitted or the offsets into the VPM of the current vertex output, etc. This is also where we handle EmitVertex() and EmitPrimitive() intrinsics. The geometry shader VPM output layout has a specific structure with a 32-bit general header, then another 32-bit header slot for each output vertex, and finally the actual vertex data. When vertex shaders are paired with geometry shaders we also need to consider the following: - Only geometry shaders emit fixed function outputs. - The coordinate shader used for the vertex stage during binning must not drop varyings other than those used by transform feedback, since these may be read by the binning GS. v2: - Use MAX3 instead of a chain of MAX2 (Alejandro). - Make all loop variables unsigned in ntq_setup_gs_inputs (Alejandro) - Update comment in IO owering so it includes the GS stage (Alejandro) Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2019-12-16 08:42:37 +01:00
Iago Toral Quiroga	ca475d5fba	v3d: actually root the first BO in a command list in the job We were passing cl->bo, which is NULL, so v3d_job_add_bo was a no-op. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-12-13 08:58:10 +00:00
Iago Toral Quiroga	18a09e788d	v3d: fix indirect BO allocation for uniforms We were always ensuring a minimum size of 4 bytes for uniforms for the case where we don't have any, to account for hardware pre-fetching of the uniform stream, however, pre-fetching could also lead to to out of bounds reads when have read the last uniform in the stream, so we probably want to have the extra 4 bytes to prevent the kernel from observing invalid memory accesses when the uniform stream sits right at the end of a page. This seems to fix MMU exceptions reported with a Linux 5.4 kernel. Credit goes to Phil Elwell for identifying the problem and narrowing it down to memory accesses in the uniform stream. Reported-by: Phil Elwell <phil@raspberrypi.org> Tested-by: Phil Elwell <phil@raspberrypi.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-11-27 08:43:13 +01:00
Eric Anholt	882ca6dfb0	util: Move gallium's PIPE_FORMAT utils to /util/format/ To make PIPE_FORMATs usable from non-gallium parts of Mesa, I want to move their helpers out of gallium. Since u_format used util_copy_rect(), I moved that in there, too. I've put it in a separate directory in util/ because it's a big chunk of related code, and it's not clear to me whether we might want it as a separate library from libmesa_util at some point. Closes: #1905 Acked-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-11-14 10:47:20 -08:00
Iago Toral Quiroga	e7e501efce	v3d: rename vertex shader key (num)_fs_inputs fields Until now this made sense because we always paired vertex shaders with fragment shaders, but as soon as we implement geometry and tessellation shaders that will no longer be the case, so rename this to (num_)used_outputs. v2: Use 'used_outputs' instead of ns_outputs, which is more explicit (Eric). Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-10-31 08:46:35 +00:00
Timothy Arceri	7f106a2b5d	util: rename list_empty() to list_is_empty() This makes it clear that it's a boolean test and not an action (eg. "empty the list"). Reviewed-by: Eric Engestrom <eric@engestrom.ch>	2019-10-28 11:24:38 +00:00
Erik Faye-Lund	65328bd32d	Revert "v3d: do not report alpha-test as supported" This reverts commit `9d0523b569`. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Jose Maria Casanova <jmcasanova@igalia.com>	2019-10-23 13:03:55 +02:00
Jose Maria Casanova Crespo	f8da0f6198	v3d: Explicitly expose OpenGL ES Shading Language 3.1 This will expose GL_EXT_primitive_bounding_box and GL_OES_primitive_bounding_box after previous commits expose OpenGL ES 3.1 once Compute Shaders are available. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-10-18 14:08:52 +02:00
Iago Toral Quiroga	db87439232	v3d: request the kernel to flush caches when TMU is dirty This adapts the v3d driver to the new CL submit ioctl interface that allows the driver to request a flush of the caches after the render job has completed. This seems to eliminate the kernel write violation errors reported during CTS and Piglit excutions, fixing some CTS tests and GPU resets along the way. v2: - Adapt to changes in the kernel side. - Disable shader storage and shader images if the kernel doesn't implement cache flushing. Fixes CTS tests: KHR-GLES31.core.shader_image_size.basic-nonMS-fs-float KHR-GLES31.core.shader_image_size.basic-nonMS-fs-int KHR-GLES31.core.shader_image_size.basic-nonMS-fs-uint KHR-GLES31.core.shader_image_size.advanced-nonMS-fs-float KHR-GLES31.core.shader_image_size.advanced-nonMS-fs-int KHR-GLES31.core.shader_image_size.advanced-nonMS-fs-uint KHR-GLES31.core.shader_atomic_counters.advanced-usage-many-draw-calls2 KHR-GLES31.core.shader_atomic_counters.advanced-usage-draw-update-draw KHR-GLES31.core.shader_storage_buffer_object.advanced-unsizedArrayLength-fs-int KHR-GLES31.core.shader_storage_buffer_object.advanced-unsizedArrayLength-fs-std140-matR KHR-GLES31.core.shader_storage_buffer_object.advanced-unsizedArrayLength-fs-std140-struct KHR-GLES31.core.shader_storage_buffer_object.advanced-unsizedArrayLength-fs-std430-matC-pad KHR-GLES31.core.shader_storage_buffer_object.advanced-unsizedArrayLength-fs-std430-vec Reviewed-by: Eric Anholt <eric@anholt.net>	2019-10-18 14:08:52 +02:00
Eric Anholt	66e2d3b69f	v3d: Add Compute Shader support Now that the UAPI has landed, add the pipe_context function for dispatching compute shaders. This is the last major feature for GLES 3.1, though it's not enabled quite yet.	2019-10-18 14:08:52 +02:00
Iago Toral Quiroga	46182fc1da	v3d: add new flag dirty TMU cache at v3d_compiler That we set for any TMU write on spills and general tmu. It is then used as part of v3d_emit_gl_shader_state later. v2: add a new flag instead at v3d_compiler instead of dirty the flag at v3dx if there is any spill (change suggested by Eric, added by Alejandro) v3: set this for anything that is not a load and do it also in v3d40_vir_emit_image_load_store (Eric) Reviewed-by: Eric Anholt <eric@anholt.net>	2019-10-18 14:08:52 +02:00
Iago Toral Quiroga	d2203d74c6	v3d: trivial update to obsolete comment Reviewed-by: Eric Anholt <eric@anholt.net>	2019-10-18 14:08:52 +02:00
Erik Faye-Lund	9d0523b569	v3d: do not report alpha-test as supported This triggers lowering in the state-tracker, which makes things a bit simpler. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-10-17 10:41:36 +02:00
Iago Toral	e353656f3d	v3d: drop unused shader_rec_count member from context Looks like this was copied from the vc4 driver where it is actually included in the submit CL ioctl. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-10-15 06:56:45 +00:00
Marek Olšák	09e0e4c93c	gallium: remove PIPE_SHADER_CAP_SCALAR_ISA Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-10-10 15:49:19 -04:00
Alejandro Piñeiro	fa41a51891	v3d: take into account prim_counts_offset Specifically when reading the primitive counters. This fixed ~700 CTS tests using this pattern: dEQP-GLES3.functional.transform_feedback.* when run after tests like dEQP-GLES3.functional.prerequisite.read_pixels on the same caselist. When run individually those tests were passing because prim_counts_offset was zero. Fixes: `0f2d1dfe65` ("v3d: use the GPU to record primitives written to transform feedback") Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-10-10 09:51:50 +02:00
Iago Toral Quiroga	2eace10c62	v3d: fix TF primitive counts for resume without draw The V3D documentation states that primitive counters are reset when we emit Tile Binning Mode Configuration items, which we do at the start of each draw call, however, in the actual hardware this doesn't seem to take effect when transform feedback is not active (this doesn't happen in the simulator). This causes a problem in the following scenario: glBeginTransformFeedback() glDrawArrays() glPauseTransformFeedback() glDrawArrays() glResumeTransformFeedback() glEndTransformFeedback() The TF pause will trigger a flush of the primitive counters, which results in a correct number of primitives up to that point. In theory, the counter should then be reset when we execute the draw after pausing TF, but that doesn't happen, and since TF is enabled again by the resume command before we end recording, by the time we end the transform feedback recording we again check the counters, but instead of reading 0, we read again the same value we read at the time we paused, incorrectly accumulating that value again. In theory, we should be able to avoid this by using the other method to reset the primitive counters: using operation 1 instead of 0 when we flush the counts to the buffer at the time we pause, but again, this doesn't seem to be work and we still see obsolete counts by the time we end transform feedback. This patch fixes the problem by not accumulating TF primitive counts unless we know we have actually queued draw calls during transform feedback, since that seems to effectively reset the counters. This should also be more performant, since it saves unnecessary stalls for the primitive counters to be updated when we know there haven't been any new primitives drawn. Fixes CTS tests: dEQP-GLES3.functional.transform_feedback.* Reviewed-by: Eric Anholt <eric@anholt.net>	2019-09-13 06:53:26 +00:00
Iago Toral Quiroga	ded6ea9209	v3d: remove redundant update of queued draw calls This was updating the counter for the indexed draw path only, but we are already updating the counter for all paths a bit later, so this is only duplicating counts for indexed paths. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-09-13 06:53:26 +00:00
Iago Toral Quiroga	b9a07eed00	v3d: make sure we have enough space in the CL for the primitive counts packet Fixes: `0f2d1dfe65` ("v3d: use the GPU to record primitives written to transform feedback") Reviewed-by: Eric Anholt <eric@anholt.net>	2019-09-13 06:53:26 +00:00
Iago Toral Quiroga	b69f51a5ef	v3d: add missing line break for performance debug message Reviewed-by: Eric Anholt <eric@anholt.net>	2019-09-13 06:53:26 +00:00
Eric Engestrom	f812cbfd88	meson/v3d: replace partial list of nir dep files with idep_nir_headers "partial" because `nir_intrinsics_h` was missing. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-09-12 13:18:36 +01:00
Jose Maria Casanova Crespo	068c8889dd	v3d: flag dirty state when binding compute states As introduced in "v3d: flag dirty state when binding new sampler states" we need to add support for compute states. New flag VC5_DIRTY_COMPTEX and VC5_DIRTY_UNCOMPILED_CS are introduced. Reaching 33 flags at the dirty field forces us to change the type to uint_64. Flags are reordered and empty continuous bits are available for future pipeline stages. v2: Update flag conditions to compile cs shader. (Eric Antholt) Now dirty flags use uint_64t and flags are reordered. Added VC5_DIRTY_UNCOMPILED_CS flag. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-09-12 12:20:17 +01:00
Dave Stevenson	873b092e91	broadcom/v3d: Allow importing linear BOs with arbitrary offset/stride. Equivalent of `0c1dd9dee` "broadcom/vc4: Allow importing linear BOs with arbitrary offset/stride." for v3d. Allows YUV buffers with a single buffer and plane offsets to be passed in. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-30 10:53:05 +02:00
Iago Toral Quiroga	b594796f1b	v3d: do not automatically flush current job for SSBOs and shader images If the current job has a sequence of draw calls involving SSBOs and/or shader images, we would flush the job in between each draw call. With this change, we won't flush the current job and we rely on the application inserting correct barriers by issuing glMemoryBarrier() when needed. v2 (Eric): - When mapping a buffer for writing, we always need to flush. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-13 08:25:15 +02:00
Iago Toral Quiroga	f1cf1153e8	v3d: only process glMemoryBarrier() for SSBOs and images PIPE_BARRIER_UPDATE is defined as: PIPE_BARRIER_UPDATE_BUFFER \| PIPE_BARRIER_UPDATE_TEXTURE Which means we were flushing for any flags other than these two, but this was intended to only flush for ssbos and images. Actually, the driver automatically flushes jobs as we need, including writes/reads involving SSBOs and images, so we don't really need to flush anything when the program emits a barrier. However, this may lead to excessive flushing in some cases, so we will soon change this to avoid atutomatic flushing of the current job for SSBOs and images, meaning that we will rely on the application to emit correct memory barriers for these that we should make sure to process here. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-13 08:25:15 +02:00
Iago Toral Quiroga	f1559ca922	v3d: fix flushing of SSBOs and shader images If the current draw call includes SSBOs, then we must flush any jobs that are writing to the same SSBOs (so that our SSBOs reads are correct), as well as jobs reading from the same SSBO (so that our SSBO writes don't stomp previous SSBO reads). The exact same logic applies to shader images. In this case we were already flushing previous writes, but we should also flush previous reads. Note that We don't need to call v3d_flush_jobs_reading_resource() and v3d_flush_jobs_writing_resource() separately though, since flushing jobs that read a resource also flushes those writing to it. Suggested-by: Eric Anholt <eric@anholt.net> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-13 08:25:15 +02:00
Iago Toral Quiroga	fb9f7872e7	v3d: handle wait requirement when retrieving query results correctly Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-08 08:36:52 +02:00
Iago Toral Quiroga	0f2d1dfe65	v3d: use the GPU to record primitives written to transform feedback We can use the PRIMITIVE_COUNTS_FEEDBACK packet to write various primitive counts to a buffer, including the number of primives written to transform feedback buffers, which will handle buffer overflow correctly. There are a couple of caveats with this: Primitive counters are reset when we emit a 'Tile Binning Mode Configuration' packet, which can happen in the middle of a primitives query, so we need to read the buffer when we submit a job and accumulate the counts in the context so we don't lose them. We also need to do the same when we switch primitive type during transform feedback so we can compute the correct number of recorded vertices from the number of primitives. This is necessary so we can provide an accurate vertex count for draw from transform feedback. v2: - When computing the number of vertices for a primitive, pass in the base primitive, since that is what the hardware will count. - No need to update primitive counts when switching primitive types if the base primitives are the same. - Log perf warning when mapping the primitive counts BO for readback (Eric). - Only emit the primitive counts packet once at job end (Eric). - Use u_upload mechanism for the primitive counts buffer (Eric). - Use the XML to generate indices into the primitive counters buffer (Eric). Fixes piglit tests: spec/ext_transform_feedback/overflow-edge-cases spec/ext_transform_feedback/query-primitives_written-bufferrange spec/ext_transform_feedback/query-primitives_written-bufferrange-discard spec/ext_transform_feedback/change-size base-shrink spec/ext_transform_feedback/change-size base-grow spec/ext_transform_feedback/change-size offset-shrink spec/ext_transform_feedback/change-size offset-grow spec/ext_transform_feedback/change-size range-shrink spec/ext_transform_feedback/change-size range-grow spec/ext_transform_feedback/intervening-read prims-written Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-08 08:36:52 +02:00
Iago Toral Quiroga	9eb8699e0f	v3d: be more explicit about the query types supported Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-08 08:36:52 +02:00
Iago Toral Quiroga	9b316ab57a	v3d: generate packet unpack functions These were not being compiled because of the lack of __gen_unpack_address. v2: - Shift raw address correctly (Eric). Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-08 08:36:52 +02:00
Jason Ekstrand	70dc017aec	nir: Stop whacking gl_FrontFacing to a system value We have a cap bit for gallium and a GLSL compiler flag to control this. Just trust what GLSL gives us and stop forcing it. In order for this to be safe, we have to advertise another cap in some of the gallium drivers. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-01 21:59:37 +00:00
Eric Engestrom	c8a453a770	v3d: replace MAYBE_UNUSED with UNUSED MAYBE_UNUSED is going away, so let's replace legitimate uses of it with UNUSED, which the former aliased to so far anyway. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-31 09:41:05 +01:00
Eric Engestrom	d470f1acce	v3d: drop incorrect MAYBE_UNUSED While at it, use that `screen` variable everywhere. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-31 09:41:05 +01:00
Alejandro Piñeiro	cda4c62893	v3d: take into account separate_stencil when checking if stencil should be cleared In most cases this is not needed because the usual is that when a separate stencil is written, the parent resource is also written. This is needed if we have a separate stencil, no depth buffer, and the source and destination is the same, as in that case the stencil can be updated, but not the parent source (like if you are blitting only the stencil buffer). On that situation, the following access to the stencil buffer would clear the stencil buffer (so overwritting the previous blitting) cleared because the parent source has v3d_resource.writes to 0. As far as I see, that situation only happens with the GL_DEPTH32F_STENCIL8 format. Note that one alternative would consider that if the separate_stencil has been written, the parent should also be considered written (and update its "writes" field accordingly). But I found this patch more natural. Fixes the following piglit tests: spec/arb_depth_buffer_float/fbo-stencil-gl_depth32f_stencil8-blit spec/arb_depth_buffer_float/fbo-stencil-gl_depth32f_stencil8-copypixels the latter regressed when internally glCopyPixels implementation started to use blitting. So: Fixes: `131d40cfc9` ("st/mesa: accelerate glCopyPixels(STENCIL)") Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-30 12:05:23 +02:00
Iago Toral Quiroga	1a99fc0fd0	v3d: fix glDrawTransformFeedback{Instanced}() This needs to take the vertex count from the provided transform feedback buffer. v2: - don't take the vertex count from the underlying buffer, instead, take it from a v3d subclass of pipe_stream_output_target (Eric). Fixes piglit tests: spec/ext_transform_feedback2/draw-auto spec/ext_transform_feedback2/draw-auto instanced Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-26 08:29:41 +02:00
Iago Toral Quiroga	47eb74ae00	v3d: subclass pipe_streamout_output_target to record TF vertices written Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-26 08:29:41 +02:00
Iago Toral Quiroga	39df568ca1	v3d: refactor v3d_tf_statistics_record slightly Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-26 08:29:41 +02:00
Pierre-Eric Pelloux-Prayer	e9cf8c1d30	u_blitter: add a msaa parameter to util_blitter_clear Fixes: `ea5b7de138` ("radeonsi: make gl_SampleMaskIn = 0x1 when MSAA is disabled") Tested-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-07-23 14:42:20 -04:00
Ilia Mirkin	0e30c6b8a7	gallium: switch boolean -> bool at the interface definitions This is a relatively minimal change to adjust all the gallium interfaces to use bool instead of boolean. I tried to avoid making unrelated changes inside of drivers to flip boolean -> bool to reduce the risk of regressions (the compiler will much more easily allow "dirty" values inside a char-based boolean than a C99 _Bool). This has been build-tested on amd64 with: Gallium drivers: nouveau r300 r600 radeonsi freedreno swrast etnaviv v3d vc4 i915 svga virgl swr panfrost iris lima kmsro Gallium st: mesa xa xvmc xvmc vdpau va Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-22 22:13:51 -04:00
Iago Toral Quiroga	dacaf7ec06	v3d: fill logicop_func in the fragment shader key when precompiling shaders Since logicop_func 0 is PIPE_LOGIOP_CLEAR, we were trigger lowerinng of logic ops on precompiled shaders, which we don't want to do. Also, this had the side effect of making shader-db crash, as during this lowering we would try to read the color format swizzle information from the fragment shader key that we don't populate in precompiled shaders because right now we only need it when logic operations are enabled. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-22 08:05:59 +02:00
Andreas Bergmeier	f92290a8d9	broadcom: Move v3d_get_device_info to common In common we can use implementation for Vulkan.	2019-07-17 20:02:34 +00:00
Iago Toral Quiroga	556c299430	v3d: flag dirty state when binding new sampler states We emit code to saturate texture coordinates when using clamp wrapping mode so if we don't flag the dirty state here we don't get to recompile the shaders when the wrapping mode changes. v2: - Do the same when setting sampler views (Eric) - Use a switch statement instead of an if ladder. - Swap the shader stage assertion with an unreachable. Fixes: spec/!opengl 1.1/texwrap 1d bordercolor/gl_rgba8, border color only spec/!opengl 1.1/texwrap 1d proj bordercolor/gl_rgba8, projected, border color only spec/!opengl 1.1/texwrap 2d bordercolor/gl_rgba8, border color only spec/!opengl 1.1/texwrap 2d proj bordercolor/gl_rgba8, projected, border color only spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_alpha12, swizzled, border color only spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_alpha16, swizzled, border color only spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_alpha4, swizzled, border color only spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_alpha8, swizzled, border color only spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_intensity8, swizzled, border color only spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_luminance4_alpha4, swizzled, border color only spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_luminance6_alpha2, swizzled, border color only spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_luminance8, swizzled, border color only spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_luminance8_alpha8, swizzled, border color only spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_r3_g3_b2, swizzled, border color only spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_rgb10, swizzled, border color only spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_rgb10_a2, swizzled, border color only spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_rgb4, swizzled, border color only spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_rgb5, swizzled, border color only spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_rgb5_a1, swizzled, border color only spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_rgb8, swizzled, border color only spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_rgba4, swizzled, border color only spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_rgba8, swizzled, border color only spec/!opengl 1.1/texwrap formats bordercolor/gl_alpha12, border color only spec/!opengl 1.1/texwrap formats bordercolor/gl_alpha16, border color only spec/!opengl 1.1/texwrap formats bordercolor/gl_alpha4, border color only spec/!opengl 1.1/texwrap formats bordercolor/gl_alpha8, border color only spec/!opengl 1.1/texwrap formats bordercolor/gl_intensity8, border color only spec/!opengl 1.1/texwrap formats bordercolor/gl_luminance4_alpha4, border color only spec/!opengl 1.1/texwrap formats bordercolor/gl_luminance6_alpha2, border color only spec/!opengl 1.1/texwrap formats bordercolor/gl_luminance8, border color only spec/!opengl 1.1/texwrap formats bordercolor/gl_luminance8_alpha8, border color only spec/!opengl 1.1/texwrap formats bordercolor/gl_r3_g3_b2, border color only spec/!opengl 1.1/texwrap formats bordercolor/gl_rgb10, border color only spec/!opengl 1.1/texwrap formats bordercolor/gl_rgb10_a2, border color only spec/!opengl 1.1/texwrap formats bordercolor/gl_rgb4, border color only spec/!opengl 1.1/texwrap formats bordercolor/gl_rgb5, border color only spec/!opengl 1.1/texwrap formats bordercolor/gl_rgb5_a1, border color only spec/!opengl 1.1/texwrap formats bordercolor/gl_rgb8, border color only spec/!opengl 1.1/texwrap formats bordercolor/gl_rgba4, border color only spec/!opengl 1.1/texwrap formats bordercolor/gl_rgba8, border color only spec/!opengl 1.2/texwrap 3d bordercolor/gl_rgba8, border color only spec/!opengl 1.2/texwrap 3d proj bordercolor/gl_rgba8, projected, border color only spec/arb_es2_compatibility/texwrap formats bordercolor-swizzled/gl_rgb565, swizzled, border color only spec/arb_es2_compatibility/texwrap formats bordercolor/gl_rgb565, border color only spec/arb_texture_compression/texwrap formats bordercolor-swizzled/gl_compressed_alpha, swizzled, border color only spec/arb_texture_compression/texwrap formats bordercolor-swizzled/gl_compressed_luminance_alpha, swizzled, border color only spec/arb_texture_compression/texwrap formats bordercolor-swizzled/gl_compressed_rgb, swizzled, border color only spec/arb_texture_compression/texwrap formats bordercolor/gl_compressed_alpha, border color only spec/arb_texture_compression/texwrap formats bordercolor/gl_compressed_luminance_alpha, border color only spec/arb_texture_compression/texwrap formats bordercolor/gl_compressed_rgb, border color only spec/arb_texture_float/texwrap formats bordercolor-swizzled/gl_alpha16f_arb, swizzled, border color only spec/arb_texture_float/texwrap formats bordercolor-swizzled/gl_intensity16f_arb, swizzled, border color only spec/arb_texture_float/texwrap formats bordercolor-swizzled/gl_luminance16f_arb, swizzled, border color only spec/arb_texture_float/texwrap formats bordercolor-swizzled/gl_luminance_alpha16f_arb, swizzled, border color only spec/arb_texture_float/texwrap formats bordercolor-swizzled/gl_rgb16f, swizzled, border color only spec/arb_texture_float/texwrap formats bordercolor-swizzled/gl_rgba16f, swizzled, border color only spec/arb_texture_float/texwrap formats bordercolor/gl_alpha16f_arb, border color only spec/arb_texture_float/texwrap formats bordercolor/gl_intensity16f_arb, border color only spec/arb_texture_float/texwrap formats bordercolor/gl_luminance16f_arb, border color only spec/arb_texture_float/texwrap formats bordercolor/gl_luminance_alpha16f_arb, border color only spec/arb_texture_float/texwrap formats bordercolor/gl_rgb16f, border color only spec/arb_texture_float/texwrap formats bordercolor/gl_rgba16f, border color only spec/arb_texture_rectangle/texwrap rect bordercolor/gl_rgba8, border color only spec/arb_texture_rectangle/texwrap rect proj bordercolor/gl_rgba8, projected, border color only spec/arb_texture_rg/texwrap formats bordercolor-swizzled/gl_r8, swizzled, border color only spec/arb_texture_rg/texwrap formats bordercolor-swizzled/gl_rg8, swizzled, border color only spec/arb_texture_rg/texwrap formats bordercolor/gl_r8, border color only spec/arb_texture_rg/texwrap formats bordercolor/gl_rg8, border color only spec/arb_texture_rg/texwrap formats-float bordercolor-swizzled/gl_r16f, swizzled, border color only spec/arb_texture_rg/texwrap formats-float bordercolor-swizzled/gl_rg16f, swizzled, border color only spec/arb_texture_rg/texwrap formats-float bordercolor/gl_r16f, border color only spec/arb_texture_rg/texwrap formats-float bordercolor/gl_rg16f, border color only spec/ext_packed_float/texwrap formats bordercolor-swizzled/gl_r11f_g11f_b10f, swizzled, border color only spec/ext_packed_float/texwrap formats bordercolor/gl_r11f_g11f_b10f, border color only spec/ext_texture_shared_exponent/texwrap formats bordercolor-swizzled/gl_rgb9_e5, swizzled, border color only spec/ext_texture_shared_exponent/texwrap formats bordercolor/gl_rgb9_e5, border color only spec/ext_texture_snorm/texwrap formats bordercolor-swizzled/gl_alpha8_snorm, swizzled, border color only spec/ext_texture_snorm/texwrap formats bordercolor-swizzled/gl_intensity8_snorm, swizzled, border color only spec/ext_texture_snorm/texwrap formats bordercolor-swizzled/gl_luminance8_alpha8_snorm, swizzled, border color only spec/ext_texture_snorm/texwrap formats bordercolor-swizzled/gl_luminance8_snorm, swizzled, border color only spec/ext_texture_snorm/texwrap formats bordercolor-swizzled/gl_r8_snorm, swizzled, border color only spec/ext_texture_snorm/texwrap formats bordercolor-swizzled/gl_rg8_snorm, swizzled, border color only spec/ext_texture_snorm/texwrap formats bordercolor-swizzled/gl_rgb8_snorm, swizzled, border color only spec/ext_texture_snorm/texwrap formats bordercolor-swizzled/gl_rgba8_snorm, swizzled, border color only spec/ext_texture_snorm/texwrap formats bordercolor/gl_alpha8_snorm, border color only spec/ext_texture_snorm/texwrap formats bordercolor/gl_intensity8_snorm, border color only spec/ext_texture_snorm/texwrap formats bordercolor/gl_luminance8_alpha8_snorm, border color only spec/ext_texture_snorm/texwrap formats bordercolor/gl_luminance8_snorm, border color only spec/ext_texture_snorm/texwrap formats bordercolor/gl_r8_snorm, border color only spec/ext_texture_snorm/texwrap formats bordercolor/gl_rg8_snorm, border color only spec/ext_texture_snorm/texwrap formats bordercolor/gl_rgb8_snorm, border color only spec/ext_texture_snorm/texwrap formats bordercolor/gl_rgba8_snorm, border color only spec/ext_texture_srgb/texwrap formats bordercolor-swizzled/gl_sluminance8, swizzled, border color only spec/ext_texture_srgb/texwrap formats bordercolor-swizzled/gl_sluminance8_alpha8, swizzled, border color only spec/ext_texture_srgb/texwrap formats bordercolor-swizzled/gl_srgb8, swizzled, border color only spec/ext_texture_srgb/texwrap formats bordercolor-swizzled/gl_srgb8_alpha8, swizzled, border color only spec/ext_texture_srgb/texwrap formats bordercolor/gl_sluminance8, border color only spec/ext_texture_srgb/texwrap formats bordercolor/gl_sluminance8_alpha8, border color only spec/ext_texture_srgb/texwrap formats bordercolor/gl_srgb8, border color only spec/ext_texture_srgb/texwrap formats bordercolor/gl_srgb8_alpha8, border color only Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-16 08:13:28 +02:00
Iago Toral Quiroga	7c1d708911	v3d: acquire scoreboard lock before first tlb read Until now we have always been emitting our scoreboard locks on the last thread switch to improve parallelism. We did this by emitting our last thread switch right before our tlb writes at the very end of the program, where we know that we are outside control flow. Unfortunately, this strategy is not valid when we have tlb color reads too, as these will happen before this point in the program and can happen inside control flow. To fix this we always emit a thread switch before the first tlb load and if we see additional thread switches after that point, we change the strategy to lock on the first thread switch. v2: change the solution so it is expected to work in more scenarios (Eric). Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 09:16:38 +02:00
Iago Toral Quiroga	0279ac6e51	v3d: add color formats and swizzles to the fragment shader key We are going to need these very soon to emit correct reads from the tlb to implement logic operations. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 09:16:38 +02:00
Erik Faye-Lund	39e7fbf24a	gallium: get rid of PIPE_CAP_SM3 PIPE_CAP_SM3 has always been an odd one out of all our caps. While most other caps are fine-grained and single-purpose, this cap encode several features in one. And since OpenGL cares more about single features, it'd be nice to get rid of this one. As it turns, this is now relatively simple. We only really care about three features using this cap, and those already got their own caps. So we can remove it, and make sure all current drivers just give the same response to all of them. The only place we really care about SM3 is in nine, and there we can instead just re-construct the information based on the finer-grained caps. This avoids DX9 semantics from needlessly leaking into all of the drivers, most of who doesn't care a whole lot about DX9 specifically. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 15:50:51 +02:00
Alejandro Piñeiro	71446bf8e3	v3d: Early return with handle 0 when getting a bo on the simulator Until now we were just asking entries on the bo hash table, and don't worry if the handle was NULL, as we were just expecting to get a NULL in return. It seems that now the hash table assert with some reserverd pointers, included NULL. This commit just early returns with handle 0. This change fixes several crashes on vk-gl-cts GLES tests when using the v3d simulator, like: KHR-GLES3.core.internalformat.copy_tex_image.* Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-09 08:40:35 +02:00
Iago Toral Quiroga	042aeffd5b	v3d: do not flush jobs that are synced with 'Wait for transform feedback' Generally, we achieve this by skipping the flush on calls to v3d_flush_jobs_writing_resource() when we detect that the resource is written in the current job from a transform feedback write. The exception to this is the case where the caller is about to map the resource, in which case we need to flush immediately since we can only emit 'Wait for transform feedback' commands on rendering jobs. We add a parameter to the function so the caller can identify that scenario. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-02 08:57:20 +02:00
Iago Toral Quiroga	88cbc4f7f6	v3d: emit 'Wait for transform feedback' commands when needed The hardware can flush transform feedback writes before reads in the same job by inserting this command. This patch detects when the rendering state for the current draw call reads resources that had been previously written by transform feedback in the same job and inserts the 'Wait for transform feedback' command before emitting the new draw. v2 (Eric): - this was intended to look at job->tf_write_prscs for TF jobs. - clear job->tf_write_prscs after we emit the TF flush. - can skip flushes for fragment shader reads from TF. v3 (Eric): - all resources in job->tf_write_prscs are resources written by TF so we don't need to check if they are bound to PIPE_BIND_STREAM_OUTPUT. - documented optimization opportunity for geometry stages. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-02 08:57:20 +02:00
Iago Toral Quiroga	c7dff0e614	v3d: keep track of resources written by transform feedback The hardware provides a feature to sync reads from previous transform feedback writes in the same job so if we use this mechanism we no longer have to flush the job. In order to identify this scenario we need a mechanism to identify resources that are written by transform feedback. v2: use _mesa_pointer_set_create (Eric) Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-02 08:57:20 +02:00
Iago Toral Quiroga	4d8f82946b	v3d: flush jobs writing to vertex buffers used in the current draw call This can happen when any of our vertex buffers was written by a previous transform feedback draw. Fixes the following piglit tests: spec/ext_transform_feedback/position-render-bufferbase spec/ext_transform_feedback/position-render-bufferbase-discard spec/ext_transform_feedback/position-render-bufferoffset spec/ext_transform_feedback/position-render-bufferoffset-discard spec/ext_transform_feedback/position-render-bufferrange spec/ext_transform_feedback/position-render-bufferrange-discard Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-21 08:06:13 +02:00
Iago Toral Quiroga	eb44dcc219	v3d: flush jobs reading from transform feedback output buffers If we are about to write to a transform feedback buffer, we should make sure that we flush any prior work that intended to read from any of these buffers. Fixes piglit test: spec/ext_transform_feedback/immediate-reuse Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-21 08:06:13 +02:00
Iago Toral Quiroga	42572f2f7d	v3d: add a helper to check if transform feedback is enabled v2: We should be safe assuming that bind_vs != NULL (Eric) Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-21 08:06:13 +02:00
Iago Toral Quiroga	6d97c8fac1	v3d: only flush jobs accessing the query BO when reading query results Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-18 08:09:03 +02:00
Iago Toral Quiroga	5491883a9a	v3d: add a helper function to flush jobs using a BO v2: use _mesa_set_search() (Eric) Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-18 08:09:03 +02:00
Iago Toral Quiroga	9b96ae69bc	v3d: don't emit point coordinates varyings if the FS doesn't read them We still need to emit them in V3D 3.x since there there is no mechanism to disable them. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-07 08:29:42 +02:00
Eric Anholt	60a64f028d	v3d: Use driconf to expose non-MSAA texture limits for Xorg. The V3D 4.2 HW has a limit to MSAA texture sizes of 4096. With non-MSAA, we can go up to 7680 (actually probably 8138, but that hasn't been validated by the HW team). Exposing 7680 in X11 will allow dual 4k displays.	2019-05-13 12:03:11 -07:00
Eric Anholt	0c31fe9ee7	gallium: Redefine the max texture 2d cap from _LEVELS to _SIZE. The _LEVELS assumes that the max is always power of two. For V3D 4.2, we can support up to 7680 non-power-of-two MSAA textures, which will let X11 support dual 4k displays on newer hardware. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-05-13 12:03:08 -07:00
Eric Anholt	971a13d805	Revert "v3d: Disable PIPE_CAP_BLIT_BASED_TEXTURE_TRANSFER." This reverts commit `ccce940947`, leaving a note as to why we had to (corruption in chromium, breaking some GLES3.1 tests).	2019-04-26 12:42:30 -07:00
Eric Anholt	49071b2e3f	v3d: Don't try to update the shadow texture for separate stencil. There are two cases where v3d's sampler view's resource doesn't match the base's: shadow textures for sampling from raster, and pointing at the separate depth texture for z32f_s8x24. We only want to update shadow for the first case. Fixes dEQP-GLES31.functional.stencil_texturing.render.depth32f_stencil8_draw when run after the previous testcase.	2019-04-26 12:42:30 -07:00
Eric Anholt	d8486c2ad7	v3d: Use _mesa_hash_table_remove_key() where appropriate.	2019-04-26 12:42:30 -07:00
Eric Anholt	42210a4351	v3d: Apply the GFXH-930 workaround to the case where the VS loads attrs. We were emitting a dummy load for when the VS doesn't load any attributes, but we also need to emit a dummy load for when the render VS loads attributes but the binner VS doesn't. Fixes simulator assertion failures and GPU hangs on KHR-GLES31.core.texture_gather.\*	2019-04-26 12:42:30 -07:00
Eric Anholt	448fc3ea42	v3d: Fill in the ignored segment size fields to appease new simulator. We are assured that the input segment size field is ignored for !separate_segs mode, and now the simulator wants an in-range value set regardless of whether it's functionally ignored or not.	2019-04-26 12:40:31 -07:00
Eric Anholt	d23b47fda5	v3d: Disable SSBOs and atomic counters on vertex shaders. The CTS fails on dEQP-GLES31.functional.shaders.opaque_type_indexing.atomic_counter.*vertex when they are enabled, due to the VS being run for both bin and render. I think this behavior is expected to be valid, but I can't find text in atomic counters or SSBO specs saying so (the closed I found was in shader_image_load_store). Just disable it for now, since the closed source driver doesn't expose vertex atomic counters/SSBOs either.	2019-04-24 17:24:11 -07:00
Dylan Baker	95aefc94a9	Delete autotools Acked-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Marek Olšák <marek.olsak@amd.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Acked-by: Matt Turner <mattst88@gmail.com>	2019-04-15 13:44:29 -07:00
Eric Anholt	dc402be73e	v3d: Use the new lower_to_scratch implementation for indirects on temps. We can use the same register spilling infrastructure for our loads/stores of indirect access of temp variables, instead of doing an if ladder. Cuts 50% of instructions and max-temps from 2 KSP shaders in shader-db. Also causes several other KSP shaders with large bodies and large loop counts to not be force-unrolled. The change was originally motivated by NOLTIS slightly modifying register pressure in piglit temp mat4 array read/write tests, triggering register allocation failures.	2019-04-12 16:16:58 -07:00
Eric Anholt	8a2d91e124	v3d: Detect the correct number of QPUs and use it to fix the spill size. We were missing a * 4 even if the particular hardware matched our assumption.	2019-04-12 15:59:31 -07:00
Eric Anholt	6b1c659825	v3d: Add Compute Shader compilation support. While waiting for the CSD UABI to get reviewed, I keep having to rebase the CS patch. Just land the compiler side for now to keep it from diverging. For now this covers just GLES 3.1 compute shaders, not CL kernels.	2019-04-12 15:59:31 -07:00
Eric Anholt	276ec879fd	v3d: Drop a note for the future about PIPE_CAP_PACKED_UNIFORMS.	2019-04-12 15:58:28 -07:00
Timothy Arceri	035759b61b	nir/i965/freedreno/vc4: add a bindless bool to type size functions This required to calculate sizes correctly when we have bindless samplers/images. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-12 09:02:59 +02:00
Eric Anholt	771adffec1	st: Lower uniforms in st in the !PIPE_CAP_PACKED_UNIFORMS case as well. PIPE_CAP_PACKED_UNIFORMS conflates several things: Lowering uniforms i/o at the st level instead of the backend, packing uniforms with no padding at all, and lowering to UBOs. Requiring backends to lower uniforms i/o for !PIPE_CAP_PACKED_UNIFORMS leads to the driver needing to either link against the type size function in mesa/st, or duplicating it in the backend. Given that all backends want this lower-io as far as I can tell, just move it to mesa/st to resolve the link issue and avoid the driver author needing to understand st's uniforms layout. Incidentally, fixes uniform layout failures in nouveau in: dEQP-GLES2.functional.shaders.struct.uniform.sampler_nested_fragment dEQP-GLES2.functional.shaders.struct.uniform.sampler_nested_vertex dEQP-GLES2.functional.shaders.struct.uniform.sampler_array_fragment dEQP-GLES2.functional.shaders.struct.uniform.sampler_array_vertex and I think in Lima as well. v2: fix indents Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-10 11:44:20 -07:00
Jason Ekstrand	6279074de1	nir: Get rid of global registers We have a pass to lower global registers to locals and many drivers dutifully call it. However, no one ever creates a global register ever so it's all dead code. It's time we bury it. Acked-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-09 00:29:36 -05:00
Eric Anholt	4c70f276bc	v3d: Don't try to use the TFU blit path if a scissor is enabled. We'll need to do a render-based blit for scissors, since the TFU (as seen in this conditional) can only update a whole surface. Fixes: `976ea90bdc` ("v3d: Add support for using the TFU to do some blits.") Fixes piglit fbo-scissor-blit.	2019-04-04 17:30:35 -07:00
Eric Anholt	62360e92ec	v3d: Bump the maximum texture size to 4k for V3D 4.x. 4.1 and 4.2 both have the same 16k limit, but it I'm seeing GPU hangs in the CTS at 8k and 16k. 4k at least lets us get one 4k display working. Cc: mesa-stable@lists.freedesktop.org	2019-04-04 17:30:35 -07:00
Eric Anholt	e3063a8b2f	v3d: Add support for handling OOM signals from the simulator. I have v3d allocating enough initial allocation memory that we've been passing tests without it, but to match kernel behavior more it would be good to actually exercise the OOM path.	2019-04-04 17:30:35 -07:00
Marek Olšák	66a82ec6f0	gallium: add writable_bitmask parameter into set_shader_buffers to indicate write usage per buffer. This is just a hint (it will be used by radeonsi). Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-04-04 19:28:52 -04:00
Eric Anholt	16f2770eb4	v3d: Upload all of UBO[0] if any indirect load occurs. The idea was that we could skip uploading the constant-indexed uniform data and just upload the uniforms that are variably-indexed. However, since the VS bin and render shaders may have a different set of uniforms used, this meant that we had to upload the UBO for each of them. The first case is generally a fairly small impact (usually the uniform array is the most space, other than a couple of FSes in shader-db), while the second is a larger impact: 3DMMES2 was uploading 38k/frame of uniforms instead of 18k. Given that the optimization is of dubious value, has a big downside, and is quite a bit of code, just drop it. No change in shader-db. No change on 3DMMES2 (n=15).	2019-03-21 14:20:50 -07:00
Eric Anholt	320e96bace	v3d: Move constant offsets to UBO addresses into the main uniform stream. We'd end up with the constant offset in the uniform stream anyway, since they're bigger than small immediates. Avoids the extra uniforms and adds in the shader in favor of just adding once on the CPU. shader-db: total instructions in shared programs: 6496865 -> 6494851 (-0.03%) total uniforms in shared programs: 2119511 -> 2117243 (-0.11%)	2019-03-21 14:20:50 -07:00
Eric Anholt	c36d2793ec	v3d: Rename v3d_tmu_config_data to v3d_unit_data. I want to reuse this for encoding small constant UBO/SSBO offsets into the uniform stream to reduce the extra uniform loads and adds for the small constant offsets.	2019-03-21 14:20:50 -07:00
Kenneth Graunke	220c1dce1e	gallium: Add PIPE_BARRIER_UPDATE_BUFFER and UPDATE_TEXTURE bits. The glMemoryBarrier() function makes shader memory stores ordered with respect to things specified by the given bits. Until now, st/mesa has ignored GL_TEXTURE_UPDATE_BARRIER_BIT and GL_BUFFER_UPDATE_BARRIER_BIT, saying that drivers should implicitly perform the needed flushing. This seems like a pretty big assumption to make. Instead, this commit opts to translate them to new PIPE_BARRIER bits, and adjusts existing drivers to continue ignoring them (preserving the current behavior). The i965 driver performs actions on these memory barriers. Shader memory stores go through a "data cache" which is separate from the render cache and other read caches (like the texture cache). All memory barriers need to flush the data cache (to ensure shader memory stores are visible), and possibly invalidate read caches (to ensure stale data is no longer visible). The driver implicitly flushes for most caches, but not for data cache, since ARB_shader_image_load_store introduced MemoryBarrier() precisely to order these explicitly. I would like to follow i965's approach in iris, flushing the data cache on any MemoryBarrier() call, so I need st/mesa to actually call the pipe->memory_barrier() callback. Fixes KHR-GL45.shader_image_load_store.advanced-sync-textureUpdate and Piglit's spec/arb_shader_image_load_store/host-mem-barrier on the iris driver. Roland said this looks reasonable to him. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-03-19 23:43:33 -07:00
Eric Anholt	17115da6ad	v3d: Expose the dma-buf modifiers query. This allows DRI3 to pick between UIF and raster according to whether we're pageflipping or not and whether the pageflipping display can do UIF, avoiding copies for the windowed/composited case that previously was forced to linear. Improves windowed glmark2 -b build:use-vbo=false performance by 30.7783% +/- 13.1719% (n=3)	2019-03-19 08:59:01 -07:00
Eric Anholt	bf6973199d	v3d: Allow the UIF modifier with renderonly. We ask the other side to make a buffer with the right number of pages, and then just store the UIF in it. This avoids an extra silent copy of the buffer from linear to UIF if it gets used for texturing (X11 copy-based swapbuffers, GL compositors).	2019-03-19 08:54:46 -07:00
Eric Anholt	eb5903a908	v3d: Always lay out shared tiled buffers with UIF_TOP set. The samplers are already ready for this, we just needed to make sure that layout chose UIF for level 0.	2019-03-19 08:54:46 -07:00
Alyssa Rosenzweig	cca270bb03	v3d: Use shared drm_find_modifier util Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-03-14 22:42:51 +00:00
Eric Anholt	486b181fd7	v3d: Fix leak of the renderonly struct on screen destruction. This makes v3d match vc4's destroy path. Fixes: `e113b21cb7` ("v3d: Add renderonly support.")	2019-03-12 16:15:40 -07:00
Eric Anholt	ccce940947	v3d: Disable PIPE_CAP_BLIT_BASED_TEXTURE_TRANSFER. This reduces the runtime of dEQP-GLES3.functional.shaders.precision.* from 11.5s to 3.3s. This brings CTS runs down to 4 hours on one of my target devices.	2019-03-12 09:04:25 -07:00
Timur Kristóf	9a834447d6	tgsi_to_nir: Produce optimized NIR for a given pipe_screen. With this patch, tgsi_to_nir will output NIR that is tailored to the given pipe, by reading its capabilities and adjusting the NIR code to those capabilities similarly to how glsl_to_nir works. It also adds an optimization loop that brings the output NIR in line with what glsl_to_nir outputs. This is necessary for the same reason why glsl_to_nir has its own optimization loop: currently not every driver does these optimizations yet. For uses which cannot pass a pipe_screen we also keep a variant called tgsi_to_nir_noscreen which keeps the old behavior. Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com> Tested-by: Andre Heider <a.heider@gmail.com> Tested-by: Rob Clark <robdclark@gmail.com> Acked-By: Eric Anholt <eric@anholt.net>	2019-03-05 19:13:27 +00:00
Eric Anholt	f1122f78b7	v3d: Fix build of NEON code with Mesa's cflags not targeting NEON. v3d may be built as part of a set of drivers in a system not requiring NEON, but we know V3D devices will be paired with CPUs with NEON so we should be able to use this asm. Fixes: `0c05198d6b` ("v3d: Always enable the NEON utile load/store code.")	2019-03-01 14:21:49 -08:00
Eric Anholt	5a84d46896	v3d: Stop tracking num_inputs for VPM loads. It's unused in the VS (since we need vattr_sizes[] anyway), so move it to FS prog data.	2019-02-18 18:09:07 -08:00
Eric Anholt	cd5e0b2729	v3d: Use the early_fragment_tests flag for the shader's disable-EZ field. Apparently we need disable-EZ flagged, not just "does Z writes". Fixes dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_depth_fbo on 7278, even though it passed in simulation. Signed-off-by: Eric Anholt <eric@anholt.net> Fixes: `051a41d3d5` ("v3d: Add support for the early_fragment_tests flag.")	2019-02-18 18:09:06 -08:00
Eric Anholt	332b969c4e	v3d: Sync indirect draws on the last rendering. Fixes intermittent fails in dEQP-GLES31.functional.draw_indirect.compute_interop.separate.drawelements_compute_cmd_and_data_and_indices and others (particularly when run as part of a CTS run)	2019-02-18 18:09:06 -08:00
Eric Anholt	32f16b0b1e	v3d: Clear the GMP on initialization of the simulator. Otherwise, we might have pages accessible that shouldn't be and miss out on errors. This is unlikely for most tests since v3d_hw_get_mem() is big enough that it'll be a freshly zeroed mmap, but if screens are destroyed and recreated then we'd be reusing the old v3d_hw_get_mem() contents.	2019-02-18 18:09:06 -08:00
Eric Engestrom	f1374805a8	drm-uapi: use local files, not system libdrm There was an issue recently caused by the system header being included by mistake, so let's just get rid of this include path and always explicitly #include "drm-uapi/FOO.h" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-02-14 11:20:00 +00:00
Karol Herbst	6010d7b8e8	gallium: add PIPE_CAP_MAX_VARYINGS Some NVIDIA hardware can accept 128 fragment shader input components, but only have up to 124 varying-interpolated input components. We add a new cap to express this cleanly. For most drivers, this will have the same value as PIPE_SHADER_CAP_MAX_INPUTS for the fragment shader. Fixes KHR-GL45.limits.max_fragment_input_components Signed-off-by: Karol Herbst <karolherbst@gmail.com> [imirkin: rebased, improved docs/commit message] Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Acked-by: Rob Clark <robdclark@gmail.com> Acked-by: Eric Anholt <eric@anholt.net> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Cc: 19.0 <mesa-stable@lists.freedesktop.org>	2019-02-07 21:51:45 -05:00
Eric Anholt	bdef17b052	v3d: Store the actual mask of color buffers present in the key. If you only bound rt 1+, we'd still emit a write to the rt0 that isn't present (noticed while debugging an ext_framebuffer_multisample-alpha-to-coverage-no-draw-buffer-zero regression in another change).	2019-02-05 15:42:04 -08:00
Eric Anholt	17a649af05	v3d: Fix precompile of FRAG_RESULT_DATA1 and higher outputs. I was just leaving the other MRT targets than DATA0 out, by accident.	2019-02-05 15:35:49 -08:00
Eric Anholt	aaef12702f	nir: Move V3D's "the shader was TGSI, ignore FS output types" flag to NIR. Ken's rework of mesa/st builtins to NIR means that we'll have more NIR shaders with color output types that are mismatched with the render target types. Since this is behavior that GLSL doesn't require, add it as a shader_info option so the driver can know that it needs to ignore the FS output's base type in favor of the actual render target's. This prevents needing additional variants in several mesa/st paths (clear, pbo upload, pbo download), given that the driver already has to handle the variants for any TGSI being passed to it (from u_blitter, for example). Reviewed-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-02-05 12:12:33 -08:00
Ernestas Kulik	90458bef54	v3d: Fix leak in resource setup error path Reported by Coverity: in the case of unsupported modifier request, the code does not jump to the “fail” label to destroy the acquired resource. CID: 1435704 Signed-off-by: Ernestas Kulik <ernestas.kulik@gmail.com> Fixes: `45bb8f2957` ("broadcom: Add V3D 3.3 gallium driver called "vc5", for BCM7268.")	2019-01-29 16:14:13 -08:00
Eric Anholt	0c05198d6b	v3d: Always enable the NEON utile load/store code. I can't imagine the new HW block being paired with a v6 CPU, so don't bother with the CPU detection that vc4 had to do. Improves 1024x1024 TexImage on my 7278 by 47.3229% +/- 0.679632%	2019-01-29 16:00:25 -08:00
Eric Anholt	c496b60ed8	v3d: Create separate sampler states for the various blend formats. The sampler border color is encoded in the TMU's blending format (half floats, 32-bit floats, or integers) and must be clamped to the format's range unorm/snorm/int ranges by the driver. Additionally, the TMU doesn't know about how we're abusing the swizzle to support BGRA, A, and LA, so we have to pre-swizzle the border color for those. We don't really want to spend half a kb on sampler states in most cases, so skip generating the variants when the border color is unused or is 0,0,0,0.	2019-01-27 08:30:03 -08:00
Eric Anholt	5fe4250a2c	v3d: Move the sampler state to the long-lived state uploader. Samplers are small (8-24 bytes), so allocating 4k for them is a huge waste.	2019-01-27 08:30:03 -08:00
Eric Anholt	09472006ff	v3d: Use the symbolic names for wrap modes from the XML.	2019-01-27 08:30:03 -08:00
Eric Anholt	c51d125d18	v3d: Fix stencil sampling from a separate-stencil buffer. When the sampler view is in sample-stencil mode, we need to return uint stencil values. To do that, fill in the format table to return R8I, and have the sampler view point at the separate stencil buffer. Fixes dEQP-GLES31.functional.stencil_texturing.format.depth32f_stencil8_2d	2019-01-27 08:30:03 -08:00
Eric Anholt	8a0b0a8f37	v3d: Fix stencil sampling from packed depth/stencil. We need to pick the 8-bit unorm value out, not the depth component.	2019-01-27 08:30:03 -08:00
Eric Anholt	fcdbd441a2	v3d: Fix release-build warning about utile_h.	2019-01-27 08:30:03 -08:00
Eric Anholt	edb1fcd963	v3d: Flush blit jobs immediately after generating them. Fixes OOMs in the CTS's packed_pixels.varied_rectangle.* tests -- the series of texture uploads at the start before texturing occurred would end up all sitting around as cached jobs for reuse. By flushing immediately, peak active BO usage goes from 150M to 40M. We could maybe put some limits on how many jobs we keep around, but blits seem particularly unlikely to get reused for other drawing.	2019-01-27 08:30:03 -08:00
Eric Anholt	ac333ffa59	v3d: Fix BO stats accounting for imported buffers.	2019-01-27 08:30:03 -08:00
Eric Anholt	060575bea8	v3d: Drop maximum number of texture units down to 16. This is the GLES 3.2 minmax, and also what the closed source driver does. Avoids hitting OOMs in the CTS's dEQP-GLES3.functional.texture.units.all_units.only_cube.1.	2019-01-27 08:30:03 -08:00
Eric Anholt	3e743d8cd8	v3d: Avoid duplicating limits defines between gallium and v3d core. We don't want to pull the compiler into every include in the gallium driver, so just make a new little header to store the limits.	2019-01-27 08:30:03 -08:00
Eric Anholt	533b3f0541	v3d: Rename gallium-local limits defines from VC5 to V3D. The compiler has its limits under V3D_* (like most V3D stuff), so sync up with that.	2019-01-27 08:30:03 -08:00
Karol Herbst	9b24028426	nir: rename nir_var_function to nir_var_function_temp Signed-off-by: Karol Herbst <kherbst@redhat.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-19 20:01:41 +01:00
Eric Anholt	59527a36e9	v3d: Restructure RO allocations using resource_from_handle. I had bugs in the old path where I was laying out as tiled (so we'd render tiled) but then only allocating space in the shared object for linear rendering. The resource_from_handle makes it so the same layout choices are made in both the import and export scanout cases. Also, fixes a leak of the fd that was tripping up the CTS. Now that we're checking PIPE_BIND_SHARED to choose to use RO, the DRM_FORMAT_MOD_LINEAR check wasn't needed any more. Fixes visual corruption and MMU faults in X in renderonly mode. Fixes: `bd09bb1629` ("v3d: SHARED but not necessarily SCANOUT buffers on RO must be linear.")	2019-01-16 16:28:41 -08:00
Eric Anholt	d70eb2302b	v3d: If the modifier is not known on BO import, default to linear for RO. Part of fixing DRI3 rendering with RO on X11. Fixes: `e113b21cb7` ("v3d: Add renderonly support.")	2019-01-16 16:28:41 -08:00
Eric Anholt	bd09bb1629	v3d: SHARED but not necessarily SCANOUT buffers on RO must be linear. We don't have a way to talk to RO about modifiers it can do yet, so assume the minimum.	2019-01-14 15:40:55 -08:00
Eric Anholt	6281f26f06	v3d: Add support for shader_image_load_store. This is only exposed on V3D 4.1+, because we didn't have the TMU write operations for images on 3.3 (To do GLES 3.1 there, you have to lower it to SSBO load/stores, which is a problem to solve later).	2019-01-14 15:40:55 -08:00
Eric Anholt	5932c2f0b9	v3d: Add SSBO/atomic counters support. So far I assume that all the buffers get written. If they weren't, you'd probably be using UBOs instead.	2019-01-14 15:40:55 -08:00
Eric Anholt	6c8edcb89c	v3d: Drop the GLSL version level. This was an arbitrary "we support lots of stuff" value when I started the driver. However, at 400 we expose OES_gpu_shader5, which claims support for dynamically indexing samplers, which the driver doesn't do yet.	2019-01-14 13:18:02 -08:00
Eric Anholt	49b7e26fac	v3d: Add an isr to the simulator to catch GMP violations. Otherwise, the simulator raises the GMP interrupt and waits for it to be handled, and v3d ends up spinning in v3d_hw_tick(). Aborting right when violation happens gives us a chance to look at the backtrace of whatever thread triggered the violation.	2019-01-14 13:18:02 -08:00
Eric Anholt	619a28b845	v3d: Add support for GL_ARB_framebuffer_no_attachments. Fixes dEQP-GLES31.functional.state_query.integer.max_framebuffer_height_getboolean when GLES3 is enabled.	2019-01-14 13:18:02 -08:00
Eric Anholt	b417a9f7b2	v3d: Add support for flushing dirty TMU data at job end. This will be needed for SSBOs and image_load_store.	2019-01-14 13:18:02 -08:00
Eric Anholt	db3b6b6bca	v3d: Enable GL_ARB_texture_gather on V3D 4.x. This is part of GLES 3.1, and with the NIR lowering we're now passing the GLES31 testcases.	2019-01-08 13:03:44 -08:00
Karol Herbst	d0c6ef2793	nir: rename global/local to private/function memory the naming is a bit confusing no matter how you look at it. Within SPIR-V "global" memory is memory accessible from all threads. glsl "global" memory normally refers to shader thread private memory declared at global scope. As we already use "shared" for memory shared across all thrads of a work group the solution where everybody could be happy with is to rename "global" to "private" and use "global" later for memory usually stored within system accessible memory (be it VRAM or system RAM if keeping SVM in mind). glsl "local" memory is memory only accessible within a function, while SPIR-V "local" memory is memory accessible within the same workgroup. v2: rename local to function as well v3: rename vtn_variable_mode_local as well Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-08 18:51:46 +01:00
Eric Anholt	f8e6b364b0	v3d: Fix up VS output setup during precompiles. I noticed that a VS I was debugging was missing all of its output stores -- outputs_written was for POS, VAR0, VAR3, while the shader's variables were POS, VAR9, and VAR12. I'm not sure what outputs_written is supposed to be doing here, but we can just walk the declared variables and avoid both this bug and the emission of extra stvpms for less-than-vec4 varyings.	2019-01-04 15:41:23 -08:00
Eric Anholt	d2b899c0ec	v3d: Refactor compiler entrypoints. Before, I had per-stage entryoints with some helpers shared between them. As I extended for compute shaders and shader-db, it turned out that the other common code in the middle wanted to be shared too.	2019-01-02 14:12:29 -08:00
Eric Anholt	49d8e2aff1	v3d: Don't forget to include RT writes in precompiles. Looking at some assembly dumps for an optimization, we were clearly missing important parts of the shader!	2019-01-02 14:12:29 -08:00
Eric Anholt	3a81c753a3	v3d: Fix segfault when failing to compile a program. We'll still fail at draw time, but this avoids a regression in shader-db execution once I enable TLB writes in precompiles. Fixes: `b38e4d313f` ("v3d: Create a state uploader for packing our shaders together.")	2019-01-02 14:12:29 -08:00
Eric Anholt	f695d62fe5	v3d: Add support for requesting the sample offsets.	2018-12-30 08:05:11 -08:00
Eric Anholt	696f63f1b4	v3d: Hook up some shader-db output to GL_ARB_debug_output. This allows the original shader-db project's run.c runner to parse things easily, and is probably a good thing to have for GL_ARB_debug_output in general. I formatted it more like Intel's so I can mostly reuse their report script.	2018-12-30 08:03:51 -08:00
Eric Anholt	87b251a940	v3d: Add a "precompile" debug flag for shader-db. I've been using my apitrace-based shader-db so far, but it's slow (apitrace decompression), intrusive (apitrace windows spamming the screen), and doesn't have much coverage. The original shader-db provides a lot more coverage and compiles faster, at the expense of not having the actual runtime variant key. As v3d has a lot less runtime variation than vc4 did, this tradeoff makes more sense.	2018-12-29 13:52:09 -08:00
Eric Anholt	ba36312fbd	v3d: Hook up perf_debug() output to GL_ARB_debug output as well. This is the right channel to report these things, so that end-users don't need to know each driver's custom debug options.	2018-12-20 11:31:19 -08:00
Rhys Kidd	d3991d2472	v3d: Wire up core pipe_debug_callback This lets the driver use pipe_debug_message() for GL_ARB_debug_output. Signed-off-by: Rhys Kidd <rhyskidd@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2018-12-20 11:31:16 -08:00
Eric Anholt	d80761b8f3	v3d: Drop shadow comparison state from shader variant key. The shadow state is now in the sampler.	2018-12-20 11:29:30 -08:00
Eric Anholt	0e2758daad	v3d: Fix simulator mode on i915 render nodes. i915 render nodes refuse the dumb ioctls, so the simulator would crash on the original non-apitrace shader-db. Replace them with direct i915 calls if we detect that we're on one of their gem fds.	2018-12-20 11:29:30 -08:00
Eric Anholt	fcfb7f573c	v3d: Load and store aligned utiles all at once. This calls the expensive uif offset function once per utile, but it still gets us a 212.218% +/- 2.41216% (n=10) win on 1024x1024 glTexImage over calling it on each pixel.	2018-12-19 10:27:26 -08:00
Eric Anholt	8ee752194c	v3d: Implement texture_subdata to reduce teximage upload copies. This lets us store the non-PBO glTexImage data directly into the tiled image without making an extra untiled memcpy for the gallium transfer. Improves 1024x1024 TexImage perf by ~19%, mostly from not thrashing around in the kernel mapping and unmapping the transfer's temporary area.	2018-12-19 10:27:26 -08:00
Eric Anholt	e09d8aecb4	v3d: Remove dead prototypes for load/store utile functions.	2018-12-19 10:27:26 -08:00
Eric Anholt	fcf881adda	v3d: Don't try to create shadow tiled temporaries for 1D textures. They're raster order anyway, so we'd assertion fail along with wasting bandwidth. Fixes: `6ad9e8690d` ("v3d: Add support for texturing from linear.")	2018-12-19 10:27:21 -08:00
Eric Anholt	b5adc744ba	v3d: Fix check for TFU job completion in the simulator. We're waiting for the jobs-completed count to increment (with wrapping), not to reach its starting state. This mostly ended up working out because the next v3d_hw_tick() for a submit CL would end up doing the TFU operation first, but it did fail when a blit was used for glReadPixels() at the end of a test. Fixes: `ee0549ff9a` ("v3d: Add the V3D TFU submit interface to the simulator.")	2018-12-19 10:26:04 -08:00
Eric Anholt	365728dc5d	v3d: Put the dst bo first in the list of BOs for TFU calls. In the UAPI, the first BO is the destination, and the one the kernel should do an exclusive reservation on. Currently we only do exclusive reservations, anyway. However, in the simulator path I was only copying back the "destination" BO (actually src in this case), and this caused regressions once I fixed the simulator to actually complete TFU before returning (since otherwise, the TFU op would happen at the start of the next CL submit and the draw would get the right contents). Fixes: `976ea90bdc` ("v3d: Add support for using the TFU to do some blits.")	2018-12-19 10:26:04 -08:00
Eric Anholt	29927e7524	v3d: Drop in a bunch of notes about performance improvement opportunities. These have all been floating in my head, and while I've thought about encoding them in issues on gitlab once they're enabled, they also make sense to just have in the area of the code you'll need to work in.	2018-12-14 17:48:01 -08:00
Eric Anholt	a370ed76ab	v3d: Use the uniform pretty-printer in v3d_write_uniforms()'s debug code. This will be a lot easier than my usual "38400.000000? that looks like a viewport scale" decoding strategy.	2018-12-14 17:48:01 -08:00
Eric Anholt	78ef05bde4	v3d: Move uinfo->data[] dereference to the top of v3d_write_uniforms(). Follows `3954331aff` ("vc4: Pull uinfo->data[i] dereference out to the top of the loop.") which showed a large performance win for vc4, but also cleans up the code a decent bit.	2018-12-14 17:48:01 -08:00
Eric Anholt	5b2cc03852	v3d: Add support for draw indirect for GLES3.1. In trying to enable compute shaders, I found that a bunch of deqp-gles31's compute stuff wanted to interact with indirect dispatch. This was easy to do on its own.	2018-12-14 17:48:01 -08:00
Eric Anholt	332a5cf6a5	v3d: Add safety checks for resource_create(). This should ease my debugging next time I screw it up.	2018-12-14 17:48:01 -08:00
Eric Anholt	6ad9e8690d	v3d: Add support for texturing from linear. Just like vc4, we have to support linear shared BOs for X11 on arbitrary displays. When we're faced with a request to texture from one of those, make a shadow image that we copy using the TFU at the start of the draw call.	2018-12-14 17:48:01 -08:00
Eric Anholt	976ea90bdc	v3d: Add support for using the TFU to do some blits. This will be useful in particular for blits from raster to UIF for X11.	2018-12-14 17:48:01 -08:00
Eric Anholt	e5b4d1f55f	v3d: Don't forget to bump the number of writes when doing TFU ops. generatemipmap is just filling out the rest of the mipmap that's already been written (by a mapping or a draw call), so it didn't matter. As I reuse the TFU code for linear-to-UIF conversions, it'll start mattering.	2018-12-14 17:48:01 -08:00
Eric Anholt	485df2574e	v3d: Set up the right stride for raster TFU. I didn't have any raster images in the generatemipmap path, so the pixels-vs-bytes mixup didn't matter here.	2018-12-14 17:48:01 -08:00
Eric Anholt	e731d53716	v3d: Don't forget to wait for our TFU job before rendering from it. Otherwise we may race to read old contents. This didn't show up in the CTS and piglit for me, but it did once I started using the TFU to do linear->UIF blits for X11. Fixes: `2ebca177dc` ("v3d: Use the TFU to do generatemipmap.")	2018-12-14 17:48:01 -08:00
Eric Anholt	cc6a5e937b	shader-packing	2018-12-07 16:51:12 -08:00
Eric Anholt	09ad0d870c	tfu	2018-12-07 16:49:41 -08:00
Eric Anholt	3bd73d31a8	v3d: Fix a leak of the transfer helper on screen destroy. Fixes: `7a30517cce` ("broadcom/vc5: Start adding support for rendering to Z32F_S8X24_UINT.")	2018-12-07 16:48:23 -08:00
Eric Anholt	bad95bb13c	v3d: Add VIR dumping of TMU config p0/p1. I had a bit of it for V3D 3.x, but didn't update it for 4.x.	2018-12-07 16:48:23 -08:00
Eric Anholt	5932575299	v3d: Garbage collect unused uniforms code.	2018-12-07 16:48:23 -08:00
Eric Anholt	62a3192112	v3d: Split most of TEXTURE_SHADER_STATE setup out of sampler views. For shader image load/store, we want most of this logic to be shared.	2018-12-07 16:48:23 -08:00
Eric Anholt	8cb1f3bab7	v3d: Avoid confusing auto-indenting in TEXTURE_SHADER_STATE packing Having "v3dx_pack() {" under each #if branch would confuse emacs's indenter.	2018-12-07 16:48:23 -08:00
Eric Anholt	ee9b758053	v3d: Fix handling of texture first_layer offsets for 3D textures. I think this bug predated adding v3d_layer_offset(). Noticed during an unrelated refactor.	2018-12-07 16:48:23 -08:00
Eric Anholt	acecee4c2d	v3d: Return the right gl_SampleMaskIn[] value. It's supposed to be the dispatched sample mask for this pixel, not the GL state's sample mask.	2018-12-07 16:48:23 -08:00
Eric Anholt	503b55c622	v3d: Don't forget to flush writes to UBOs. If someone did TF into a UBO, we might have left the TF job un-flushed at the point of reading.	2018-12-07 16:48:23 -08:00
Eric Anholt	504d06e4c1	v3d: Make an array for frag/vert texture state in the context. This simplifies a bunch of our texture handling, while introducing the slots necessary for adding new shader stages.	2018-12-07 16:48:23 -08:00
Eric Anholt	e94d034a38	v3d: Put default vertex attribute values into the state uploader as well. The default attributes are long-lived (the state struct is cached), and only 256 bytes each.	2018-12-07 16:48:23 -08:00
Eric Anholt	b38e4d313f	v3d: Create a state uploader for packing our shaders together. Shaders are usually quite short, and are private to the context. We can save memory and reduce the work the kernel needs to do at exec time by packing them together in a stream uploader for long-lived state.	2018-12-07 16:48:23 -08:00
Eric Anholt	1911888760	v3d: Update simulator cache flushing code to match the kernel better. We were missing the invalidate between bin and render (possibly relevant for SSBOs), and still trying to flush the nonexistent L2C on 3.3+.	2018-12-07 16:48:23 -08:00
Eric Anholt	2ebca177dc	v3d: Use the TFU to do generatemipmap. This is a separate, dedicated hardware unit for texture layout conversions and mipmap generation.	2018-12-07 16:48:23 -08:00
Eric Anholt	ee0549ff9a	v3d: Add the V3D TFU submit interface to the simulator. The TFU lets us format raster and SAND images into formats that can be read by the texture engine, and do mipmap generation. The UAPI comes from drm-next e69aa5f9b97f ("Merge tag 'drm-misc-next-2018-12-06' of git://anongit.freedesktop.org/drm/drm-misc into drm-next")	2018-12-07 16:48:23 -08:00
Eric Anholt	42652ea51e	v3d: Use combined input/output segments. The HW apparently has some issues (or at least a much more complicated VCM calculation) with non-combined segments, and the closed source driver also uses combined I/O. Until I get the last CTS failure resolved (which does look plausibly like some VPM stomping), let's use combined I/O too.	2018-12-07 16:48:23 -08:00
Eric Anholt	fb9bcf5602	v3d: Add missing OES_half_float_linear support. We were exposing ARB_texture_float, but apparently not the OES subset flag. Fixes regression from GLES3 support to GLES2. Fixes: `fcf9fcee3c` ("mesa/main: do not require float-texture filtering for es3")	2018-12-07 16:48:23 -08:00
Eric Anholt	90e98295a4	v3d: Add support for RGBA_SRGB along with BGRA_SRGB. This is the actual native format for the hardware, without swizzling. Noticed while debugging why GLES3 disappeared.	2018-12-07 16:48:23 -08:00
Eric Anholt	e113b21cb7	v3d: Add renderonly support. I've been using this with the kmsro series to test v3d on VKMS without my old KMS hack in the v3d kernel driver. KMSRO still needs some cleanup, but v3d RO support seems reasonable.	2018-11-27 15:03:02 -08:00
Eric Anholt	03928dd682	v3d: Fix double-swapping of R/B on V3D 4.1 Fixes: `4018eb04e8` ("v3d: Use the TLB R/B swapping instead of recompiles when available.")	2018-11-15 11:12:54 -08:00
Eric Anholt	f32ba7abd7	v3d: Remove the special path for simulaton of the submit ioctl. Now that it doesn't need to find the struct v3d_bos, it can just take the normal v3d_ioctl() path.	2018-11-02 14:26:38 -07:00
Eric Anholt	df9f574c13	v3d: Maintain a mapping of the GEM buffer in the simulator. This way we don't need to reach back into the gallium driver code to get the mapping.	2018-11-02 14:26:38 -07:00
Eric Anholt	4018eb04e8	v3d: Use the TLB R/B swapping instead of recompiles when available. The recompile reduction is nice, but this also makes it so that a straight texture copy could get optimized some day to not unpack/repack the f16 values.	2018-11-01 13:56:30 -07:00
Eric Anholt	3923cf626d	v3d: Take advantage of _mesa_hash_table_remove_key() in the simulator.	2018-11-01 13:54:36 -07:00
Eric Anholt	47586ab569	v3d: Respect user-passed strides for BO imports. If the caller has passed in a stride for (linear) BO import, we should use that stride when rendering to the BO (or, if we some day support texturing from linear-imported BOs, when doing the linear-to-UIF shadow copy). This lets us remove the extra stride-changing relayout in the simulator.	2018-11-01 13:54:36 -07:00
Eric Anholt	5313fb8abd	v3d: Drop #if 0-ed out v3d_dump_to_file(). This came from vc4, where we had a file format for GPU hangs. I don't have one of those for V3D, and I probably won't ever have the simulator side produce dumps even if I do.	2018-11-01 13:54:36 -07:00
Eric Anholt	d3f66c385b	v3d: Fix a typo in a comment in job handling.	2018-11-01 13:54:36 -07:00
Eric Anholt	b93fc160f4	v3d: Fix a copy-and-paste comment in the simulator code.	2018-11-01 13:54:36 -07:00
Dylan Baker	2fd5dff7e7	util: Move os_misc to util this is needed by u_debug Tested-by: Brian Paul <brianp@vmware.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-10-30 14:32:52 -07:00
Eric Anholt	cc54e1acf9	v3d: Use nir_remove_unused_io_vars to handle binner shader output DCE We were doing this late after nir_lower_io, but we can just reuse the core code. By doing it at this stage, we won't even set up the VS attributes as inputs, reducing our VPM size.	2018-10-30 10:46:52 -07:00
Eric Anholt	17c8198952	v3d: Use nir_lower_io_to_scalar_early to DCE unused VS input components. This lets us trim unused trailing components in the vertex attributes, reducing the size of our VPM allocations.	2018-10-30 10:46:52 -07:00
Eric Engestrom	e27902a261	util: use C99 declaration in the for-loop set_foreach() macro Signed-off-by: Eric Engestrom <eric@engestrom.ch> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-10-25 12:43:18 +01:00
Eric Engestrom	bb84fa146f	util: use C99 declaration in the for-loop hash_table_foreach() macro Signed-off-by: Eric Engestrom <eric@engestrom.ch> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-10-25 12:43:18 +01:00
Eric Anholt	8ec83dc51e	v3d: Add support for hardware pack/unpack of half floats. Cuts the formerly 7-minute simulation time of fs-packHalf2x16.shader_test in half.	2018-10-15 17:16:44 -07:00
Eric Anholt	dda1ae9b3c	gallium/ttn: Convert inputs and outputs to derefs of variables. This means that TTN shaders more closely resemble GTN shaders: they have inputs and outputs as variable derefs, with the variables having their .driver_location already set up for you. This will be useful for v3d to do input variable DCE in NIR, which we can't do when the TTN shaders never have a pre-nir_lower_io stage. Acked-by: Rob Clark <robdclark@gmail.com>	2018-10-15 17:16:43 -07:00
Kenneth Graunke	38a23517fd	gallium/u_transfer_helper: Add support for separate Z24/S8 as well. u_transfer_helper already had code to handle treating packed Z32_S8 as separate Z32_FLOAT and S8_UINT resources, since some drivers can't handle that interleaved format natively. Other hardware needs depth and stencil as separate resources for all formats. For example, V3D3 needs this for 24-bit depth as well. This patch adds a new flag to lower all depth/stencils formats, and implements support for Z24_UNORM_S8_UINT. (S8_UINT_Z24_UNORM is left as an exercise to the reader, preferably someone who has access to a machine that uses that format.) Reviewed-by: Eric Anholt <eric@anholt.net>	2018-10-14 23:36:28 -07:00
Eric Anholt	4e1af6808c	v3d: Switch from FLUSH_ALL_STATE to FLUSH for ending our bin CLs. The HW for FLUSH_ALL_STATE isn't validated, since the closed driver only uses FLUSH. Now that we don't have any new state at the end of our bin CLs, follow their lead.	2018-09-17 16:35:45 -07:00
Eric Anholt	0b8007b523	v3d: Stop clearing the OQ state at the end of the job. Ever since we added OQ support, we've been clearing OQ state at the start of the job anyway. We're intentionally breaking old-and-new-driver-mix systems, because we need to stop using the unvalidated FLUSH_ALL_STATE.	2018-09-17 16:35:45 -07:00
Eric Anholt	350cb79045	v3d: Always emit a TF disable at the start of drawing on V3D 4.x. The HW's FLUSH_ALL_STATE is not validated, so we probably shouldn't use it, meaning that we need to reset state at the start. By doing this, we also make ourselves more resilient to another client leaving the TF state enabled at the end of their batch (as we now do, ourselves). However, we still need to emit a single TF disable at the end of the frame, for SWVC5-718.	2018-09-17 16:35:45 -07:00
Eric Anholt	a91b158bd9	v3d: Fix setup of the VCM cache size. There were two bugs working together to make things mostly work: I wasn't dividing the VPM output size available by the size of a batch (vertex), but I also had the size of the VPM reduced by a factor of 8. Fixes dEQP-GLES3.functional.vertex_array_objects.all_attributes and it seems also my intermittent varying failures. Fixes: `1561e4984e` ("v3d: Emit the VCM_CACHE_SIZE packet.")	2018-09-07 08:11:38 -07:00
Eric Anholt	f73f748323	v3d: Fix SRC_ALPHA_SATURATE blending for RTs without alpha. Fixes dEQP-GLES3.functional.fragment_ops.blend.default_framebuffer.rgb_func_alpha_func.dst.src_alpha_saturate_src_alpha_saturate and friends with --deqp-egl-config-name=rgb565d0s0 Cc: "18.2" <mesa-stable@lists.freedesktop.org>	2018-09-07 08:11:05 -07:00
Eric Anholt	492b74b445	v3d: Drop a bunch of duplicated gallium PIPE_CAP default code. Now that we have the util function for the default values, we can get rid of the boilerplate. v2: Rebase on new gallium caps	2018-09-04 08:08:18 -07:00
Eric Anholt	ad782a7020	gallium: Add a helper for implementing PIPE_CAP_* default values. One of the pains of implementing a gallium driver is filling in a million pipe caps you don't know about yet when you're just starting out. One of the pains of working on gallium is copy-and-pasting your new PIPE_CAP into each driver. We can fix both of these by having each driver call into the default helper from their default case, so that both sides can ignore each other until they need to. v2: fix i915g build, revert swr change to avoid breaking scons build (https://travis-ci.org/anholt/mesa/jobs/419739857) v3: Rebase on 3 new gallium caps. Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1) Cc: Bruce Cherniak <bruce.cherniak@intel.com> Cc: George Kyriazis <george.kyriazis@intel.com> Cc: Kenneth Graunke <kenneth@whitecape.org>	2018-09-04 08:07:52 -07:00
Kenneth Graunke	1281608849	gallium: Split out PIPE_CAP_TEXTURE_MIRROR_CLAMP_TO_EDGE. Some hardware can do PIPE_TEX_WRAP_MIRROR_REPEAT but not PIPE_TEX_WRAP_MIRROR_CLAMP and PIPE_TEX_WRAP_MIRROR_CLAMP_TO_BORDER. Drivers for such hardware would like to advertise support for ARB_texture_mirror_clamp_to_edge but not EXT_texture_mirror_clamp. This commit adds a new PIPE_CAP_TEXTURE_MIRROR_CLAMP_TO_EDGE bit, changes the extension enable to be based on that, and enables it in all upstream drivers which supported PIPE_CAP_TEXTURE_MIRROR_CLAMP (so they continue supporting this mode).	2018-08-24 17:25:36 -07:00
Marek Olšák	d3c1b212bc	gallium: add PIPE_CAP_MAX_SHADER_BUFFER_SIZE Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-08-23 16:56:17 -04:00
Marek Olšák	f6ccd594e7	gallium: add PIPE_CAP_MAX_GS_INVOCATIONS Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-08-23 16:56:17 -04:00
Eric Anholt	1561e4984e	v3d: Emit the VCM_CACHE_SIZE packet. This is needed to ensure that we don't get blocked waiting for VPM space with bin/render overlapping. Cc: "18.2" <mesa-stable@lists.freedesktop.org>	2018-08-06 13:03:23 -07:00
Eric Anholt	5d49076990	v3d: Drop "VC5" from the renderer string. VC5 isn't a useful name any more, just stick to v3d.	2018-08-06 13:03:23 -07:00
Marek Olšák	966f155623	gallium: add storage_sample_count parameter into is_format_supported Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-07-31 18:28:41 -04:00
Marek Olšák	0caf74bbcd	gallium: add PIPE_CAP_FRAMEBUFFER_MSAA_CONSTRAINTS Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-07-31 18:28:41 -04:00
Eric Anholt	2df6f1a3df	v3d: Include commands to run the BCL and RCL in CLIF dumps.	2018-07-30 14:29:01 -07:00
Eric Anholt	b56f8c475e	v3d: Rename "configuration" and "config" in the XML to "cfg" This matches what CLIF parsing expects, and makes TILE_BINNING_MODE_CONFIGURATION_COMMON_CONFIGURATION into a much more legible TILE_BINNING_MODE_CFG_COMMON.	2018-07-30 14:29:01 -07:00
Eric Anholt	300e609feb	v3d: s/colour/color in the XML. The CLIF format expects american english spelling, and the rest of Mesa is too. I was previously adhering to the spec's spelling, which is counterproductive.	2018-07-30 14:29:01 -07:00
Eric Anholt	3a8550ad06	v3d: Rename primitives to prims in the XML to match CLIF names. This makes us match up with the V3D HW team's names a bit more.	2018-07-30 14:29:01 -07:00
Eric Anholt	103f21b13d	v3d: Add a separate flag for CLIF ABI output versus human-readable CLs. A few of the upcoming changes would make the V3D_DEBUG=cl output less readable, so let's make proper CLIF file production be under a separate V3D_DEBUG=clif flag.	2018-07-30 14:29:01 -07:00
Eric Anholt	89ac6fa403	v3d: Add pack header support for f187 values. V3D only has one of these (the top 16 bits of a float32) left in its CLs, but VC4 had many more. This gets us proper pretty-printing of the values instead of a large uint.	2018-07-30 14:29:01 -07:00
Eric Anholt	e146e3a795	v3d: Move depth offset packet setup to CSO creation time. This should be some simpler memcpying at draw time, and makes the next change easier.	2018-07-30 14:29:01 -07:00
Eric Anholt	0a3f653180	v3d: Block bin on render when doing vertex texturing. The kernel by default serializes the BCL on previous BCLs submitted on this FD, but not RCLs. For now this fix is conservative and blocks on last RCL if any vertex texturing is done, which fails to get bin/render overlap if there was an intermediate job that doesn't draw to the BCL's buffer. I've dropped a perf_debug() in here to note that as a potential future improvement. Fixes intermittent failures in KHR-GLES3.copy_tex_image_conversions.required.*	2018-07-29 19:25:39 -07:00
Eric Anholt	422910d2e7	v3d: Move clif dumping to a separate step from noting where the CLs are. Now all the printing happens from the same worklist processing.	2018-07-27 17:08:35 -07:00
Eric Anholt	01b4952773	v3d: Move clif dump BO lookup into the clif dumper. The clif dumper is going to need information about all of our BOs if we're going to dump them for replay purposes.	2018-07-27 17:08:35 -07:00
Eric Anholt	22a1ba0403	v3d: Drop the use of the semaphores. The kernel's scheduler doesn't rely on our emitting them, and in fact we'd get in trouble if the kernel decided to schedule too many bins in a row before getting around to scheduling the corresponding render.	2018-07-27 12:56:36 -07:00
Eric Anholt	9bf9a6d6a1	v3d: Drop the VG support from the XML. This reflects a change on the HW/closed SW side to drop this unused HW. With it dropped on their side, the CLIF parser no longer expects to find VG fields.	2018-07-27 12:56:36 -07:00
Eric Anholt	1c8e4632a7	v3d: Stop using spaces in the names of our buffers. For CLIF dumping, we need names to not have spaces. Rather than rewriting them after the fact, just change the two cases where I had put a space in.	2018-07-27 12:56:36 -07:00
Eric Anholt	deecc1ef86	v3d: Avoid the GFXH-1461 workaround if we have only Z or only S. This seems like a sensible precaution to avoid extra draws. It doesn't deal with the case of a Z24S8 buffer created by the window system for an application that happens to never use S.	2018-07-26 11:02:25 -07:00
Eric Anholt	301c32caf4	v3d: Rework the ordering of how we clear things. First, figure out if we can just sneak the clear into the TLB clear, even if drawing has already happened (since we have job->load and job->clear to tell us), taking into account GFXH-1461. For any pieces we can't TLB clear, fall back to drawing a quad without flushing the scene. Fixes extra scene flushes in glmark2 due to GFXH-1461.	2018-07-26 11:02:25 -07:00
Eric Anholt	ceecddfe77	v3d: Only store buffers that have been written to. I've seen cases where a color buffer is bound, but only Z is written, and we end up storing color.	2018-07-26 11:02:25 -07:00
Eric Anholt	d29435e7cb	v3d: Track the buffers being loaded separately. We were computing this at RCL generation time, but that means you can't unflag the store for an invalidate_resource, or not flag the store if writmasking is disabled.	2018-07-26 11:02:20 -07:00
Eric Anholt	47f5d158ae	v3d: Rename cleared/resolve to clear/store. These describe what the fields mean in RCL generation. "resolve" is left over from VC4, and sounds like MSAA resolves (which may or may not be involved in the store we generate).	2018-07-26 11:00:34 -07:00
Eric Anholt	a221f9709e	v3d: Fix incorrect handling of two fences created back-to-back. Recreating our context's syncobj with ALREADY_SIGNALED meant that if you created two fences in a row, then waiting on the second would succeed immediately. Instead, export a sync file in the gallium fence (since we don't have a syncobj clone ioctl), and just create a new syncobj to wait on whenever we need to. Noticed while debugging dEQP-GLES3.functional.fence_sync.client_wait_sync_finish	2018-07-20 11:11:29 -07:00
Eric Anholt	fc28692a5a	v3d: Fix the timeout value passed to drmSyncobjWait(). The API wants an absolute time, so we need to go add gallium's argument to CLOCK_MONOTONIC.	2018-07-20 11:11:29 -07:00
Eric Anholt	4f04bd68cf	v3d: Fix drmSyncobjWait() return value checking even more. It tends to return >0 in the success case (I think the value is something like "how much of the timeout remained"). Fixes dEQP-GLES3.functional.fence_sync.client_wait_sync_finish	2018-07-20 11:11:29 -07:00
Eric Anholt	2f90879a34	v3d: Use the list_first_entry/list_last_entry macros.	2018-07-20 11:11:29 -07:00
Eric Anholt	d0e53373e5	v3d: Move BO cache counting to dump time instead of cache management. This is one less way to get the dump stats wrong.	2018-07-20 11:11:29 -07:00
Eric Anholt	7d6aef6fa5	v3d: Reduce the stale BO reclamation spam with dump_stats set. This was obviously meant to be when we were actually freeing a BO, not just when there was at least one BO in the list.	2018-07-20 11:11:29 -07:00
Eric Anholt	5d11094db1	v3d: Respect a sampler view's first_layer field. Fixes texturing from EGL images created from cubemap faces, as in dEQP-EGL.functional.image.create.gles2_cubemap_negative_x_rgba_texture	2018-07-20 11:11:29 -07:00
Eric Anholt	2c6279d58b	v3d: Fix tiling modifier support to use the new UIF define. You can't use T tiled buffers on V3D 3.x and newer, it's been replaced with a newer layout shared with other hardware blocks.	2018-07-18 10:37:49 -07:00
Eric Anholt	afcc714c98	v3d: Work around GFXH-1461 bug losing our Z/S clears. If you load S and clear Z or vice versa, the clear may get lost. Just fall back to drawing a quad. Fixes KHR-GLES3.packed_depth_stencil.verify_read_pixels.depth24_stencil8	2018-07-13 13:29:29 -07:00
Eric Anholt	e8dc3c0c36	u_blitter: Add an option to draw the triangles using an index buffer. For V3D, the HW will interpolate slightly differently along the shared edge of the trifan. The conformance tests manage to catch this in the nearest_consistency_* group. To get interpolation to match, we need the last vertex of the triangle to be shared. I first tried implementing draw_rectangle to do triangles instead, but that was quite a bit (147 lines) of code duplication from u_blitter, and this seems much simpler and less likely to break as u_blitter changes. Fixes dEQP-GLES3.functional.fbo.blit.rect.nearest_consistency_* on V3D. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-07-12 11:49:22 -07:00
Eric Anholt	7714896256	v3d: Don't automatically reallocate a PERSISTENT-mapped buffer. I had mistakenly used the COHERENT flag, which can only be set when PERSISTENT is mapped, but isn't always. Fixes piglit bufferstorage-persistent read	2018-07-12 11:31:08 -07:00
Eric Anholt	e48c615292	v3d: Fix stride of 1D_ARRAY mappings. All of our other texture arrays will be tiled, but 1D is an array of raster mappings and we had the wrong value plugged in here. Fixes piglit getteximage-targets 1D_ARRAY	2018-07-12 11:31:08 -07:00
Eric Anholt	97ddeed949	v3d: Fix MRT blending with independent blending disabled. We were only emitting the RT blend state for RT 0 and only enabling it for RT 0, when the gallium API for !independent_blend is for rt0's state to apply to all of them. Fixes piglit fbo-drawbuffers-blend-add.	2018-07-12 11:31:08 -07:00
Eric Anholt	beeb94402f	v3d: Implement noperspective varyings on V3D 4.x. Fixes a bunch of piglit interpolation tests, and reduces my concern about some MSAA blit shaders with noperspective varyings.	2018-07-09 11:48:32 -07:00
Eric Anholt	4b4795be9d	v3d: Refactor flat shade/centroid flag emission. The logic was duplicated in a pretty gross way, when what we really need is just a helper function for stuffing the values in the packet. This will make implementing noperspective easier.	2018-07-09 11:48:32 -07:00
Eric Anholt	9d0406c52f	v3d: Fix leak of the default attributes BOs. The GLES3 CTS makes a lot more progress on a run now.	2018-07-05 15:50:54 -07:00
Eric Anholt	6b11131373	v3d: Fix leak of the spill BO on context destruction.	2018-07-05 15:50:52 -07:00
Eric Anholt	03f6d26b62	v3d: Skip emitting per-RT blend state for RTs with blend disabled. Cleans up the CL of fbo-drawbuffers2-blend a bit. We could do better on more complicated cases by noticing if multiple RTs have the same blend state and emitting them in a single packet.	2018-07-05 12:39:36 -07:00
Eric Anholt	572f6ab489	v3d: Add proper support for GL_EXT_draw_buffers2's blending enables. I had flagged it as enabled on V3D 4.x, but not actually implemented the per-RT enables. Fixes piglit fbo_drawbuffers2-blend.	2018-07-05 12:39:36 -07:00
Eric Anholt	4819da2301	v3d: Claim PIPE_CAP_TGSI_CAN_READ_OUTPUTS. Fixes warning at screen creation. We store our outputs in normal temps and just emit them to shader I/O at the end, due to our I/O ordering requirements, so reading "outputs" in NIR is fine.	2018-07-02 11:35:41 -07:00
Eric Anholt	49f7631c9f	v3d: Emit a TF flush after each draw using TF. This fixes GPU hangs on 7278 in transform feedback tests such as GTF-GLES3.gtf.GL3Tests.transform_feedback2.transform_feedback2_basic	2018-07-02 10:05:14 -07:00
Eric Anholt	24d2f1347d	v3d: Add missing "number of bin tile lists" field. Noticed when trying to feed our dumps through the CLIF parser. Since this is a "minus one" field, we were already filling in the value we wanted (0).	2018-06-29 13:36:28 -07:00
Eric Anholt	b65b61cefe	v3d: Rewrite the color write masks to match CLIF format. The render_target_* fields gave us pretty(ish) printing, but meant we were incompatible with CLIF, and had much more verbose code generating them.	2018-06-29 13:36:28 -07:00
Marek Olšák	ea8b55b49f	gallium/util: remove dummy function util_format_is_supported Reviewed-by: Eric Engestrom <eric@engestrom.ch>	2018-06-29 15:31:49 -04:00
Eric Anholt	ad1a4cb563	v3d: Fix Z clipping when viewport.scale[2] is negative. Fixes: dEQP-GLES3.functional.shaders.builtin_variable.depth_range_fragment dEQP-GLES3.functional.shaders.builtin_variable.depth_range_vertex	2018-06-27 09:35:51 -07:00
Eric Anholt	9f80bcc2bc	v3d: Convert a bunch of our "minus one" fields over to the new XML attr. This fixes up their formatting for CLIF files and makes the code more legible.	2018-06-27 09:13:48 -07:00
Rob Clark	3d19f116ad	st,ir3,radeonsi: push lower_deref_instrs back into driver vc4+vc5 is not really effected by the deref chain to deref instr conversion, so it no longer needs this pass. For others, now that all the passes mesa/st uses are using deref instructions, push the lowering to deref chains back into driver. Signed-off-by: Rob Clark <robdclark@gmail.com> Acked-by: Rob Clark <robdclark@gmail.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Acked-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-06-22 20:54:00 -07:00
Jason Ekstrand	74212c2414	anv,i965,radv,st,ir3: Call nir_lower_deref_instrs This inserts a call to nir_lower_deref_instrs at every call site of glsl_to_nir, spirv_to_nir, and prog_to_nir. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Acked-by: Rob Clark <robdclark@gmail.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Acked-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-06-22 20:15:54 -07:00
Eric Anholt	69ae42ca4c	v3d: Don't forget to initialize the buffer offset of a new winsys handle.	2018-06-21 15:56:18 -07:00
Eric Anholt	edb7890750	v3d: Fix min vs mag determination when not doing mip filtering. Fixes all 128 failing tests in dEQP-GLES3.functional.texture.filtering.*.combinations	2018-06-20 12:31:54 -07:00
Eric Anholt	94f7c011d6	v3d: Track write reference to the separate stencil buffer. Otherwise, a blit from separate stencil may fail to flush the job that initialized it, or new drawing could fail to flush a blit reading from stencil. Fixes: dEQP-GLES3.functional.fbo.blit.depth_stencil.depth32f_stencil8_basic dEQP-GLES3.functional.fbo.blit.depth_stencil.depth32f_stencil8_scale dEQP-GLES3.functional.fbo.blit.depth_stencil.depth32f_stencil8_stencil_only dEQP-GLES3.functional.fbo.msaa.2_samples.depth32f_stencil8 dEQP-GLES3.functional.fbo.msaa.4_samples.depth32f_stencil8	2018-06-20 09:30:46 -07:00
Eric Anholt	a52c357a65	v3d: Add missing reference to the separate stencil buffer. Noticed while debugging a missing flush of rendering in the z32f_s8 case.	2018-06-20 09:30:46 -07:00
Eric Anholt	1334295f29	v3d: Fix return value from fence_finish. We needed to convert from a -errno to a boolean success value. Fixes: GTF-GLES3.gtf.GL3Tests.sync.sync_functionality_clientwaitsync_flush GTF-GLES3.gtf.GL3Tests.sync.sync_functionality_clientwaitsync_signaled	2018-06-20 09:30:46 -07:00
Christian Gmeiner	f485e5671c	gallium: add scalar isa shader cap v1 -> v2: - nv30 is _NOT_ scalar as suggested by Ilia Mirkin. - Change from a screen cap to a shader cap as suggested by Eric Anholt. - radeonsi is scalar as suggested by Marek Olšák. - Change missing ones to be scalar. v2 -> v3: - r600 prefers vec4 as suggested by Marek Olšák. Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-06-20 17:55:39 +02:00
Eric Anholt	da0115b1c3	v3d: Fix blitting from a linear winsys BO. This is the case for the simulator environment, and broke many blitter tests by trying to texture from linear while the HW can only actually do UIF/UBLINEAR/LT. Just make a temporary and copy into it with the CPU, then blit from that. This is the kind of path that should use the TFU, but I haven't exposed that hardware yet. Fixes dEQP-GLES3.functional.fbo.blit.default_framebuffer.*	2018-06-19 09:42:20 -07:00
Eric Anholt	e636199c1c	v3d: Set the SO offsets correctly if we have to re-emit. This should fix TF across a glFlush() or TF pause/restart. Fixes dEQP-GLES3.functional.transform_feedback.array.interleaved.lines.highp_float and many, many others.	2018-06-18 14:54:16 -07:00
Eric Anholt	4106f6ce54	v3d: Handle a no-intersection scissor even if it's outside of the VP. The min/maxes ended up producing a negative clip width/height for dEQP-GLES3.functional.fragment_ops.scissor.outside_render_line. Just make sure they stay at 0 (or v3d 3.x's workaround) if that happens.	2018-06-15 16:09:39 -07:00
Eric Anholt	9aa670e52a	v3d: Use the proper depth texture type for sampling. Fixes failing tests in dEQP-GLES3.functional.texture.shadow	2018-06-15 16:09:39 -07:00
Eric Anholt	e130ada243	v3d: Fix shaders using pixel center W but no varyings. The docs called this field "uses both center W and centroid W", but actually it's "do you need center W even if varyings don't obviously call for it?" Fixes dEQP-GLES3.functional.shaders.builtin_variable.fragcoord_w	2018-06-15 16:09:39 -07:00
Rhys Perry	51a221e378	gallium: add support for programmable sample locations Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Brian Paul <brianp@vmware.com> (v2) Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v2)	2018-06-14 20:09:45 -06:00
Eric Anholt	cd2e673abc	v3d: Fix polygon offset for Z16 buffers. Fixes: dEQP-GLES3.functional.polygon_offset.fixed16_displacement_with_units dEQP-GLES3.functional.polygon_offset.fixed16_render_with_units	2018-06-14 17:03:16 -07:00
Eric Anholt	6784aa9870	v3d: Don't set the first_ez_state to DISABLED if after only UNDECIDED draws. We need to have the RCL start with EZ enabled, since those undecided draws had EZ enabled. But we do need to update from UNDECIDED to LT or GT as necessary still. Fixes many simulator assertion fails in deqp fragment_ops/interaction/basic_shader/*	2018-06-14 16:52:25 -07:00
Eric Anholt	9080642449	v3d: Use the right size for v3d 4.x TEXTURE_SHADER_STATE BO. This doesn't really matter, since they both get rounded up to 4096.	2018-06-14 16:52:25 -07:00
Eric Anholt	31548187cf	v3d: Add static asserts for other packed packet sizes.	2018-06-14 16:52:25 -07:00
Eric Anholt	0eef4d7f8f	v3d: Fix the size of the packed attribute state. Fixes segfaults in dEQP-GLES3.functional.vertex_array_objects.all_attributes.	2018-06-14 16:52:25 -07:00
Eric Anholt	7d8fe50af3	v3d: Remove some unused context fields from vc4.	2018-06-14 16:52:25 -07:00
Eric Anholt	48011c42aa	v3d: Remove unused QUNIFORM_STENCIL left over from vc4.	2018-06-14 16:52:25 -07:00
Eric Anholt	4564537222	v3d: Use our #define for max attributes in shader caps.	2018-06-14 16:52:25 -07:00
Eric Anholt	f69473a712	v3d: Work around GFXH-1461/GFXH-1689 by using CLEAR_TILE_BUFFERS. This doesn't seem to have done anything to my test results. However, given that we've still got a class of GPU hangs, following the workarounds that the closed driver does so that we get the same command sequences seems like a good idea.	2018-06-06 13:46:55 -07:00
Eric Anholt	2b1b2cbf61	v3d: Be more explicit about include directory from our generated code. You'd need src/broadcom/cle/ in the -I previously, for srcdir != builddir. nir was fine at that, but automake didn't have it. Bugzilla: https://github.com/anholt/mesa/issues/104	2018-06-05 12:44:49 -07:00
Vinson Lee	d511bba2f9	v3d: Fix automake linking error. CXXLD gallium_dri.la ../../../../src/broadcom/.libs/libbroadcom.a(clif_dump.o): In function `clif_dump_packet': src/broadcom/clif/clif_dump.c:87: undefined reference to `v3d33_clif_dump_packet' src/broadcom/clif/clif_dump.c:85: undefined reference to `v3d41_clif_dump_packet' ../../../../src/broadcom/.libs/libbroadcom.a(clif_dump.o): In function `clif_process_worklist': src/broadcom/clif/clif_dump.c:140: undefined reference to `v3d41_clif_dump_gl_shader_state_record' src/broadcom/clif/clif_dump.c:144: undefined reference to `v3d33_clif_dump_gl_shader_state_record' Signed-off-by: Vinson Lee <vlee@freedesktop.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2018-05-30 11:55:09 -07:00
Marek Olšák	34ea55d820	gallium: add PIPE_CAP_GLSL_FEATURE_LEVEL_COMPATIBILITY Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-05-29 20:13:24 -04:00
Dave Airlie	b7ac0779e0	gallium/winsys: rename DRM_API_HANDLE_* to WINSYS_HANDLE_* This just renames this as we want to add an shm handle which isn't really drm related. Originally by: Marc-André Lureau <marcandre.lureau@gmail.com> (airlied: I used this sed script instead) This was generated with: git grep -l 'DRM_API_' \| xargs sed -i 's/DRM_API_/WINSYS_/g' Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-05-30 09:11:53 +10:00
Vinson Lee	85f61197df	v3d: Include v3d_drm.h path. Fix build error. CC v3d_blit.lo In file included from v3d_blit.c:27:0: v3d_context.h:39:10: fatal error: v3d_drm.h: No such file or directory #include "v3d_drm.h" ^~~~~~~~~~~ Fixes: `8a793d42f1` ("v3d: Switch the vc5 driver to using the finalized V3D UABI.") Signed-off-by: Vinson Lee <vlee@freedesktop.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2018-05-21 11:15:47 -07:00
Eric Anholt	97894b1267	v3d: Add support for glSampleMask / glSampleCoverage.	2018-05-17 15:09:46 +01:00
Eric Anholt	9bbc3f8cf1	v3d: Enable NaN propagation in the VS and CS as well. Fixes piglit vs-isnan-*.shader_test at the expense of gl-1.0-spot-light.	2018-05-17 15:09:12 +01:00
Eric Anholt	b2e7c32703	v3d: Fix wiring filters to NEAREST for 32-bit texture returns. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104626	2018-05-16 21:19:07 +01:00
Eric Anholt	795488d2bf	v3d: Enable the driver by default. Now that we have a stabilized ABI and a fairly conformant driver, turn it on.	2018-05-16 21:19:07 +01:00
Eric Anholt	01ae6a9181	v3d: Rename driver functions from vc5 to v3d. This is the final step of the driver rename.	2018-05-16 21:19:07 +01:00
Eric Anholt	8c47ebbd23	v3d: Rename the driver files from "vc5" to "v3d".	2018-05-16 21:19:07 +01:00

... 4 5 6 7 8 ...

528 Commits