KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Kristian H. Kristensen	be9c2ab23b	intel/genxml: Move enums above structs We'll need to define them before we can reference them in structs and instructions. Enums have no dependencies, so move them first in the file. Signed-off-by: Kristian H. Kristensen <hoegsberg@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-11-29 22:02:49 -08:00
Kristian H. Kristensen	ce26486115	genxml: Add values for Barycentric Interpolation Mode Signed-off-by: Kristian H. Kristensen <hoegsberg@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-11-29 22:02:49 -08:00
Ilia Mirkin	ed0b3cbd09	anv: remove per-sample shading from TODO This was done some time ago. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2016-11-30 00:17:56 -05:00
Ilia Mirkin	be92b3f49d	anv: clean up VkPhysicalDeviceFeatures list Remove duplicate .alphaToOne, add missing .shaderResourceMinLod, and reorder a few entries to match their vulkan.h order. All the sparse features are still left out entirely. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2016-11-30 00:17:56 -05:00
Michel Dänzer	550cd272b4	vulkan/wsi/x11: Destroy Present event context when destroying swapchain Without this, the X server may accumulate stale Present event contexts if a client creates and destroys multiple swapchains using the same window. v2: Based on Chris Wilson's review: * Use xcb_present_select_input_checked so that protocol errors generated by old X servers can be handled gracefully * Use xcb_discard_reply() instead of free(xcb_request_check()) v3: Rebased on top of this code having been refactored out of anv Reviewed-by: Dave Airlie <airlied@redhat.com>	2016-11-30 12:31:25 +09:00
Timothy Arceri	2ea021a1eb	glsl: use linked_shaders bitmask to iterate stages for subroutine fields This should be faster than looping over every stage and null checking, but will also make the code a bit cleaner when we switch to getting more fields from gl_program rather than from gl_linked_shader as we can just copy the pointer and not need to worry about null checking then copying. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2016-11-30 14:13:52 +11:00
Timothy Arceri	6d3458cbfb	mesa: optimise interleaved sso validation Now that we have a linked_stages bitfield we can use this to check if the program is used at a later stage. This change is also required to be able to use gl_program rather than gl_shader_program in the CurrentProgram array. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2016-11-30 14:13:52 +11:00
Timothy Arceri	34953f8907	mesa/glsl: add bitmask to track stages a program was linked against This will be used to enable us to store the current gl_program rather than gl_shader_program in the gl_pipeline_object allowing us to simplify handling of validation. Also we should not be depending on _LinkedShader for this information as it may contain shaders from a failed linking attempt rather than the current program still in use. We could also use this mask to iterate over the stages during linking with _mesa_bit_scan() rather than the current method of NULL checking each stage. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2016-11-30 14:13:52 +11:00
Ilia Mirkin	ddf0f097e7	swr: [rasterizer jit] use signed integer representation for logic op Instead of (incorrectly) biasing the snorm value to make it look like a unorm, just use signed integer math. This fixes arb_color_buffer_float-render GL_RGBA8_SNORM Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-11-29 20:55:00 -05:00
Ilia Mirkin	8ed703cfa6	swr: add missing rgbx8_srgb variant Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-11-29 20:54:57 -05:00
Ilia Mirkin	d6a06228a6	swr: reorder renderable formats, add grouping comments Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-11-29 20:54:54 -05:00
Ilia Mirkin	53ca06be8f	swr: use util_copy_framebuffer_state helper Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-11-29 20:54:50 -05:00
Ilia Mirkin	86f7932b1e	swr: enable cubemap arrays Everything is in place for these. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-11-29 20:54:46 -05:00
Ilia Mirkin	8dd9853516	swr: rearrange caps into limits/supported/unsupported groups I find this a lot more readable and compact - much easier to scan through the list and see what's on and what's off. No functional change intended. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-11-29 20:54:43 -05:00
Ilia Mirkin	9f568e5db1	swr: only store up to the LOD size Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-11-29 20:54:36 -05:00
Tim Rowley	f7ab0e4b7e	swr: [rasterizer common] add SwrTrace() and macros Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2016-11-29 19:36:46 -06:00
Marek Olšák	662b9c24d0	radeonsi: don't fetch 8 dwords for samplerBuffer and imageBuffer The compiler doesn't shrink s_load_dwordx8, so we always wasted 4 SGPRs. Also, the extraction of the descriptor created some really ugly asm code with lots of VALU bitwise ops and v_readfirstlane. Totals from affected shaders: SGPRS: 13880 -> 13253 (-4.52 %) VGPRS: 15200 -> 15088 (-0.74 %) Code Size: 499864 -> 459816 (-8.01 %) bytes Max Waves: 1554 -> 1564 (0.64 %) Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-29 23:52:31 +01:00
Marek Olšák	dbbdc6bb5a	radeonsi: disable XNACK to free 2 SGPRs on APUs My LLVM commit disables it for dGPUs, but not APUs. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-29 23:52:31 +01:00
Marek Olšák	274fb601c2	radeonsi: count and report temp arrays in scratch separately v2: only do this if debug output of shader dumping is enabled Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)	2016-11-29 23:52:31 +01:00
Marek Olšák	a91add9369	radeonsi: don't try to eliminate trivial VS outputs for PS and CS PS and CS don't have any param exports, so it's a no-op. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-29 23:52:31 +01:00
Marek Olšák	5e5573b1bf	radeonsi: disable RB+ blend optimizations for dual source blending This fixes dual source blending on Stoney. The fix was copied from Vulkan. The problem was discovered during internal testing. Cc: 13.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-29 23:52:31 +01:00
Marek Olšák	ff50c44a5f	radeonsi: set CB_BLEND1_CONTROL.ENABLE for dual source blending copied from Vulkan Cc: 13.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-29 23:52:31 +01:00
Marek Olšák	87b208a54e	radeonsi: always set all blend registers better safe than sorry Cc: 13.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-29 23:52:31 +01:00
Marek Olšák	fc9f7fc9d0	radeonsi: set the smallest possible CB_TARGET_MASK better safe than sorry; set_framebuffer_state always makes this dirty Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-29 23:52:31 +01:00
Marek Olšák	ea43d0b5e8	radeonsi: don't print bodies of header-only packets Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-29 23:52:31 +01:00
Marek Olšák	7abd94c9b0	radeonsi: print unknown registers with correct formatting Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-29 23:52:31 +01:00
Marek Olšák	9e1dc10432	ddebug: fix hang detection with deferred flushes Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-29 23:52:31 +01:00
Dave Airlie	048143b9d9	radv: set spi_baryc_cntl.pos_float_location to 0 This fixes: dEQP-VK.pipeline.multisample_interpolation.offset_interpolate_at_sample_position.* This should probably be 2 when sample shading is enabled, but I'm not sure. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2016-11-29 22:48:23 +00:00
Dave Airlie	f3a3fea973	radv: force persample shading when required. We need to force persample shading when a) shader uses sample_id b) shader uses sample_position c) shader uses sample qualifier. Also since ps_iter_samples can now change independently of the rasterizer samples we need to move setting the regs more often. This fixes: dEQP-VK.pipeline.multisample_interpolation.centroid_interpolate_at_consistency.* dEQP-VK.pipeline.multisample_interpolation.centroid_qualifier_inside_primitive.137_191_1.* dEQP-VK.pipeline.multisample_interpolation.sample_interpolate_at_distinct_values.* dEQP-VK.pipeline.multisample_interpolation.sample_qualifier_distinct_values.128_128_1.* Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2016-11-29 22:48:03 +00:00
Dave Airlie	6a62026dd4	nir: print var binding in dumps. This is only useful for spir-v shaders, but I keep finding myself having to add it. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Dave Airlie <airlied@redhat.com>	2016-11-29 22:07:13 +00:00
Eric Engestrom	fae5e1dc74	docs: fix small typo Fixes: `ba28f2136f` ("docs: add note about r-b/other tags when resending") Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2016-11-29 22:02:57 +00:00
Matt Turner	218fec66cc	i965/sched: Schedule trivial blocks. In commit `45cd76e342` schedule_instructions(bblock_t *) began setting bblock_t::cycle_count, but that function was not called on trivial blocks. Remove the code to skip trivial blocks so that cycle_count is set. Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2016-11-29 11:53:36 -08:00
Matt Turner	cab0952d4b	i965/sched: Make 'time' a local variable. Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2016-11-29 11:53:36 -08:00
Matt Turner	b0156702fa	i965/cfg: Initialize bblock_t::cycle_count. schedule_instructions(bblock_t *) isn't called on blocks with a single instruction, and since it is the only thing that set cycle_count, cycle_count would be uninitialized. A non-empty block with bblock_t::cycle_count == 0 is arguably a bug. That'll be fixed in the next commit. Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2016-11-29 11:53:36 -08:00
Matt Turner	ca9e30e002	i965/cfg: Initialize cfg_t::cycle_count. This reverts commit `b4001af174`. Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2016-11-29 11:53:36 -08:00
Bas Nieuwenhuizen	b8c9ce4459	ac/nir: Fix accessing an uninitialized value. Signed-off-by: Bas Nieuwenhuizen <basni@google.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2016-11-29 20:13:28 +01:00
Bas Nieuwenhuizen	029e8ff81c	radv: Initialize the shader_stats_dump flag. Meta was using it before it was set. I suspect we typically don't want to dump meta shaders, so just set it to false in the beginning. Signed-off-by: Bas Nieuwenhuizen <basni@google.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2016-11-29 20:13:28 +01:00
Eric Anholt	d40a3212ae	vc4: Add a note for the future about texture latency calculation. Debugging a shader-db reported cycle count regression from the tex coalescing, I eventually figured out that the texture latencies were totally bogus. Really fixing it will probably involve mirroring vc4_qir_schedule.c's texture fifo management here.	2016-11-29 09:01:23 -08:00
Eric Anholt	4690a93b12	vc4: Add support for coalescing ALU ops into tex_[srtb] MOVs. This isn't as complete as I would like (can't merge interpolation because of the implicit r5 dependency, doesn't work with control flow), but this was cheap and easy. Improves 3DMMES Taiji performance by 1.15353% +/- 0.299896% (n=29, 16) total instructions in shared programs: 99810 -> 99059 (-0.75%) instructions in affected programs: 10705 -> 9954 (-7.02%)	2016-11-29 08:52:50 -08:00
Eric Anholt	f4baf80993	vc4: Restructure VPM write optimization into two passes. For texturing, there won't be a fixed limit on how many writes there are, so we need to compute uses up front.	2016-11-29 08:38:59 -08:00
Eric Anholt	a025983dd9	vc4: Make qir_for_each_inst_inorder() safe against removal. The dead code elimination wants it to be safe, and I actually got segfaults due to it being unsafe with the new coalescing pass.	2016-11-29 08:38:59 -08:00
Eric Anholt	27544ea8d3	vc4: Split optimizing VPM writes from VPM reads. The VPM write logic will be basically the same as the texture coordinate write logic we need, and it's not really related to the VPM read logic other than the reuse of the use_count array.	2016-11-29 08:38:59 -08:00
Eric Anholt	d4c20e82ae	vc4: Restructure texture insts as ALU ops with tex_[strb] as the dst. For now we're still just generating MOVs, but this will let us fold into other ops in the future. No difference on shader-db.	2016-11-29 08:38:59 -08:00
Eric Anholt	314f0c57e4	vc4: Refactor qir_get_op_nsrc(enum qop) to qir_get_nsrc(struct qinst *). Every caller was dereffing the qinst, and this will let us make the number of sources vary depending on the destination of the qinst so that we can have general ALU ops that store to tex_[strb] and get an implicit uniform.	2016-11-29 08:38:59 -08:00
Eric Anholt	51087327f2	vc4: Replace the qinst src[] with a fixed-size array. This may have made a tiny bit of sense when we had one 4-arg inst per shader, but if we only ever put 2 things in, having a pointer to 2 things almost every instruction is pointless indirection.	2016-11-29 08:38:59 -08:00
Eric Anholt	a220f1b5a9	vc4: Remove qir_inst4(). This was used originally for unorm4x8 packs, but we now represent those as a series of packed movs.	2016-11-29 08:38:59 -08:00
Ilia Mirkin	7a8def8c18	anv: bump the texture gather offset limits This matches what NVIDIA and AMD hardware expose, as well as what Intel hardware supports. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-11-29 07:44:01 -08:00
Ilia Mirkin	62b8dbf35e	i965/gen7: expose larger gather offsets This matches the capabilities of the hardware. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-11-29 07:44:01 -08:00
Ilia Mirkin	4f2d1d6ea7	i965: support constant gather offsets larger than 4 bits Offsets that don't fit into 4 bits need to force gather_po to be selected. Adjust the logic so that this happens. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-11-29 07:44:01 -08:00
Jason Ekstrand	faf20df143	i965/fs: Refactor handling of constant tg4 offsets Previously, we had an OFFSET_VALUE source for logical texture instructions that was intended to mean exactly what it says, "offset". In reality, we only fully used it for tg4 offsets. We used offset_value.file == IMM to mean, "you have a constant offset, go look in instr->offset" and didn't actually use the contents of the register at all in that case except for in nir_emit_texture where we used it as a temporary before we copy it into instr->offset. This commit renames OFFSET_VALUE to TG4_OFFSET and restricts its usage to indirect tg4 offsets only. The nir_emit_texture code is refactored so that we explicitly build a header_bits value which is placed in instr->offset and the constant offset values (both for tg4 and regular texture operations) are used to construct header_bits and don't go through the offset source at all. Finally, we stop passing offset_value in to lower_sampler_logical_send_gen5 because we can't do indirect offsets until gen7 anyway. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-11-29 07:44:01 -08:00
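
Note on commits 34953f8907, 6d3458cbfb and 2ea021a1eb: the linked-stages bitmask they describe replaces a "loop over every stage and NULL check" pattern with a walk over set bits. The following is a minimal sketch of that idea only, not the code from those commits. The enum, `scan_lowest_bit()` and `collect_linked_stages()` are hypothetical names invented for illustration; the bit-scan helper is a portable stand-in and makes no claim about the signature of `_mesa_bit_scan()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stage indices, for illustration only. */
enum stage {
    STAGE_VERTEX,
    STAGE_TESS_CTRL,
    STAGE_TESS_EVAL,
    STAGE_GEOMETRY,
    STAGE_FRAGMENT,
    STAGE_COMPUTE,
    STAGE_COUNT
};

/* Return the index of the lowest set bit in *mask and clear that bit.
 * The caller must pass a non-zero mask. */
static int scan_lowest_bit(uint32_t *mask)
{
    assert(mask != NULL && *mask != 0);

    int i = 0;
    while (((*mask >> i) & 1u) == 0)
        i++;

    *mask &= *mask - 1; /* clears the lowest set bit */
    return i;
}

/* Write the indices of all stages present in linked_stages to out[],
 * in ascending order, and return how many were written. Only set bits
 * are visited, so no per-stage NULL check is needed. */
static size_t collect_linked_stages(uint32_t linked_stages,
                                    int out[STAGE_COUNT])
{
    size_t n = 0;
    while (linked_stages != 0 && n < STAGE_COUNT)
        out[n++] = scan_lowest_bit(&linked_stages);
    return n;
}
```

A caller would build the mask while linking, for example `(1u << STAGE_VERTEX) | (1u << STAGE_FRAGMENT)`, and then visit only the stages that were actually linked.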


87033 Commits