KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Marek Olšák	a0740d59aa	radeonsi: don't invoke DCC decompression in update_all_texture_descriptors This fixes a bug uncovered by the 17-part patch series, specifically: "gallium/radeon: merge dirty_fb_counter and dirty_tex_descriptor_counter" If dirty_tex_counter has been updated and set_shader_image invokes DCC decompression, the DCC decompression itself checks the counter and updates descriptors, which in turn invokes the same DCC decompression. The blitter can't handle the recursion and the driver eventually crashes. Cc: 17.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-30 17:45:29 +01:00
Marek Olšák	f8dd2f5bac	radeonsi: fold info->indirect conditionals into the last one in draw_vbo Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-30 17:29:36 +01:00
Marek Olšák	408f9a1584	radeonsi: atomize the scratch buffer state The update frequency is very low. Difference: Only account for the size when allocating a new one and when starting a new IB, and check for NULL. (v3) Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-30 17:29:36 +01:00
Bartosz Tomczyk	a41f2527ae	r600: Fix stack overflow Commit `7b5878ee04` increased number of outputs to 64, but left output array intact. This caused stack overflow when number of outputs is bigger than 32. Found by ASAN. Cc: "12.0 13.0 17.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-30 15:30:03 +01:00
Samuel Pitoiset	e2c15ea092	gallium/radeon: add new HUD queries for monitoring the CP There are even more counters in the CP_STAT register but I think these ones are enough for now. v2: only read (and expose) CP_STAT on VI+ Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-01-30 14:37:00 +01:00
Samuel Pitoiset	0e04a078c5	gallium/radeon: add new GPU-sdma-busy HUD query For simplicity, GPU-sdma-busy will return 0 on previous gens. v2: only read SRBM_STATUS2 on Evergreen+ Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-01-30 14:37:00 +01:00
Samuel Pitoiset	b0f7ddef4f	gallium/radeon: rename grbm to mmio in the gpu load path We also want to monitor other MMIO counters like SRBM_STATUS2 in order to know if SDMA is busy. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-01-30 14:37:00 +01:00
Marek Olšák	2fc5fe0e85	winsys/amdgpu: add a fast exit path into amdgpu_cs_add_buffer The time spent in the function dropped by 37% for torcs. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-30 13:57:09 +01:00
Samuel Pitoiset	86eb52adad	winsys/amdgpu: do not iterate twice when adding fence dependencies The perf difference is very small, 3.25->2.84% in amdgpu_cs_flush() in the DXMD benchmark. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-01-30 13:44:25 +01:00
Samuel Pitoiset	5a6b1aadea	winsys/amdgpu: add one likely() call in amdgpu_cs_flush() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-01-30 13:44:19 +01:00
Samuel Pitoiset	db2b0210b1	hud: fix compilation warnings in hud_nic_graph_install() v2: use PRId64 instead of PRIx64 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-01-30 13:43:30 +01:00
Samuel Pitoiset	0b646ad05e	st/mesa: make st_texture_get_sampler_view() static Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-30 13:42:50 +01:00
Marek Olšák	62732ce263	gallium/radeon: remove r600_common_context::max_db this cleanup is based on the vulkan driver, which seems to do the same thing Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-30 13:27:14 +01:00
Marek Olšák	9327780da6	winsys/amdgpu: fix ADDR_REGISTER_VALUE::backendDisables This would be a fix if the value was used anywhere. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-30 13:27:14 +01:00
Marek Olšák	80157a2c20	gallium/radeon: clean up r600_query_init_backend_mask This just needs to be done for r600g in the screen. We don't need an IB submission for every new context created for GCN. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-30 13:27:14 +01:00
Marek Olšák	5f99c49008	radeonsi: precompute IA_MULTI_VGT_PARAM values into a table The perf difference is very small: 0.99% -> 0.40% for the time spent in si_get_ia_multi_vgt_param when si_draw_vbo is 20%. Pretty much nothing. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-30 13:27:14 +01:00
Marek Olšák	c78177fc64	radeonsi: move VGT_VERTEX_REUSE_BLOCK_CNTL into shader states for Polaris Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-30 13:27:14 +01:00
Marek Olšák	ccecf79c2b	radeonsi: state atom IDs don't have to be off by one Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-30 13:27:14 +01:00
Marek Olšák	ac059f1c23	radeonsi: use a bitmask for looping over dirty PM4 states also move it to draw_vbo, because it should be 0 in most cases Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-30 13:27:14 +01:00
Marek Olšák	802fcdc0d2	radeonsi: atomize L2 prefetches to move the big conditional statement out of draw_vbo Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-30 13:27:14 +01:00
Marek Olšák	c99ba3eb47	radeonsi: unbind disabled shader stages to prevent useless L2 prefetches Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-30 13:27:14 +01:00
Marek Olšák	4a4ff66dbe	radeonsi: also prefetch compute shaders Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-30 13:27:14 +01:00
Marek Olšák	879c73fac8	radeonsi: update dirty_level_mask only after the first draw after FB change Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-30 13:27:14 +01:00
Marek Olšák	cecc068774	gallium/radeon: allow VRAM-only placements again on APUs & recent amdgpu Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-30 13:27:14 +01:00
Marek Olšák	0d0f357de6	radeonsi: don't set +fp64-denormals it's the default and the name will change to +fp64-fp16-denormals. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-30 13:27:14 +01:00
Marek Olšák	b177162489	radeonsi: remove si_shader_context::param_tess_offchip we don't use on-chip tess. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-30 13:27:14 +01:00
Lucas Stach	e158b74971	etnaviv: force vertex buffers through the MMU This fixes a vertex data corruption issue if some of the vertex streams go through the MMU and some don't. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Tested-by: Philipp Zabel <p.zabel@pengutronix.de> Acked-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2017-01-30 12:40:57 +01:00
Andres Rodriguez	33f418bd67	radv: Expose VK_KHR_maintenance1 Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2017-01-30 08:44:11 +01:00
Andres Rodriguez	7b890a36df	radv: Fix vkCmdCopyImage for 2d slices into 3d Images Previously the z offset of the destination image was being ignored. It should be taken into account when copying into a 3d target. Also, img_extent_el.depth was being incorrectly clamped to 1 due to the source image being VK_IMAGE_TYPE_2D. This would result in the blit failing to iterate over all the 3d slices. Instead we clamp to the destination image type. Fixes failures in CTS tests: dEQP-VK.api.copy_and_blit.image_to_image.3d_images.* Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2017-01-30 08:44:07 +01:00
Bas Nieuwenhuizen	4eae3597eb	radv: Expose transfer format features. Signed-off-by: Bas Nieuwenhuizen <basni@google.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>	2017-01-30 08:42:26 +01:00
Bas Nieuwenhuizen	34bfe4b1bb	radv: Don't allow any operations on non-supported depth/stencil formats. We really use the depth block for the blits. Signed-off-by: Bas Nieuwenhuizen <basni@google.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Dave Airlie <airlied@redhat.com>	2017-01-30 08:42:26 +01:00
Andres Rodriguez	f8d5e1ab2d	radv: use new error codes for AllocateDescriptorSets There is a new error code in Maintenance1 that is more specific to the situation: VK_ERROR_OUT_OF_POOL_MEMORY_KHR Fixes CTS test case: dEQP-VK.api.descriptor_pool.out_of_pool_memory Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2017-01-30 08:42:17 +01:00
Andres Rodriguez	e199a993b2	radv: vkAllocateCommandBuffers should NULL all output handles This is part of the spec and fixes CTS tests: dEQP-VK.api.object_management.alloc_callback_fail_multiple.command_buffer_* Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2017-01-30 08:38:13 +01:00
Andres Rodriguez	ec0f5c005c	radv: add trim command pool stub Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2017-01-30 08:37:54 +01:00
Kenneth Graunke	2f7a7ae131	i965: Support the force_glsl_version driconf option. Gallium drivers have had this for a while. It makes sense to support it consistently across drivers, so expose it in i965 as well. Cc: "17.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2017-01-29 18:20:57 -08:00
Kenneth Graunke	02216a1ddf	i965: Fix check for negative pitch in can_do_fast_copy_blit(). At this point, the pitch is in bytes. We haven't yet divided the pitch by 4 for tiled surfaces, so abs(pitch) may be larger than 32K. This means the bit 15 trick won't work. The caller now has signed integers anyway, so just pass those through and do the obvious check. Cc: "17.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-01-29 18:20:35 -08:00
Bas Nieuwenhuizen	c4d7b9cd29	radv: Handle command buffers that need scratch memory. v2: Create the descriptor BO with CPU access. Signed-off-by: Bas Nieuwenhuizen <basni@google.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2017-01-30 02:07:20 +01:00
Bas Nieuwenhuizen	ccff93e138	radv: Track scratch usage across pipelines & command buffers. Based on code written by Dave Airlie. Signed-off-by: Bas Nieuwenhuizen <basni@google.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2017-01-30 02:07:16 +01:00
Bas Nieuwenhuizen	29c1f67e9f	radv/ac: Add compiler support for spilling. Based on code written by Dave Airlie. Signed-off-by: Bas Nieuwenhuizen <basni@google.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2017-01-30 02:07:12 +01:00
Bas Nieuwenhuizen	d115b67712	radv/amdgpu: Support a preamble CS. Signed-off-by: Bas Nieuwenhuizen <basni@google.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2017-01-30 02:07:08 +01:00
Timothy Arceri	2842dea310	i965: add assert to while_jumps_before_offset() jip should always be negative here as it's the result of do instruction - while instruction. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-01-30 10:17:54 +11:00
Timothy Arceri	77a6597bb7	i965: fix up asserts in brw_inst_set_jip() We are casting from a signed 32bit int to an unsigned 16bit int so shift 15 bits rather than 16. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-01-30 10:17:46 +11:00
Bas Nieuwenhuizen	b8ee45ebdc	llvmpipe: Use LLVMDumpModule, not DumpModule. Forgot the prefix ... Fixes: `0fca80b3db` Signed-off-by: Bas Nieuwenhuizen <basni@google.com>	2017-01-29 17:03:25 +01:00
Bas Nieuwenhuizen	0fca80b3db	various: Fix missing DumpModule with recent LLVM. Since LLVM revision 293359 DumpModule gets only implemented when either a debug build or LLVM_ENABLE_DUMP is set. This patch adds a direct replacement for the function for radv and radeonsi. However, as I don't know a good place to put common LLVM code for all three I inlined the implementation for LLVMPipe. v2: Use the new code for LLVM 3.4+ instead of LLVM 5+ & fixed indentation Signed-off-by: Bas Nieuwenhuizen <basni@google.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2017-01-29 10:25:00 +01:00
Ilia Mirkin	ce7a045fee	r600g: use ieee variants of multiplication instructions This matches the behavior of most other drivers, including nouveau, radeonsi, and i965. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-29 00:00:07 -05:00
Ilia Mirkin	bacbb01105	r600g: add support for optionally using non-IEEE mul ops Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-28 23:59:43 -05:00
Eric Anholt	5b7e2697dc	vc4: Coalesce into TLB writes as well as VPM/tex. This generally cuts an instruction when blending is enabled and we thus have a single instruction generating the color value. total instructions in shared programs: 91759 -> 91634 (-0.14%) instructions in affected programs: 5338 -> 5213 (-2.34%)	2017-01-28 19:35:20 -08:00
Eric Anholt	c1299615fb	vc4: Avoid an extra temporary and mov in ffloor/ffract/fceil. shader-db results: total instructions in shared programs: 92611 -> 91764 (-0.91%) instructions in affected programs: 27417 -> 26570 (-3.09%) The star is one shader in glmark2's terrain (drops 16% of its instructions), but there are also wins in mupen64plus and glb2.7.	2017-01-28 19:35:20 -08:00
Eric Anholt	0079df0b2d	vc4: Flip the switch to run the GLSL compiler optimization loop once. This has almost no effect on shader-db: total instructions in shared programs: 92572 -> 92611 (0.04%) instructions in affected programs: 4486 -> 4525 (0.87%) Looking at 2 of the 7 different shaders that were hurt (all of which were in mupen64), they all appear to be just differences in order of instructions at the NIR level. The advantage is that this should significantly reduce time in the compiler.	2017-01-28 19:35:20 -08:00
Kenneth Graunke	7c5629a269	i965: Unbind deleted shaders from brw_context, fixing malloc heisenbug. Applications may delete a shader program, create a new one, and bind it before the next draw. With terrible luck, malloc may randomly return a chunk of memory for the new gl_program that happened to be the exact same pointer as our previously bound gl_program. In this case, our logic to detect new programs in brw_upload_pipeline_state() would break: if (brw->vertex_program != ctx->VertexProgram._Current) { brw->vertex_program = ctx->VertexProgram._Current; brw->ctx.NewDriverState |= BRW_NEW_VERTEX_PROGRAM; } Because the pointer is the same, we'd think it was the same program. But it could be wildly different - a different stage altogether, different sets of resources, and so on. This causes utter chaos. As unlikely as this seems, I believe I hit this when running a subset of the CTS in a loop, in a group of tests that churns through simple programs, deleting and rebuilding them. Presumably malloc uses a bucketing cache of sorts, and so freeing up a gl_program and allocating a new one fairly quickly causes it to reuse that memory. The result was that brw->vertex_program->info.num_ssbos claimed the program had SSBOs, while brw->vs.base.prog_data.binding_table claimed that there were none. This was crazy, because the binding table is calculated from info.num_ssbos - the shader info appeared to change between shader compile time and draw time. Careful use of watchpoints revealed that it was being clobbered by rzalloc's memset when building an entirely different program... Fortunately, our 0xd0d0d0d0 canary for unused binding table entries caused us to crash out of bounds when trying to upload SSBOs, or we may have never discovered this heisenbug. Fixes crashes in GL45-CTS.compute_shader.sso-case2 when using a hacked cts-runner that only runs GL45-CTS.compute_shader.s* in EGL config ID 5 at 64x64 in a loop with 100 iterations. Cc: "17.0 13.0 12.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-01-27 21:52:37 -08:00


88617 Commits
