KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Karol Herbst	2402232c90	spirv: handle UniformConstant for OpenCL kernels The caller is responsible for setting up the ubo_addr_format value as contrary to shared and global, it's not controlled by the spirv. Right now clovers implementation of CL constant memory uses a 24/8 bit format to encode the buffer index and offset, but that code is dead as all backends treat constants as global memory to workaround annoying issues within OpenCL. Maybe that will change, maybe not. But just in case somebody wants to look at it, add a toggle for this inside vtn. Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-12-11 23:54:39 +00:00
Dave Airlie	123f90cf36	gallivm/nir: copy compare ordering code from tgsi This fixes some isinf/isnan tests copying what the tgsi code paths do for float compares Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-12-12 09:16:41 +10:00
Dave Airlie	8f56ba5da4	gallivm/nir: cleanup code and call cmp wrapper Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-12-12 09:16:37 +10:00
Dave Airlie	63b3d38a50	gallivm: fix perspective enable if usage_mask doesn't have 0 bit set The current code looks like a typo, and fails if the usage_mask is for a y/z enabled input. Fixes piglit ext_transform_feedback-immediate-reuse-index-buffer with llvmpipe/nir Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-12-12 09:16:33 +10:00
Dave Airlie	bf29040103	gallivm: fix transpose for when first channel isn't created The previous fix worked when the second channel wasn't exposed, but a couple of piglit tests have inputs with just the y/z chans, no x/w. Partly Fixes piglit ext_transform_feedback-immediate-reuse-index-buffer with llvmpipe/nir Fixes: `5363cda52b` ("gallivm: add swizzle support where one channel isn't defined.") Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-12-12 09:16:28 +10:00
Dave Airlie	e35b2c37cd	llvmpipe/nir: handle texcoord requirements Switch to using texcoord intrinsic support. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-12-12 09:16:24 +10:00
Kristian H. Kristensen	b6f8c42846	freedreno/a6xx: Silence warning for unused perf counters Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Rob Clark <robdclark@chromium.org> Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-12-11 22:25:47 +00:00
Kristian H. Kristensen	9b09776846	freedreno/a6xx: Convert some tile setup to OUT_REG() Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Rob Clark <robdclark@chromium.org> Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-12-11 22:25:47 +00:00
Kristian H. Kristensen	8a4b0d852c	freedreno/a6xx: Convert gmem blits to OUT_REG() Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Rob Clark <robdclark@chromium.org> Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-12-11 22:25:47 +00:00
Kristian H. Kristensen	201caa7281	freedreno/a6xx: Convert VSC pipe setup to OUT_REG() Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Rob Clark <robdclark@chromium.org> Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-12-11 22:25:47 +00:00
Kristian H. Kristensen	c71348f84a	freedreno/a6xx: Convert emit_zs() to OUT_REG() Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Rob Clark <robdclark@chromium.org> Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-12-11 22:25:47 +00:00
Kristian H. Kristensen	ffa7d9cbeb	freedreno/a6xx: Convert emit_mrt() to OUT_REG() Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Rob Clark <robdclark@chromium.org> Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-12-11 22:25:47 +00:00
Kristian H. Kristensen	781b2dd63b	freedreno/a6xx: Include fd6_pack.h in a few files Including non-functional changes to get the value from the fd_reg_pair in places. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Rob Clark <robdclark@chromium.org> Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-12-11 22:25:47 +00:00
Kristian H. Kristensen	9783f6bc5d	freedreno/a6xx: Drop stale include Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Rob Clark <robdclark@chromium.org> Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-12-11 22:25:47 +00:00
Kristian H. Kristensen	9b05466144	freedreno/registers: Add 64 bit address registers Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Rob Clark <robdclark@chromium.org> Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-12-11 22:25:47 +00:00
Kristian H. Kristensen	bdd98b892f	freedreno: New struct packing macros Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Rob Clark <robdclark@chromium.org> Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-12-11 22:25:47 +00:00
Kristian H. Kristensen	b27b0e8550	freedreno/registers: Remove duplicate register definitions Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Rob Clark <robdclark@chromium.org> Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-12-11 22:25:47 +00:00
Timothy Arceri	f8148d0cc1	docs: remove mailing list as way of submitting patches All developers now use gitlab, don't confuse newcomers by suggesting they might use the mailing list. We want everyone to use gitlab so that patches get run through basic CI before they are merged. Acked-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> Acked-by: Dylan Baker <dylan@pnwbakers.com>	2019-12-12 09:09:50 +11:00
Jason Ekstrand	776cfde699	anv: Bump the advertised patch version to 129 We've been keeping up with the spec updates. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-12-11 18:52:08 +00:00
Jason Ekstrand	5f5f5019bd	anv: Unconditionally advertise Vulkan 1.1 Vulkan 1.1 requires VK_KHR_external_fence which requires syncobj support to be actually usable. However, it doesn't strictly require that we support any external handle types. We should be able to advertise 1.1 even on old kernels that don't have syncobj support. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-12-11 18:52:08 +00:00
Jason Ekstrand	98a83d0fce	anv: Flush the queue on DeviceWaitIdle When we have syncobj_wait, we can trust in WAIT_FOR_SUBMIT but when we don't, we only have BO waits and those aren't quite as nice. This commit adds a flag to _anv_queue_submit to wait for the queue to drain before returning. This gives us the behavior we need to implement DeviceWaitIdle. Fixes: `246261f0ad` "anv: prepare the driver for delayed submissions" Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-12-11 18:52:08 +00:00
Karol Herbst	0bafde717d	nir/tests: MSVC build fix Fixes: `11f736a6f9` "nir/tests: add serializer tests" Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-12-11 17:12:48 +00:00
Jan Zielinski	ab55708200	swr/rasterizer: Add tessellator implementation to the rasterizer This is initial commit on the way to implement ARB_tessellation_shader extension in OpenSWR. It introduces tessellator implementation taken from Microsoft GitHub (published under MIT license): https://github.com/microsoft/DirectX-Specs/blob/master/d3d/archive/images/d3d11/tessellator.cpp https://github.com/microsoft/DirectX-Specs/blob/master/d3d/archive/images/d3d11/tessellator.hpp It also adds some glue code that connects the tessellator to the internals of SWR rasterizer. Acked-by: Dave Airlie <airlied@redhat.com> Acked-by: Bruce Cherniak <bruce.cherniak@intel.com> Reviwed-by: Alok Hota <alok.hota@intel.com>	2019-12-11 16:54:37 +00:00
Samuel Pitoiset	ff2e11b210	gitlab-ci: set RADV_DEBUG=checkir for RADV test jobs This is used to validate if the driver emits correct LLVM IR. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-12-11 15:44:40 +00:00
Eric Engestrom	b2dac806f8	intel: add mi_builder_test for gen12 Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-12-11 15:38:19 +00:00
Rohan Garg	2129b4152c	gitlab-ci: Use lavacli from packages lavacli 0.9.8 is now available in Debian Testing. Ref: https://tracker.debian.org/news/1066828/lavacli-098-1-migrated-to-testing/ Fixes: `555c0de` ("gitlab-ci: Move LAVA-related files into top-level ci dir") Signed-off-by: Rohan Garg <rohan.garg@collabora.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>	2019-12-11 15:19:43 +00:00
Erico Nunes	7701b7b7ee	lima/ppir: enable lower_fdph Otherwise we may lower some fdot to fdph which is not implemented in pp. Fixes #2126 Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-12-11 15:55:48 +01:00
Karol Herbst	11f736a6f9	nir/tests: add serializer tests Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-12-11 13:00:44 +01:00
Karol Herbst	676232d76f	nir/serialize: fix vec8 and vec16 Nir serializes uses nir_ssa_alu_instr_src_components in a few places to determine how many components a src has, but that's not what this function returns. It simply returns how many channels are used, which is still fine for most of the code. This was breaking code like this: vec16 32 ssa_1 = intrinsic load_global vec1 32 ssa_2 = fmax ssa_1.a, ssa_2.b v2: make the 16bit encoding work for identify swizzles again Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-12-11 13:00:44 +01:00
Bas Nieuwenhuizen	2e44bfc14f	radv: Fix RGBX Android<->Vulkan format correspondence. This is correct per the Vulkan spec format equivalence table. Fixes: `f36b52740a` "radv/android: Add android hardware buffer queries." Reviewed-by: Eric Anholt <eric@anholt.net>	2019-12-11 11:40:13 +01:00
Tomeu Vizoso	63ae9e61c1	panfrost: Add PAN_MESA_DEBUG=sync Sometimes it's useful to get information about GPU faults in the console, so it's synchronized with other messages. This commit will cause Mesa to wait for completion and check if there are any faults raised by the GPU. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-12-11 08:01:20 +01:00
Kenneth Graunke	2e654db27a	iris: Create smaller program keys without legacy features A lot of the brw_*_prog_key fields are for emulating features on legacy hardware that iris doesn't support. In particular, all of the texture swizzle fields take up a lot of space. These dead fields make hashing the shader keys more expensive than it ought to be. We introduce iris-specific keys with only the information we need, and translate them to brw keys when actually compiling new variants. This way, key comparisons can use the small keys. The size reductions are: VS: 328 bytes -> 8 bytes TCS: 312 bytes -> 24 bytes TES: 304 bytes -> 24 bytes GS: 284 bytes -> 8 bytes FS: 304 bytes -> 16 bytes CS: 280 bytes -> 4 bytes Scores for the Piglit drawoverhead microbenchmark case with a shader program change improve by roughly 30%. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-12-10 22:25:41 -08:00
Pierre Moreau	8ccd3f48a0	compiler/spirv: Fix uses of gnu struct = {} extension Fixes: `a24d6fbae6` ("meson: Add -Werror=gnu-empty-initializer to MSVC compat args") Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Tested-by: Vinson Lee <vlee@freedesktop.org> Signed-off-by: Pierre Moreau <dev@pmoreau.org>	2019-12-11 06:03:22 +00:00
Vinson Lee	9661fc9cdb	util/u_thread: Restrict u_thread_get_time_nano on macOS. macOS does not have pthread_getcpuclockid. src/util/u_thread.h:156:4: error: implicit declaration of function 'pthread_getcpuclockid' is invalid in C99 [-Werror,-Wimplicit-function-declaration] pthread_getcpuclockid(thread, &cid); ^ Fixes: `4913215d14` ("util/u_thread: don't restrict u_thread_get_time_nano() to __linux__") Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2171 Signed-off-by: Vinson Lee <vlee@freedesktop.org> Acked-by: Eric Engestrom <eric@engestrom.ch>	2019-12-10 21:35:47 -08:00
Eric Anholt	8bf590b46b	tu: Move UBWC layout into fdl6_layout() and use that function. This gets us shared non-UBWC layout code between gallium and turnip. Until I fix up the rest of gallium to handle UBWC mipmapping, we do the single-level UBWC setup in gallium as a fixup after layout. Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-12-11 04:24:18 +00:00
Eric Anholt	de619d7503	freedreno: Switch the 16-bit workaround to match what turnip does. Prevents regressions on argb1555 and rgb565 when making turnip use freedreno's layout. Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-12-11 04:24:18 +00:00
Eric Anholt	d9cf3e76bd	freedreno: Move a6xx's setup_slices() to a shareable helper function. We pass in all the parameters for setting up the layout, though freedreno still sets a few of them up early (since it uses layout helpers in making some decisions about the layout setup parameters that will be cleaned up once krh's blitter work lands).	2019-12-11 04:24:18 +00:00
Eric Anholt	67258a44d2	tu: Move our image layout into a freedreno_layout struct. This lets us start using some of the fdl_* helpers and have more obviously matching code between gallium and turnip. We can't yet use the fdl_* UBWC helpers, since the gallium driver doesn't do UBWC mipmaps (which I'm working on in another branch). Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-12-11 04:24:18 +00:00
Eric Anholt	ea7631a9a6	freedreno: Move UBWC layout into a slices array like the non-UBWC slices. This is a little refactor in preparation for UBWC mipmapping support. Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-12-11 04:24:18 +00:00
Eric Anholt	bbe84c6c31	freedreno: Refactor the UBWC flags registers emission. It's the same logic for each of these being emitted, and I was about to change the rsc->layout.* for UBWC. Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-12-11 04:24:18 +00:00
Eric Anholt	97be9503bb	freedreno: Drop the extra offset field for mipmap slices. We can just bake the UBWC-goes-first delta into the slices at setup time. I did have to fix up the resource shadowing swap path to swap the slice fields, as it was missing and regressed the format reinterpets otherwise. Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-12-11 04:24:18 +00:00
Kenneth Graunke	69d7782b15	intel/decoder: Make get_state_size take a full 64-bit address and a base i965 wants to use an offset from a base because everything is in a single buffer whose address may be relocated, and all base addresses are set to the start of that buffer. iris wants to use a full 64-bit address, because state lives in separate buffers which may be in the shader, surface, and dynamic memory zones, where addresses grow downward from the top of a 4GB zone, So it's very possible for a 32-bit offset to exist relative to multiple bases, leading to the wrong state size.	2019-12-10 19:10:49 -08:00
Dongwon Kim	8a8534a698	iris: INTEL performance query implementation low-level implementation of INTEL-performance-query APIs in Intel iris driver. Most of functions and procedures defined here are adopted from i965 driver (brw_performance_query.c) v2: - replace genX_init_performance_query with iris_init_perfquery_functions which is gen's version agnositic - general code clean-up v3: include gen_perf_gens.h as some of defines were moved to this new header file v4: - checking for kernel 4.13+ won't be needed here as Iris won't be loaded anyway without DRM_SYNCOBJ that is enabled after Kernel 4.13. - checking whether gen < 8 or is_cherryview won't be required as well because those cases are screened in iris_screen_create. v5: remove genX(init_performance_query) v6: - remove oa_metrics_kernel_support as iris works only with kernel 4.18 and newer. - use perf functions defined in separate file, iris_perf.h/c Signed-off-by: Dongwon Kim <dongwon.kim@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-12-10 17:02:58 -08:00
Mark Janes	ca2dd99bf6	iris: separating out common perf code The configuration of the gen_perf vtable will be the same for INTEL_performance_query and AMD_performance_monitor. Initialize the table in a single routine that can be called from both implementations. Signed-off-by: Dongwon Kim <dongwon.kim@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-12-10 17:02:58 -08:00
Dongwon Kim	106054ef79	gallium: enable INTEL_PERFORMANCE_QUERY new state tracker APIs added for INTEL_performance_query This extension is enabled if all vendor specific functions for it exist. v2: add st_cb_perfquery.* to the list of sources in Makefile v3: minor code clean-up v4: - add driver hooks for intel-performance-query apis - add PIPE level performance counter and type enums that match to OpenGL enums - do conversion of pipe_perf_counter_type and pipe_perf_counter_data_type enums to GL defines in state_tracker Signed-off-by: Dongwon Kim <dongwon.kim@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-12-10 17:02:58 -08:00
Dylan Baker	d0eebda990	meson/broadcom: libbroadcom_cle also needs zlib Fixes: `1ae8018a6a` ("meson: Add support for the vc4 driver.") Reviewed-by: Eric Anholt <eric@anholt.net>	2019-12-11 00:49:44 +00:00
Kenneth Graunke	0f2f561a10	anv: Enable Gen11 Color/Z write merging optimization TCCNTLREG contains additional L3 cache write merging optimizations. The default value on my system appears to be: - URB Partial Write Merging (bit 0) - L3 Data Partial Write Merging (bit 2) - TC Disable (bit 3) Windows drivers appear to set bit 1 as well to enable "Color/Z Partial Write Merging". This should solve an issue we were seeing where MRT benchmarks were using substantially more bandwidth than they ought. However, we have not observed it to cause measurable FPS gains. It is unclear whether we should be setting bit 0 or bit 3, so for now we leave those at the hardware default value. Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-12-10 16:19:46 -08:00
Kenneth Graunke	5cc7636993	iris: Enable Gen11 Color/Z write merging optimization TCCNTLREG contains additional L3 cache write merging optimizations. The default value on my system appears to be: - URB Partial Write Merging (bit 0) - L3 Data Partial Write Merging (bit 2) - TC Disable (bit 3) Windows drivers appear to set bit 1 as well to enable "Color/Z Partial Write Merging". This should solve an issue we were seeing where MRT benchmarks were using substantially more bandwidth than they ought. However, we have not observed it to cause measurable FPS gains. It is unclear whether we should be setting bit 0 or bit 3, so for now we leave those at the hardware default value. Improves performance in Manhattan 3.0 by 6% on ICL 8x8 at a fixed frequency, according to Felix Degrood. I didn't see any improvements at out-of-the-box power management settings, however. Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-12-10 16:19:43 -08:00
Kenneth Graunke	0b74f85870	intel/genxml: Add a partial TCCNTLREG definition TCCNTLREG contains additional cache programming settings. In particular, there are several write combining controls we'd like to use. Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-12-10 16:19:33 -08:00
Kenneth Graunke	74665eaf3a	util: Detect use-after-destroy in simple_mtx This makes simple_mtx_destroy set the counter to an invalid canary value and then makes lock/unlock assert that the value is legal. That way, calling lock/unlock after destroy will assert fail, rather than deadlocking or potentially even working. This has caught real deadlocks in dEQP multithreaded tests (in st/mesa shader variant zombie list handling), which have since been fixed. Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Eric Engestrom <eric@engestrom.ch>	2019-12-10 23:48:40 +00:00
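
The last entry above (74665eaf3a) describes a canary technique: destroy writes an invalid value into the lock word, and lock/unlock assert that the word is still legal. The following is a minimal single-threaded sketch of that idea, not Mesa's simple_mtx code. The names toy_mtx, toy_mtx_is_legal and TOY_MTX_DESTROYED, the canary constant, and the two-state encoding are all assumptions made for illustration; a real lock would also need atomics and a wait path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy lock for illustration only; not Mesa's simple_mtx_t. */
#define TOY_MTX_DESTROYED 0xdeadbeefu /* assumed canary value */

typedef struct {
   uint32_t val; /* 0 = unlocked, 1 = locked, anything else = invalid */
} toy_mtx;

static void toy_mtx_init(toy_mtx *m)
{
   m->val = 0;
}

/* Poison the lock word so later use can be detected. */
static void toy_mtx_destroy(toy_mtx *m)
{
   m->val = TOY_MTX_DESTROYED;
}

/* A lock word is legal only while it holds one of the known states. */
static bool toy_mtx_is_legal(const toy_mtx *m)
{
   return m->val <= 1;
}

static void toy_mtx_lock(toy_mtx *m)
{
   assert(toy_mtx_is_legal(m)); /* fires on use after destroy */
   m->val = 1;                  /* sketch: no atomics, no waiting */
}

static void toy_mtx_unlock(toy_mtx *m)
{
   assert(toy_mtx_is_legal(m)); /* fires on use after destroy */
   m->val = 0;
}
```

With this shape, calling toy_mtx_lock() after toy_mtx_destroy() trips the assertion in a debug build instead of silently spinning or succeeding, which is the failure mode the commit message says it wants to surface.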

118398 Commits