KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Thomas Helland	f0372814a9	util: Move u_dynarray to src/util This will be used as the basis for unification Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>	2017-06-07 21:07:24 +02:00
Thomas Helland	a66befc3c8	gallium: Add missing includes These will need to be in place to avoid regressions when removing these includes from the u_dynarray Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>	2017-06-07 21:07:24 +02:00
Marek Olšák	bacaceb78a	radeonsi: update clip_regs on shader state changes only when it's needed Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 20:17:20 +02:00
Marek Olšák	2b7fd9df9a	radeonsi: precompute some fields for PA_CL_VS_OUT_CNTL in si_shader_selector Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 20:17:18 +02:00
Marek Olšák	140b3c5019	radeonsi: add a new helper si_get_vs Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 20:17:16 +02:00
Samuel Pitoiset	878bd981bf	radeonsi: isolate real framebuffer changes from the decompression passes (v3) When a stencil buffer is part of the framebuffer state, it is decompressed but because it's bindless, all draw calls set stencil_dirty_level_mask to 1. v2: Marek - set the flags outside the loop - also clear and set framebuffer.do_update_surf_dirtiness there - do it in the DB->CB copy path too v3: Marek - save and restore the do_update_surf_dirtiness flag Signed-off-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 20:17:14 +02:00
Marek Olšák	257b538fd2	radeonsi: do EarlyCSEMemSSA LLVM pass so that LLVM IR looks like CSE has been run on it. It's also recommended by the instruction combining pass. This also fixes: - GL45-CTS.arrays_of_arrays_gl.InteractionFunctionCalls2 (crash) - piglit/spec/arb_shader_ballot/execution/fs-readFirstInvocation-uint-loop (fail) The code size decrease is positive, the register usage isn't. There is a decrease in VGPR spilling for Tomb Raider, but an increase in DiRT Showdown and GRID Autosport. EarlyCSEMemSSA has a -0.01% change in code size compared to EarlyCSE. SGPRS: 1935420 -> 1938076 (0.14 %) VGPRS: 1645504 -> 1645988 (0.03 %) Spilled SGPRs: 2493 -> 2651 (6.34 %) Spilled VGPRs: 107 -> 115 (7.48 %) Private memory VGPRs: 1332 -> 1332 (0.00 %) Scratch size: 1512 -> 1516 (0.26 %) dwords per thread Code Size: 61981592 -> 61890012 (-0.15 %) bytes Max Waves: 371847 -> 371798 (-0.01 %) Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 20:17:09 +02:00
Marek Olšák	e9409c86e7	radeonsi: remove 8 bytes from si_shader_key We can use a union in si_shader_key::mono. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 20:17:06 +02:00
Marek Olšák	2b8b9a56ef	radeonsi: move PSIZE and CLIPDIST unique IO indices after GENERIC Heaven LDS usage for LS+HS is below. The masks are "outputs_written" for LS and HS. Note that 32K is the maximum size. Before: heaven_x64: ls=1f1 tcs=1f1, lds=32K heaven_x64: ls=31 tcs=31, lds=24K heaven_x64: ls=71 tcs=71, lds=28K After: heaven_x64: ls=3f tcs=3f, lds=24K heaven_x64: ls=7 tcs=7, lds=13K heaven_x64: ls=f tcs=f, lds=17K All other apps have a similar decrease in LDS usage, because the "outputs_written" masks are similar. Also, most apps don't write POSITION in these shader stages, so there is room for improvement. (tight per-component input/output packing might help even more) It's unknown whether this improves performance. Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 20:14:15 +02:00
Thomas Hellstrom	2c4ec3f93f	svga: Always set the alpha value to 1 when sampling using an XRGB view If the XRGB view is sampling from an ARGB svga format, change PIPE_SWIZZLE_W to PIPE_SWIZZLE_1 for all channels. Previously we unconditionally set PIPE_SWIZZLE_1 on the alpha channel which could be both insufficient and incorrect. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2017-06-07 19:43:54 +02:00
Thomas Hellstrom	df4d6003dc	svga: Fix imported surface view creation When deciding to create a view with or without an alpha channel we need to look at the SVGA3D format and not the PIPE format. This fixes the glx-tfp piglit test for dri3/xa. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2017-06-07 19:43:54 +02:00
Thomas Hellstrom	c2138a066c	svga: Set alpha to 1 for non-alpha views Gallium RGB textures may be backed by imported ARGB svga3d surfaces. In those and similar cases we need to set the alpha value to 1 when sampling. Fixes piglit glx::glx-tfp Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2017-06-07 19:43:54 +02:00
Thomas Hellstrom	1887faf73b	svga: Allow format differences in 16-bit RGBA surface sharing For the purpose of surface sharing, treat SVGA3D_R5G6B5 and SVGA3D_B5G6R5_UNORM as identical formats. This fixes the following piglit tests with dri3/xa: glx@glx-visuals-depth -pixmap glx@glx-visuals-stencil -pixmap Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Deepak Singh Rawat <drawat@vmware.com> Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2017-06-07 19:43:54 +02:00
Thomas Hellstrom	b8b0a3dc5c	dri/vmwgfx: Disable a couple of glx extensions also for Ubuntu unity / compiz It appears like the GLX_EXT_buffer_age extension also prevents Compiz / Ubuntu Unity from performing partial buffer swaps when it otherwise feels like doing so. So try to get them back again. We also disable GLX_OML_sync_control since it appears it had a favourable impact on gnome-shell. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Sinclair Yeh <syeh@vmware.com>	2017-06-07 19:43:54 +02:00
Thomas Hellstrom	37e8341db4	dri: Turn off a couple of glx extensions for gnome-shell on vmwgfx. Increases performance on vmwgfx since we're avoiding full buffer damage and since we can't sync to vertical retrace anyway. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-06-07 19:43:54 +02:00
Thomas Hellstrom	48f4baf63f	st/dri: Allow gallium drivers to turn off two GLX extensions Allow gallium drivers to turn off GLX_EXT_buffer_age and GLX_OML_sync_control if needed, using driconf. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-06-07 19:43:54 +02:00
Thomas Hellstrom	9d3f177e4b	dri: Optionally turn off a couple of GLX extensions based on driconf options With GLX_EXT_buffer_age turned on, gnome-shell will use full-screen damage with GLX, which severely hurts performance with architectures that emulate page-flips with copies. Like vmware. We would like to be able to turn off that extension. Similarly, typically the GLX_OML_sync_control doesn't make much sense on a virtual architecture since we don't really sync to the host's vertical retrace. We'd like to be able to turn it off as well. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-06-07 19:43:54 +02:00
Thomas Hellstrom	ff2978b449	st/dri: Allow dri users to query also driver options There will be situations where we want to control, for example, the GLX behaviour based on applications and drivers. So allow DRI users access to the driver options. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-06-07 19:43:54 +02:00
Marek Olšák	7d67cbefe0	radeonsi: clean up decompress blend state names Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 19:38:45 +02:00
Marek Olšák	882c18bf1c	gallium/radeon: clean up a misleading statement from the old days Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 19:38:43 +02:00
Marek Olšák	66176e6f14	radeonsi: don't use 1D tiling for Z/S on VI to get TC-compatible HTILE It's always good to have fewer decompress blits. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 19:38:42 +02:00
Marek Olšák	d2ee423b69	radeonsi: enable TC-compatible stencil compression on VI Most things are in place. Ideally we won't see decompress blits for stencil anymore. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 19:38:39 +02:00
Marek Olšák	e003e3c4c0	st/mesa: don't keep framebuffer state in st_context Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:46:21 +02:00
Marek Olšák	f34abf77e9	st/mesa: cache pipe_surface for GL_FRAMEBUFFER_SRGB changes Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:46:21 +02:00
Marek Olšák	f7523f1ef6	st/mesa: use gl_driver_flags::NewFramebufferSRGB also call st_init_driver_flags when st_context is initialized. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:46:21 +02:00
Marek Olšák	ac0aff7222	mesa: add gl_driver_flags::NewFramebufferSRGB _NEW_BUFFERS updates too much stuff. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:46:21 +02:00
Marek Olšák	3effce4fb0	radeonsi/gfx9: prevent a race when the previous shader's main part is missing Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:43:42 +02:00
Marek Olšák	b5bc826ead	radeonsi/gfx9: wait for main part compilation of 1st shaders of merged shaders Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:43:42 +02:00
Marek Olšák	ffbaba6072	radeonsi/gfx9: fix LS scratch buffer support without TCS for GFX9 LS is merged into TCS. If there is no TCS, LS is merged into fixed-func TCS. The problem is the fixed-func TCS was ignored by scratch update functions, so LS didn't have the scratch buffer set up. Note that Mesa 17.1 doesn't have merged shaders. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:43:42 +02:00
Marek Olšák	6e2c07749b	radeonsi: move streamout state update out of si_update_shaders Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:43:42 +02:00
Marek Olšák	294be5279d	radeonsi: remove dead code in declare_input_fs Colors are interpolated in the PS prolog. This was never used. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:43:42 +02:00
Marek Olšák	8147c4a4a5	radeonsi: move handling of DBG_NO_OPT_VARIANT into si_shader_selector_key Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:43:42 +02:00
Marek Olšák	86cc809726	radeonsi: use a compiler queue with a low priority for optimized shaders Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:43:42 +02:00
Marek Olšák	89b6c93ae3	util/u_queue: add an option to set the minimum thread priority Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:43:42 +02:00
Marek Olšák	6f2947fa79	radeonsi: decrease the number of compiler threads to num CPUs - 1 Reserve one core for other things (like draw calls). Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:43:42 +02:00
Marek Olšák	38bd468a78	radeonsi: drop unfinished shader compilations when destroying shaders If we enqueue too many jobs and destroy the GL context, it may take several seconds before the jobs finish. Just drop them instead. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:43:42 +02:00
Marek Olšák	33e507ec23	util/u_queue: add a way to remove a job when we just want to destroy it Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:43:42 +02:00
Rob Clark	812fd1aaa8	freedreno/a5xx: set SP_BLEND_CONTROL properly Signed-off-by: Rob Clark <robdclark@gmail.com>	2017-06-07 12:32:00 -04:00
Rob Clark	5b60004525	freedreno/a5xx: LRZ support Signed-off-by: Rob Clark <robdclark@gmail.com>	2017-06-07 12:32:00 -04:00
Rob Clark	313f6360aa	freedreno: drop timestamp field unused. Signed-off-by: Rob Clark <robdclark@gmail.com>	2017-06-07 12:32:00 -04:00
Rob Clark	5589ba983d	freedreno/a5xx: refactor out helper for LRZ flush Signed-off-by: Rob Clark <robdclark@gmail.com>	2017-06-07 12:32:00 -04:00
Rob Clark	e26a7c1cf2	freedreno: reshuffle FD_MESA_DEBUG bitmask Signed-off-by: Rob Clark <robdclark@gmail.com>	2017-06-07 12:32:00 -04:00
Rob Clark	613410c8fc	freedreno: update generated headers Signed-off-by: Rob Clark <robdclark@gmail.com>	2017-06-07 12:32:00 -04:00
Marek Olšák	a893c91697	gallium/u_blitter: use 2D_ARRAY for cubemap blits if possible so that we can use TXF. The cubemap blit pixel shader code size: 148 -> 92 bytes Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:10:50 +02:00
Marek Olšák	4a88c7774c	gallium/u_blitter: use TXF if possible This fixes piglit: arb_texture_view-rendering-r32ui TEX (image_sample) flushes denorms to 0 with FP32 textures on GCN, but such a texture can contain integer data written using an integer render view. If we do a transfer blit with TEX, denorms are flushed to 0. Luckily, TXF (image_load) doesn't do that. TXF also doesn't need to load the sampler state, so blit shaders don't have to do s_load_dwordx4. TXF doesn't do CLAMP_TO_EDGE, so it can only be used if the src box is in bounds, or if we clamp manually (this commit doesn't). Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:10:50 +02:00
Marek Olšák	0604568527	gallium/u_blitter: use TEX_LZ if it's supported The sampler views always have first_level == last_level. Now radeonsi doesn't have to use the WQM. (a few SALU removed) Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:10:50 +02:00
Marek Olšák	eedca3323e	gallium/util: add _LZ and TXF options to simple shaders Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:10:50 +02:00
Marek Olšák	20c2785f7c	gallium/ureg: add TEX/TXF_LZ opcodes to ureg Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-06-07 18:10:50 +02:00
Jason Ekstrand	dd294fd2d9	i965: Use BLORP for all HiZ ops BLORP has been capable of doing gen8-style HiZ ops for a while now. We might as well start using it. The one downside is that this may cause a bit more state emission since we still re-emit most things for BLORP. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-06-07 08:54:54 -07:00
Jason Ekstrand	bacae7221b	blorp: Use FullSurfaceDepthandStencilClear for blorp_hiz_op The blorp_hiz_op entrypoint always acts on a full subresource of a HiZ buffer so we can just set the flag unconditionally. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-06-07 08:54:54 -07:00
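
The before/after totals quoted in the 257b538fd2 (EarlyCSEMemSSA) commit message can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: the numbers are copied from that message, while the names `STATS`, `percent_change`, and `report` are hypothetical and not part of Mesa or its shader statistics tooling. It assumes each percentage is the relative change from the first value to the second, rounded to two decimals.

```python
# Before/after totals copied from the 257b538fd2 commit message.
# The dictionary layout and function names are illustrative assumptions.
STATS = {
    "SGPRS": (1935420, 1938076),
    "VGPRS": (1645504, 1645988),
    "Spilled SGPRs": (2493, 2651),
    "Spilled VGPRs": (107, 115),
    "Private memory VGPRs": (1332, 1332),
    "Scratch size": (1512, 1516),
    "Code Size": (61981592, 61890012),
    "Max Waves": (371847, 371798),
}


def percent_change(before, after):
    """Return the relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


def report(stats):
    """Map each metric name to its percent change, rounded to two decimals."""
    return {
        name: round(percent_change(before, after), 2)
        for name, (before, after) in stats.items()
    }


if __name__ == "__main__":
    for name, pct in report(STATS).items():
        print(f"{name}: {pct:.2f} %")
```

Under that assumption the recomputed values agree with the figures in the commit message: code size goes down by about 0.15 %, while SGPR and VGPR spilling go up by about 6.34 % and 7.48 %.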

92759 Commits, All Branches