mirrors/mesa - Frog Git

Commit Graph

Author	SHA1	Message	Date
Rhys Perry	e8644122ed	nir/algebraic: mark a few comparison simplifications as precise No vkpipeline-db changes found. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reveiewed-by: Alyssa Rosenzweig alyssa.rosenzweig@collabora.com Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-07-19 16:33:01 +00:00
Rhys Perry	79801b9d7d	nir/algebraic: optimize contradictory iand operands Some of these were found in a few GTAV, Rise of the Tomb Raider and Shadow of the Tomb Raider shaders. Results from vkpipeline-db run with ACO: Totals from affected shaders: SGPRS: 376 -> 376 (0.00 %) VGPRS: 220 -> 220 (0.00 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 13492 -> 11560 (-14.32 %) bytes LDS: 6 -> 6 (0.00 %) blocks Max Waves: 69 -> 69 (0.00 %) Wait states: 0 -> 0 (0.00 %) v2: use False instead of 0 Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reveiewed-by: Alyssa Rosenzweig alyssa.rosenzweig@collabora.com Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-07-19 16:33:01 +00:00
Erico Nunes	32ced14bad	lima/ppir: handle all node types in ppir_node_replace_child ppir_node_replace_child is used by the const lowering routine in ppir. All types need to be handled here, otherwise the src node is not updated properly when one of the lowered nodes is a const, which results in, for example, regalloc not assigning registers correctly. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-07-19 16:01:45 +00:00
Erico Nunes	2292f0c4b5	lima/ppir: branch regalloc fixes The branch instruction has sources which must be handled in src handling paths so that regalloc assigns registers to them properly. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-07-19 16:01:45 +00:00
Yevhenii Kolesnikov	32b72cbca5	main: Destroy static hash table format_array_format_table has a static lifetime - it will be destroyed by an atexit handler. Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-19 11:22:55 +03:00
Dave Airlie	248161123c	radv: reset the window scissor with no clear state. If we don't have clear state (which gfx10 doesn't currently) we will fix to reset the scissor. AMDVLK will leave it set to something else. Marek also has this fix for radeonsi pending. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 11:00:44 +10:00
Dave Airlie	2ac2b98780	radv: fix crash in shader tracing. Enabling tracing, and then having a vmfault, can leads to a segfault before we print out the traces, as if a meta shader is executing and we don't have the NIR for it. Just pass the stage and give back a default. Fixes: `9b9ccee4d6` ("radv: take LDS into account for compute shader occupancy stats") Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 11:00:25 +10:00
Timothy Arceri	80c2c17e1e	iris: change last_vue_stage() to look at uncompiled shaders This allows us to find the last vue stage before we have compiled the shaders. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-19 09:25:47 +10:00
Timothy Arceri	30038dd5ec	nir/lower_clip: add support for geometry shaders This will be used to enabled compat profile support for geometry shaders. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-19 09:25:47 +10:00
Timothy Arceri	4b08bb4770	nir/lower_clip: add lower_clip_outputs() helper This will be reused in the following patch to add support for clip vertex lowering in geometry shaders. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-19 09:25:47 +10:00
Timothy Arceri	a59926b3ca	nir/lower_clip: add create_clipdist_vars() helper Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-19 09:25:47 +10:00
Timothy Arceri	e38b930876	nir/lower_clip: add a find_clipvertex_and_position_outputs() helper This will allow code sharing in a following patch that adds support for lowering in geometry shaders. It also allows us to exit early if there is no lowering to do which allows a small code tidy up. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-19 09:25:47 +10:00
Alyssa Rosenzweig	0395b58c92	panfrost: Set rt_count This doesn't quite work yet, but it illustrates how MRT is implemented in the MFBD: rt_count is set appropriately based on the number of render targets, while additional render target descriptors are appended on with an index variable in them (not quite decoded since there's some aspects we don't understand there, but conceptually this should be right). Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-18 15:25:40 -07:00
Alyssa Rosenzweig	871ad7789f	panfrost: Trace invisible BOs Helps make the decode a little more readable (names instead of addresses). Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-18 15:25:40 -07:00
Alyssa Rosenzweig	17752bae8e	panfrost/decode: Preserve empty tiler heap symmetry If tiler_heap_end == tiler_heap_start, ensure it's printed the same rather than one erroring out as hex. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-18 15:25:40 -07:00
Alyssa Rosenzweig	e797caa0dd	panfrost: Zero polygon list body size for clears There's no polygons, so you can't have any size to the polygon list, although there is a minimal header. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-18 15:25:40 -07:00
Alyssa Rosenzweig	f475b79980	panfrost/mfbd: Unify depth-only with masked FBO path Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-18 15:25:40 -07:00
Alyssa Rosenzweig	629c7366a7	panfrost: Simplify set_framebuffer_state Most of the ad hoc logic is already in Gallium. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-18 15:25:40 -07:00
Alyssa Rosenzweig	227c395c00	panfrost: Check for NULL surface in places Fixes a bunch of NULL dereferences, although it does cause GPU faults of course. This is caused by color buffers masked out in MRT, which we'll eventually have to solve the right way... one thing at a time. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-18 15:25:40 -07:00
Alyssa Rosenzweig	79b13b4376	panfrost: Expose 4 render targets Hidden behind deqp flag as usual. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-18 15:25:40 -07:00
Alyssa Rosenzweig	d56f92502e	panfrost: Shrink tiler heap 128MB is excessive and 16MB is still plenty. Saves 112MB/context on kernels without growable/heap support. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-18 15:25:16 -07:00
Caio Marcelo de Oliveira Filho	b6d4753568	nir/large_constants: De-duplicate constants If a function has a constant and is called more than once, after inlining we may end up with different variables representing the same constant. This commit look into the data and de-duplicate them. The first pass now will collect the constant data in a per variable buffer, then de-duplication happens (by sorting then linear walk), and the second pass will use the data in var->data.location. One side-effect of the current implementation is that constants will be reordered. If this turns out to be a problem is something that can be fixed. An alternative strategy considered was to perform this in a per-function basis and then merge the results, the problem is that we would have to fix up the offsets during the merge. Given the data we have, the current patch is good enough. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-18 12:24:24 -07:00
Caio Marcelo de Oliveira Filho	d9b67ad079	nir/large_constants: Use ralloc for var_infos This will be used later on to allocate constant data for each variable (and then deduplicate). Also drop initializing found_read, as it is already implicitly false in the literal. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-18 12:24:24 -07:00
Eric Anholt	0d8a4c67cf	freedreno: Convert nir_lower_tg4_to_tex to the NIR lowering helper. Cuts a bunch of boilerplate. Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-07-18 11:28:56 -07:00
Eric Anholt	56f4ede73d	freedreno: Convert load_barycentric_at_sample to the NIR lowering helper. Cuts out a ton of boilerplate. Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-07-18 11:28:56 -07:00
Eric Anholt	61098baf42	freedreno: Convert load_barycentric_at_offset to the NIR lowering helper. Cuts out a ton of boilerplate. Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-07-18 11:28:56 -07:00
Eric Anholt	cdc359c58e	v3d: Use nir_shader_lower_instructions() for txf_ms lowering. Cuts out a bunch of boilerplate. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-07-18 11:28:56 -07:00
Eric Anholt	251c64a53d	nir: Allow internal changes to the instr in nir_shader_lower_instructions(). v3d's NIR txf_ms lowering wants to swizzle around the input coordinates in NIR, but doesn't generate a new txf_ms instructions as replacement. It's pretty easy to allow that in nir_shader_lower_instructions, and it may be common in lowering passes. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-18 11:28:56 -07:00
Eric Anholt	c0640035fb	vc4: Convert vc4_nir_lower_txf_ms to nir_shader_lower_instructions(). Cuts out a bunch of boilerplate. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-07-18 11:28:56 -07:00
Eric Anholt	40e7609603	v3d: Fix assertion failures in debug builds. nir_lower_io leaves around deref_var instructions after lowering away deref intrinsics. This ends up breaking validation after v3d_nir_lower_io removes variables not actually being stored by the shader's store_output()s. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-07-18 11:28:56 -07:00
Alyssa Rosenzweig	1bced0fad2	panfrost: Handle Z24 textures Just use the Z32 code. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-18 10:42:43 -07:00
Alyssa Rosenzweig	f29c084960	panfrost/ci: Update expectations We just fixed some stencil tests. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-18 10:42:43 -07:00
Alyssa Rosenzweig	fad76470d5	panfrost: Make scissor test more robust See v3d implementation. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-18 10:42:43 -07:00
Alyssa Rosenzweig	5c554e235d	panfrost: Use correct NO_DITHER field on MFBD Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-18 10:42:43 -07:00
Alyssa Rosenzweig	676b9339dd	panfrost: Implement Z32F(_S8) support Z32F uses a dediacted float path. Z32F_S8 uses separate stencil planes in the hardware, lowered via u_transfer_helper. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-18 10:42:43 -07:00
Alyssa Rosenzweig	479185a1cd	panfrost/decode: Don't disassemble NULL shaders It is legal to load a shader from a NULL address, particularly when the TILER job is used strictly for effects on the Z/S buffer with 0x0 color mask. Don't crash the decoder in this case. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-18 10:42:43 -07:00
Alyssa Rosenzweig	65d89097b8	panfrost: Copy stencil front to back if back disabled When backside stenciling is disabled, backfacing primitives just do the same thing as frontfacing primitives. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-18 10:42:43 -07:00
Jan Zielinski	6f7306c029	swr/rast: Refactor memory API between rasterizer core and swr This commit cleans up API between the core of the rasterizer and swr. Some formatting changes are also done. Reviewed-by: Alok Hota <alok.hota@intel.com>	2019-07-18 16:17:00 +02:00
Andreas Baierl	4627a0c4eb	lima/ppir: Add gl_PointCoord handling Treat gl_PointCoord as a system value and add the necessary bits for correct codegen. Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de> Reviewed-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-18 13:20:39 +00:00
Andreas Baierl	3523233027	gallium: Add PIPE_CAP_TGSI_FS_POINT_IS_SYSVAL This adds an option to treat gl_PointCoord as a system value. Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de> Reviewed-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-18 13:20:39 +00:00
Andreas Baierl	3349a60f6f	nir/tgsi: Extend tgsi_to_nir.c to support gl_PointCoord as a system value. Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-18 13:20:39 +00:00
Andreas Baierl	f5804f1768	nir: Add gl_PointCoord system value gl_PointCoord handling needs some special bits set in lima/ppir code generation. Treating gl_PointCoord as a system value makes it easier to distinguish from a regular varying. Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de> Reviewed-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-18 13:20:39 +00:00
Andreas Baierl	24af57407c	glsl: Optionally declare gl_PointCoord as a system value Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de> Reviewed-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-18 13:20:39 +00:00
Connor Abbott	b178fdf486	lima/gp: Fix problem with complex moves When writing the scheduler, we forgot that you can't read the complex unit in certain sources because it gets overwritten to 0 or 1. Fixing this turned out to be possible without giving up and reducing GPIR_VALUE_REG_NUM to 10, although it was difficult in a way I didn't expect. There can be at most 4 next-max nodes that can't have moves scheduled in the complex slot, so it actually isn't a problem for getting the number of next-max nodes at 5 or lower. However, it is a problem for stores. If a given node is a next-max node whose move cannot go in the complex slot and is used by a store that we decide to schedule, we have to reserve one of the non-complex slots for a move instead of all the slots, or we can wind up in a situation where only the complex slot is free and we fail the move. This means that we have to add another term to the reservation logic, for stores whose children cannot be in the complex slot. Acked-by: Qiang Yu <yuq825@gmail.com>	2019-07-18 14:33:23 +02:00
Connor Abbott	54434fe670	lima/gpir: Rework the scheduler Now, we do scheduling at the same time as value register allocation. The ready list now acts similarly to the array of registers in value_regalloc, keeping us from running out of slots. Before this, the value register allocator wasn't aware of the scheduling constraints of the actual machine, which meant that it sometimes chose the wrong false dependencies to insert. Now, we assign value registers at the same time as we actually schedule instructions, making its choices reflect reality much better. It was also conservative in some cases where the new scheme doesn't have to be. For example, in something like: 1 = ld_att 2 = ld_uni 3 = add 1, 2 It's possible that one of 1 and 2 can't be scheduled in the same instruction as 3, meaning that a move needs to be inserted, so the value register allocator needs to assume that this sequence requires two registers. But when actually scheduling, we could discover that 1, 2, and 3 can all be scheduled together, so that they only require one register. The new scheduler speculatively inserts the instruction under consideration, as well as all of its child load instructions, and then counts the number of live value registers after all is said and done. This lets us be more aggressive with scheduling when we're close to the limit. With the new scheduler, the kmscube vertex shader is now scheduled in 40 instructions, versus 66 before. Acked-by: Qiang Yu <yuq825@gmail.com>	2019-07-18 14:33:23 +02:00
Connor Abbott	12645e8714	lima/gp: Mark more add-only nodes as maybe-two-slot Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-07-18 14:33:23 +02:00
Connor Abbott	16de3dd7a6	lima/gpir: Fix some bugs in instruction handling Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-07-18 14:33:23 +02:00
Connor Abbott	cc78a42577	lima: Reintroduce the standalone compiler I used this to test things without needing to have a device handy. Acked-by: Qiang Yu <yuq825@gmail.com>	2019-07-18 14:33:23 +02:00
Connor Abbott	4423552ff0	nir/lower_viewport: Check variable mode first The location is unused for shader_temp and function_temp variables, and due to the way we nir_lower_io_to_temproraries demotes shader_out variables to shader_temp variables, it happened to equal VARYING_SLOT_POS for the gl_Position temporary, which made this pass fail with the offline compiler due to this coming before vars_to_ssa. Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-07-18 14:21:41 +02:00
Samuel Pitoiset	6e5e4bf050	radv/gfx10: set BREAK_WAVE_AT_EOI if TES or GS enable the primitive ID Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-18 10:37:10 +02:00

113297 Commits