When we had one gen supporting performance counters, it made sense to
have these builder macros in the .c file with the table. But time has
come to de-duplicate.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
To make PIPE_FORMATs usable from non-gallium parts of Mesa, I want to
move their helpers out of gallium. Since u_format used
util_copy_rect(), I moved that in there, too.
I've put it in a separate directory in util/ because it's a big chunk
of related code, and it's not clear to me whether we might want it as
a separate library from libmesa_util at some point.
Closes: #1905
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We're using vs and fs now, and adding hs, ds and gs soon. It's
confusing enough that we have both DS/TCS and HS/TES. At least for VS
and FS there doesn't have to be multiple names.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Also, swap vs and fs constructor or so fs comes first.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The driver can't determine PIPE_QUERY_PRIMITIVES_GENERATED or
PIPE_QUERY_PRIMITIVES_EMITTED once we support geometry or
tessellation, since these stages add primitives at runtime. Use the
WRITE_PRIMITIVE_COUNTS event to write back the primitive counts and
implement a hw query for this.
Reviewed-by: Rob Clark <robdclark@gmail.com>
This looks like clear copy-and-pasteos, and fixes:
dEQP-GLES2.functional.draw.random.40
(on A307 and A630, both tested in the new CI farm)
Reviewed-by: Rob Clark <robdclark@chromium.org>
These don't need to be in context, and we'll need them in screen in a
later patch. Plus it's a good cleanup.
Signed-off-by: Rob Clark <robdclark@chromium.org>
This is a relatively minimal change to adjust all the gallium interfaces
to use bool instead of boolean. I tried to avoid making unrelated
changes inside of drivers to flip boolean -> bool to reduce the risk of
regressions (the compiler will much more easily allow "dirty" values
inside a char-based boolean than a C99 _Bool).
This has been build-tested on amd64 with:
Gallium drivers: nouveau r300 r600 radeonsi freedreno swrast etnaviv v3d
vc4 i915 svga virgl swr panfrost iris lima kmsro
Gallium st: mesa xa xvmc xvmc vdpau va
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The number of elements to draw should not be affected by the offset.
A similar fix was submitted for a6xx at 79180a05.
Fixes these dEQP tests on a5xx:
dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_separate_grid_500x500_drawcount_8
dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_separate_grid_500x500_drawcount_2500
dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawarrays_separate_grid_500x500_drawcount_2500
dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawarrays_combined_grid_500x500_drawcount_2500
dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_combined_grid_500x500_drawcount_8
dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_combined_grid_500x500_drawcount_2500
Reviewed-by: Rob Clark <robdclark@gmail.com>
Previously the A5XX_SP_FS_OUTPUT_REG_HALF_PRECISION was set depending
on whether half_precision was set in the shader key. With support for
mediump precision, it is possible to have different outputs use
different precisions. That means we can’t have a global shader state
to specify it. Instead it now tries to copy the half-float-ness
from the nir_variable for the output into the ir3_shader_variant. This
is then used to decide whether to set half-precision for each output.
The a6xx version is copied from the a5xx code but it has not been
tested.
v2. [Hyunjun Ko (zzoon@igalia.com)] There's the half flag recently
added, which represents precision based on IR3_REG_HALF. Now use this
flag to avoid duplication.
Signed-off-by: Rob Clark <robdclark@chromium.org>
The TTN path needs access to the screen to make the right decisions about
lowering, but we didn't have pctx->screen set up at fdN_prog_init time.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Tested-by: Eduardo Lima Mitev <elima@igalia.com>
My gcc can't see that the uninitialized value from the PIPE_BUFFER case
isn't used from the !PIPE_BUFFER cases later.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
There are other cases where we need to disable early-z, like image
writes. So rename to something more generic.
Signed-off-by: Rob Clark <robdclark@gmail.com>
This patch makes it possible for freedreno to pass a pipe_screen
to tgsi_to_nir. This will be needed when tgsi_to_nir supports reading
pipe capabilities.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Images and SSBOs don't map directly to the hw. They end up being part
texture and part something else. Starting with a6xx, the hack used for
a5xx to smash the image tex state into hw texture state starting from
MAX counting down won't work, because we start using tex state also for
SSBO read.
Signed-off-by: Rob Clark <robdclark@gmail.com>
This way they can be shared. Build tested with meson, but not too sure
on the autotools stuff though.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Rob Clark <robdclark@gmail.com>
First step to unify the way fd5 and fd6 blitter works. Currently a6xx
bypasses the blit API in order to also accelerate resource_copy_region()
But this approach can lead to infinite recursion:
#0 fd_alloc_staging (ctx=0x5555936480, rsc=0x7fac485f90, level=0, box=0x7fbab29220) at ../src/gallium/drivers/freedreno/freedreno_resource.c:291
#1 0x0000007fbdebed04 in fd_resource_transfer_map (pctx=0x5555936480, prsc=0x7fac485f90, level=0, usage=258, box=0x7fbab29220, pptrans=0x7fbab29240) at ../src/gallium/drivers/freedreno/freedreno_resource.c:479
#2 0x0000007fbe5c5068 in u_transfer_helper_transfer_map (pctx=0x5555936480, prsc=0x7fac485f90, level=0, usage=258, box=0x7fbab29220, pptrans=0x7fbab29240) at ../src/gallium/auxiliary/util/u_transfer_helper.c:243
#3 0x0000007fbde2dcb8 in util_resource_copy_region (pipe=0x5555936480, dst=0x7fac485f90, dst_level=0, dst_x=0, dst_y=0, dst_z=0, src=0x7fac47c780, src_level=0, src_box_in=0x7fbab2945c) at ../src/gallium/auxiliary/util/u_surface.c:350
#4 0x0000007fbdf2282c in fd_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47c780, src_level=0, src_box=0x7fbab2945c) at ../src/gallium/drivers/freedreno/freedreno_blitter.c:173
#5 0x0000007fbdf085d4 in fd6_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47c780, src_level=0, src_box=0x7fbab2945c) at ../src/gallium/drivers/freedreno/a6xx/fd6_blitter.c:587
#6 0x0000007fbde2f3d0 in util_try_blit_via_copy_region (ctx=0x5555936480, blit=0x7fbab29430) at ../src/gallium/auxiliary/util/u_surface.c:864
#7 0x0000007fbdec02c4 in fd_blit (pctx=0x5555936480, blit_info=0x7fbab29588) at ../src/gallium/drivers/freedreno/freedreno_resource.c:993
#8 0x0000007fbdf08408 in fd6_blit (pctx=0x5555936480, info=0x7fbab29588) at ../src/gallium/drivers/freedreno/a6xx/fd6_blitter.c:546
#9 0x0000007fbdebdc74 in do_blit (ctx=0x5555936480, blit=0x7fbab29588, fallback=false) at ../src/gallium/drivers/freedreno/freedreno_resource.c:129
#10 0x0000007fbdebe58c in fd_blit_from_staging (ctx=0x5555936480, trans=0x7fac47b7e8) at ../src/gallium/drivers/freedreno/freedreno_resource.c:326
#11 0x0000007fbdebea38 in fd_resource_transfer_unmap (pctx=0x5555936480, ptrans=0x7fac47b7e8) at ../src/gallium/drivers/freedreno/freedreno_resource.c:416
#12 0x0000007fbe5c5c68 in u_transfer_helper_transfer_unmap (pctx=0x5555936480, ptrans=0x7fac47b7e8) at ../src/gallium/auxiliary/util/u_transfer_helper.c:516
#13 0x0000007fbde2de24 in util_resource_copy_region (pipe=0x5555936480, dst=0x7fac485f90, dst_level=0, dst_x=0, dst_y=0, dst_z=0, src=0x7fac47b8e0, src_level=0, src_box_in=0x7fbab2997c) at ../src/gallium/auxiliary/util/u_surface.c:376
#14 0x0000007fbdf2282c in fd_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47b8e0, src_level=0, src_box=0x7fbab2997c) at ../src/gallium/drivers/freedreno/freedreno_blitter.c:173
#15 0x0000007fbdf085d4 in fd6_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47b8e0, src_level=0, src_box=0x7fbab2997c) at ../src/gallium/drivers/freedreno/a6xx/fd6_blitter.c:587
...
Instead rework the API to push the fallback back to core code, so that
we can rework resource_copy_region() to have it's own fallback path,
and then finally convert fd6 over to work in the same way.
This also makes ctx->blit() optional, and cleans up some unnecessary
callers.
Signed-off-by: Rob Clark <robdclark@gmail.com>
If we emit shader as a pointer to a GEM object, also set the RELOC_DUMP
flag as a hint to kernel that this is a useful buffer to snapshot for
debug dumps.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Move (most of) the ir3 compiler to src/freedreno/ir3 so that it can be
re-used by some future vulkan driver. The parts that are gallium
specific have been refactored out and remain in the gallium driver.
Getting the move done now so that it can happen before further
refactoring to support a6xx specific instructions.
NOTE also removes ir3_cmdline compiler tool from autotools build since
that was easier than fixing it and I normally use meson build. Waiting
patiently for the day that we can remove *everything* from the autotools
build.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Split the parts that are gallium specific into ir3_gallium so the rest
can move to a common location outside of gallium.
Signed-off-by: Rob Clark <robdclark@gmail.com>
A bit annoying to have to copy into our own struct. But this is
something the compiler really needs to know, at least on earlier
generations where streamout is implemented in shader.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Just massive search/replace for the most part.
Step towards removing ir3 dependency on disasm.h which is shared by
a2xx. One step closer to being able to move ir3 out of gallium.
Signed-off-by: Rob Clark <robdclark@gmail.com>
In the pursuit of lowering driver overhead, it became clear that some
amount of redesign of how libdrm_freedreno constructs the submit ioctl
would be needed. In particular, as the gallium driver is starting to
make heavier use of CP_SET_DRAW_STATE state groups/objects, the over-
head of tracking cmd buffers and relocs becomes too much. And for
"streaming" state, which isn't ever reused (like uniform uploads) the
overhead of allocating/freeing ringbuffer[1] objects is too high.
This redesign makes two main changes:
1) Introduces a fd_submit object for tracking bos and cmds table
for the submit ioctl, making ringbuffer objects more light-
weight. This was previously done in the ringbuffer. But we
have many ringbuffer instances involved in a submit (gmem +
draw + potentially 1000's of state-group rbs), and only need
a single bos and cmds table. (Reloc table is still per-rb)
The submit is also a convenient place for a slab allocator for
ringbuffer objects. Other options would have required locking
because, while we can guarantee allocations will only happen on
a single thread, free's could happen either on the application
thread or the flush_queue thread. With the slab allocator in
the submit object, any frees that happen on the flush_queue
thread happen after we know that the application thread is done
with the submit.
2) Introduce a new "softpin" msm_ringbuffer_sp implementation that
does not use relocs and only has cmds table entries for IB1 (ie.
the cmdstream buffers that kernel needs to CP_INDIRECT_BUFFER
to from the RB). To do this properly will require some updates
on the kernel side, so whether you get the softpin or legacy
submit/ringbuffer implementation at runtime depends on your
kernel version.
To make all these changes in libdrm would basically require adding a
libdrm_freedreno2, so this is a good point to just pull the libdrm code
into mesa. Plus it allows for using mesa's hashtable, slab allocator,
etc. And it lets us have asserts enabled for debug mesa buids but
omitted for release builds. And it makes life easier if further API
changes become necessary.
At this point I haven't tried to pull in the kgsl backend. Although
I left the level of vfunc indirection which would make it possible
to have other backends. (And this was convenient to keep to allow
for the "softpin" ringbuffer to coexist.)
NOTE: if bisecting a build error takes you here, try a clean build.
There are a bunch of ways things can go wrong if you still have
libdrm_freedreno cflags.
[1] "ringbuffer" is probably a bad name, the only level of cmdstream
buffer that is actually a ring is RB managed by kernel. User-
space cmdstream is all IB1/IB2 and state-groups.
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Following the commit 2385d7b066 and 8e798e28f7, for resource dependancy
tracking.
Fixes: dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_depth_fbo
with FD_MESA_DEBUG=inorder
Signed-off-by: Rob Clark <robdclark@gmail.com>
Cache that maps gallium hwcso (in this case, 'struct ir3_shader') plus
shader variant key to a generation specific state object.
This could eventually replace the linked list of shader variants, but
for now it lets us re-use the work currently done in fdN_program_emit()
Signed-off-by: Rob Clark <robdclark@gmail.com>
Prep work for a following patch, that introduces a cache to map from
program state (all shader stages) plus variant key to pre-baked hw
state (which could be emit'd via CP_SET_DRAW_STATE, for example).
To do that, we really want the variant key to be immutable, and to
treat the binning pass shader as an extra shader stage, rather than
as a VS variant.
Signed-off-by: Rob Clark <robdclark@gmail.com>
This is defined to always clear the entire surface(s) specified,
regardless of scissor state.. mesa/st will turn scissored clears
into a draw. So rip about a bunch of unnecessary machinery.
Also remove a comment that was obsolete since using u_blitter to
turn clear into draw (for the cases where there isn't a hw blitter
fast-path).
Signed-off-by: Rob Clark <robdclark@gmail.com>
The effective scissor changes based on rasterizer->scissor flag, so we
need to re-emit scissor state when rasterizer state changes.
Signed-off-by: Rob Clark <robdclark@gmail.com>