Commit Graph

418 Commits

Author SHA1 Message Date
Kenneth Graunke 5b682143da nir: Make nir_lower_clip_vs optionally work with variables.
The way nir_lower_clip_vs() works with store_output intrinsics makes a
ton of assumptions about the driver_location field.

In i965 and iris, I'd rather do this lowering early and work with
variables.  v3d may want to switch to that as well, and ir3 could too,
but I'm not sure exactly what would need updating.  For now, handle
both methods.

Reviewed-by: Eric Anholt <eric@anholt.net>
2018-11-19 14:33:16 -08:00
Eric Anholt 538bca78e2 v3d: Don't try to set PF flags on a LDTMU operation
We need an ALU op in order to set PF.  Fixes a recent assertion failure in
dEQP-GLES3.functional.ubo.single_basic_type.shared.bool_vertex
2018-11-15 11:12:54 -08:00
Eric Anholt 4e1b163eed v3d: Update the TLB config for depth writes on V3D 4.2.
Fixes 311 piglit cases on the simulator.
2018-11-01 13:56:30 -07:00
Emil Velikov 986033a275 configure: allow building with python3
Pretty much all of the scripts are python2+3 compatible.
Check and allow using python3, while adjusting the PYTHON2 refs.

Note:
 - python3.4 is used as it's the earliest supported version
 - python2 chosen prior to python3

v2: use python2 by default

Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
2018-10-31 19:15:50 +00:00
Eric Anholt cc54e1acf9 v3d: Use nir_remove_unused_io_vars to handle binner shader output DCE
We were doing this late after nir_lower_io, but we can just reuse the core
code.  By doing it at this stage, we won't even set up the VS attributes
as inputs, reducing our VPM size.
2018-10-30 10:46:52 -07:00
Eric Anholt c152c79d5e v3d: Only add output slot tracking for the current varying slot.
We always emit 4 slots per slot because things like color output and
position processing in the epilogue will potentially look up more values
than the variable declaration had.  However, when we get a .location_frac
!= 0, we don't want to overwrite components of the following
.driver_location.
2018-10-30 10:46:52 -07:00
Eric Anholt 17c8198952 v3d: Use nir_lower_io_to_scalar_early to DCE unused VS input components.
This lets us trim unused trailing components in the vertex attributes,
reducing the size of our VPM allocations.
2018-10-30 10:46:52 -07:00
Eric Anholt fc85f7cfdc v3d: Don't rely on sorting input vars for VPM read setup.
For supporting scalar VPM i/o at the NIR level, we need to do a pass over
the vars to figure out how big each attribute is after DCE.  Once we've
done that, we can just walk over c->vattr_sizes[] instead of bothering
with vars.
2018-10-30 10:46:52 -07:00
Eric Anholt cc78676030 v3d: Split out NIR input setup between FS and VPM.
They don't share much code, and I'm about to rewrite the remaining shared
code for the VPM case.
2018-10-30 10:46:52 -07:00
Eric Engestrom bb84fa146f util: use C99 declaration in the for-loop hash_table_foreach() macro
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-10-25 12:43:18 +01:00
Eric Anholt 8ec83dc51e v3d: Add support for hardware pack/unpack of half floats.
Cuts the formerly 7-minute simulation time of fs-packHalf2x16.shader_test
in half.
2018-10-15 17:16:44 -07:00
Mauro Rossi cc3b99bb48 android: broadcom/cle: export the broadcom top level path headers
Fixes the following building error in vc4 build:

In file included from external/mesa/src/gallium/drivers/vc4/kernel/vc4_render_cl.c:34:
In file included from external/mesa/src/gallium/drivers/vc4/kernel/vc4_drv.h:27:
In file included from external/mesa/src/gallium/drivers/vc4/vc4_simulator_validate.h:34:
In file included from external/mesa/src/gallium/drivers/vc4/vc4_context.h:39:
In file included from external/mesa/src/gallium/drivers/vc4/vc4_cl.h:56:
gen/STATIC_LIBRARIES/libmesa_broadcom_genxml_intermediates/broadcom/cle/v3d_packet_v21_pack.h:12:10:
fatal error: 'cle/v3d_packet_helpers.h' file not found
         ^~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.

Fixes: 5b102160ae ("broadcom/genxml: Introduce a V3D packet/struct decoder.")
Cc: "18.2" <mesa-stable@lists.freedesktop.org>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
2018-09-15 09:14:46 +02:00
Mauro Rossi 9158e0bd82 android: broadcom/cle: add gallium include path
Fixes the following building error:

In file included from external/mesa/src/broadcom/cle/v3d_decoder.c:38:
In file included from external/mesa/src/broadcom/cle/v3d_packet_helpers.h:29:
external/mesa/src/gallium/auxiliary/util/u_math.h:42:10:
fatal error: 'pipe/p_compiler.h' file not found
         ^~~~~~~~~~~~~~~~~~~
1 error generated.

Fixes: 5b102160ae ("broadcom/genxml: Introduce a V3D packet/struct decoder.")
Cc: "18.2" <mesa-stable@lists.freedesktop.org>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
2018-09-15 09:14:42 +02:00
Mauro Rossi 3341429d74 android: broadcom/genxml: fix collision with intel/genxml header-gen macro
Fixes the following building error, happening when building both intel and broadcom:

Gen Header: libmesa_broadcom_genxml_32 <= v3d_packet_v21_pack.h
FAILED: gen/STATIC_LIBRARIES/libmesa_broadcom_genxml_intermediates/broadcom/cle/v3d_packet_v21_pack.h
/bin/bash -c "python external/mesa/src/broadcom/cle/gen_pack_header.py \
external/mesa/src/broadcom/cle/v3d_packet_v21.xml \
> gen/STATIC_LIBRARIES/libmesa_broadcom_genxml_intermediates/broadcom/cle/v3d_packet_v21_pack.h"
Traceback (most recent call last):
  File "external/mesa/src/broadcom/cle/gen_pack_header.py", line 626, in <module>
    p = Parser(sys.argv[2])
IndexError: list index out of range

header-gen macro is already defined by Intel genxml building rules
and the existing header-gen does not have the $(PRIVATE_VER) argument,
infact the bash command line logged in the building error is missing
exactly $(PRIVATE_VER) argument

Renaming the macro as pack-header-gen in src/broadcom/Android.genxml.mk
solves the building error, another possible way is to keep the gen rules
commands expanded and not use the macros.

Fixes: 7f80a9ff13 ("vc4: Introduce XML-based packet header generation like Intel's.")
Cc: "18.2" <mesa-stable@lists.freedesktop.org>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
2018-09-15 09:14:33 +02:00
Dylan Baker 80825abb5d move u_math to src/util
Currently we have two sets of functions for bit counts, one in gallium
and one in core mesa. The ones in core mesa are header only in many
cases, since they reduce to "#define _mesa_bitcount popcount", but they
provide a fallback implementation. This is important because 32bit msvc
doesn't have popcountll, just popcount; so when nir (for example)
includes the core mesa header it doesn't (and shouldn't) link with core
mesa. To fix this we'll promote the version out of gallium util, then
replace the core mesa uses with the util version, since nir (and other
non-core mesa users) can and do link with mesautils.

Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-09-07 10:21:26 -07:00
Eric Anholt a91b158bd9 v3d: Fix setup of the VCM cache size.
There were two bugs working together to make things mostly work: I wasn't
dividing the VPM output size available by the size of a batch (vertex),
but I also had the size of the VPM reduced by a factor of 8.

Fixes dEQP-GLES3.functional.vertex_array_objects.all_attributes and it
seems also my intermittent varying failures.

Fixes: 1561e4984e ("v3d: Emit the VCM_CACHE_SIZE packet.")
2018-09-07 08:11:38 -07:00
Emil Velikov cff80b6c15 Revert "configure: allow building with python3"
This reverts commit ae7898dfdb.

Turns out the python scripts are _not_ fully python 3 compatible.
As Ilia reported using get_xmlpool.py with LANG=C produces some weird
output - see the link for details.

Even though the issue was spotted with the autoconf build, it exposes a
genuine problem with the script (and lack of lang handling of the meson
build.)

https://lists.freedesktop.org/archives/mesa-dev/2018-August/203508.html
2018-08-24 11:14:15 +01:00
Emil Velikov ae7898dfdb configure: allow building with python3
Pretty much all of the scripts are python2+3 compatible.
Check and allow using python3, while adjusting the PYTHON2 refs.

Note:
 - python3.4 is used as it's the earliest supported version
 - python3 chosen prior to python2

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
2018-08-23 17:00:13 +01:00
Mathieu Bridon 2ee1c86d71 meson: Build with Python 3
Now that all the build scripts are compatible with both Python 2 and 3,
we can flip the switch and tell Meson to use the latter.

Since Meson already depends on Python 3 anyway, this means we don't need
two different Python stacks to build Mesa.

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2018-08-10 15:15:09 -07:00
Eric Anholt 1561e4984e v3d: Emit the VCM_CACHE_SIZE packet.
This is needed to ensure that we don't get blocked waiting for VPM space
with bin/render overlapping.

Cc: "18.2" <mesa-stable@lists.freedesktop.org>
2018-08-06 13:03:23 -07:00
Eric Anholt 50a8713d4f v3d: Avoid spilling that breaks the r5 usage after a ldvary.
Fixes bad rendering when forcing 2 spills in glxgears.

Cc: "18.2" <mesa-stable@lists.freedesktop.org>
2018-08-06 13:03:23 -07:00
Eric Anholt f2c0d310d6 v3d: Make sure that QPU instruction-has-a-dest matches VIR.
Found when debugging register spilling -- we would try to spill the dest
of a STVPMV, inserting spill code after entering the last segment.  In
fact, we were likely to to choose to do this, given that the STVPMV "dest"
temp was never read from, making it cheap to spill.

Cc: "18.2" <mesa-stable@lists.freedesktop.org>
2018-08-06 13:03:23 -07:00
Eric Anholt 3f9cb2eb05 v3d: Wait for TMU writes to complete before continuing after a spill.
The simulator complained that we had write responses outstanding at shader
end.  It seems that a TMU read does not guarantee that previous TMU writes
by the thread have completed, which surprised me.

Cc: "18.2" <mesa-stable@lists.freedesktop.org>
2018-08-06 13:03:23 -07:00
Eric Anholt ccbe33af5b v3d: Make sure we don't emit a thrsw before the last one finished.
Found while forcing some spilling, which creates a lot of short
tmua->thrsw->ldtmu sequences.

Cc: "18.2" <mesa-stable@lists.freedesktop.org>
2018-08-06 13:03:23 -07:00
Eric Anholt f9d54dc3cf v3d: Add some debug code for forcing register spilling.
This is useful for periodically testing out register spilling to see how
it goes on simple shaders, rather than only failing on insanely
complicated ones.
2018-08-06 13:03:23 -07:00
Eric Anholt c2eab33b08 v3d: Actually put the "%s" in the snprintf.
I missed an important part when porting the change over, fixing my
compiler warning but breaking -Werror=format-security.

Fixes: e6ff5ac446 ("v3d: use snprintf(..., "%s", ...) instead of strncpy")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107443
2018-08-01 11:39:19 -07:00
Eric Anholt e6ff5ac446 v3d: use snprintf(..., "%s", ...) instead of strncpy
Fixes a compiler warning about terminator NUL, based on f836d799f9
("intel/decoder: use snprintf(..., "%s", ...) instead of strncpy")
2018-07-31 16:42:11 -07:00
Eric Anholt 3471ce9985 v3d: Add support for the TMUWT instruction.
This instruction is used to ensure that TMU stores have been processed
before moving on.  In particular, you need any TMU ops to be done by the
time the shader ends.
2018-07-31 16:05:04 -07:00
Eric Anholt d934492ff9 v3d: Dump the contents off all the buffers in CLIF mode.
A V3D_DEBUG=clif file from a non-texturing .shader_test can now be
successfully run through the CLIF runner in the simulator.  Now I need to
build an open source CLIF runner against the v3d DRM module.
2018-07-30 14:29:01 -07:00
Eric Anholt 99a5ac250b v3d: Split walking the CLs to generate relocs from walking CLs to dump.
We need to dump each buffer's contents in order for a CLIF file, so we
need to collect all of the relocs into a buffer (such as the indirect CL
full of both uniforms and GL shader states) before we start dumping.
2018-07-30 14:29:01 -07:00
Eric Anholt 2df6f1a3df v3d: Include commands to run the BCL and RCL in CLIF dumps. 2018-07-30 14:29:01 -07:00
Eric Anholt c6449e33e3 v3d: Use a short, underscored name for packets in CLIF/CL dumping.
These will match the names that the CLIF parser expects to see.  I may in
the future decide to change more of the other names so that I match the
names the HW/closed SW team uses for their packets, rather than the names
in the spec (which only they and I can read anyway).
2018-07-30 14:29:01 -07:00
Eric Anholt b56f8c475e v3d: Rename "configuration" and "config" in the XML to "cfg"
This matches what CLIF parsing expects, and makes
TILE_BINNING_MODE_CONFIGURATION_COMMON_CONFIGURATION into a much more
legible TILE_BINNING_MODE_CFG_COMMON.
2018-07-30 14:29:01 -07:00
Eric Anholt 300e609feb v3d: s/colour/color in the XML.
The CLIF format expects american english spelling, and the rest of Mesa is
too.  I was previously adhering to the spec's spelling, which is
counterproductive.
2018-07-30 14:29:01 -07:00
Eric Anholt 3a8550ad06 v3d: Rename primitives to prims in the XML to match CLIF names.
This makes us match up with the V3D HW team's names a bit more.
2018-07-30 14:29:01 -07:00
Eric Anholt 6237c64049 v3d: Print CLIF fixed-point values as just their decimal value.
The parser doesn't handle float input, so we have to dump the raw value.
2018-07-30 14:29:01 -07:00
Eric Anholt 8da47b7648 v3d: When not doing terminal pretty-printing, comment struct field names.
The struct field names aren't part of the CLIF ABI, just the order of
fields within the struct.  The comments are there for human readability.
2018-07-30 14:29:01 -07:00
Eric Anholt 103f21b13d v3d: Add a separate flag for CLIF ABI output versus human-readable CLs.
A few of the upcoming changes would make the V3D_DEBUG=cl output less
readable, so let's make proper CLIF file production be under a separate
V3D_DEBUG=clif flag.
2018-07-30 14:29:01 -07:00
Eric Anholt 89ac6fa403 v3d: Add pack header support for f187 values.
V3D only has one of these (the top 16 bits of a float32) left in its CLs,
but VC4 had many more.  This gets us proper pretty-printing of the values
instead of a large uint.
2018-07-30 14:29:01 -07:00
Eric Anholt 27f1bfe471 vc4: Fix meson build when enabled without v3d.
Reported-by: Rob Clark <robdclark@gmail.com>
Fixes: e92959c4e0 ("v3d: Pass the whole clif_dump structure to v3d_print_group().")
2018-07-29 19:13:29 -07:00
Eric Anholt 942456f646 v3d: Skip printing sub-id or pad fields in CLIF dumping.
The parser doesn't expect them, so our fields would end up mismatched.
They're not really useful in console output, either.
2018-07-27 18:00:48 -07:00
Eric Anholt 3ee0ab599e v3d: Emit commands to switch CLIF parser to CL/shader/attr input mode.
By default after saying you are emitting a buffer, it'll expect a buffer
size.  Once you set a format, it'll keep parsing that format until you
announce something else.
2018-07-27 18:00:46 -07:00
Eric Anholt a57770aa37 v3d: Dump fields in CLIF output in increasing offset order.
Previously, we emitted in XML order, which I happen to type in the
decreasing offset order of the specifications.  However, the CLIF parser
wants increasing offsets.
2018-07-27 17:56:55 -07:00
Eric Anholt 95bafeeabf v3d: Print addresses in CLIFs as references to buffers.
With CLIFs, the parser will choose an address for the buffer being
created, so we need to use effectively relocations to buffers instead of
the addresses that the driver uses.  This is also a whole lot more
intelligible for console output than raw addresses!
2018-07-27 17:56:36 -07:00
Eric Anholt 3c02838d29 v3d: Stop doing pretty-printed colorful booleans in CLIF output.
The parser wants to see a 1 or 0.  We can put "true" and "false" in a
comment to clarify that it's a boolean and the parser will skip it.
2018-07-27 17:55:57 -07:00
Eric Anholt 422910d2e7 v3d: Move clif dumping to a separate step from noting where the CLs are.
Now all the printing happens from the same worklist processing.
2018-07-27 17:08:35 -07:00
Eric Anholt 01b4952773 v3d: Move clif dump BO lookup into the clif dumper.
The clif dumper is going to need information about all of our BOs if we're
going to dump them for replay purposes.
2018-07-27 17:08:35 -07:00
Eric Anholt e92959c4e0 v3d: Pass the whole clif_dump structure to v3d_print_group().
To generate CLIF files that the v3dv3 simulator can parse, we're going to
need to decode addresses, and for that we'll need the vaddr lookup
function from the clif structure from within v3d_decoder.
2018-07-27 17:08:35 -07:00
Eric Anholt 9bf9a6d6a1 v3d: Drop the VG support from the XML.
This reflects a change on the HW/closed SW side to drop this unused HW.
With it dropped on their side, the CLIF parser no longer expects to find
VG fields.
2018-07-27 12:56:36 -07:00
Eric Anholt 5a1cc3861c v3d: Use /* */ instead of () for enum names in CLIF output.
This lets the comments be ignored by the CLIF parser.
2018-07-27 12:56:36 -07:00
Eric Anholt 95a0f99825 v3d: CLIF-dump the "Vec size" field as 0 == maximum value.
That's what a user should want to see, and what the CLIF parser wants.
This should maybe be generalized.
2018-07-27 12:56:36 -07:00
Eric Anholt d934d3206e nir: Add flipping of gl_PointCoord.y in nir_lower_wpos_ytransform.
This is controlled by a new nir_shader_compiler_options flag, and fixes
dEQP-GLES3.functional.shaders.builtin_variable.pointcoord on V3D.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-26 11:00:34 -07:00
Mathieu Bridon 9ebd8372b9 python: Use range() instead of xrange()
Python 2 has a range() function which returns a list, and an xrange()
one which returns an iterator.

Python 3 lost the function returning a list, and renamed the function
returning an iterator as range().

As a result, using range() makes the scripts compatible with both Python
versions 2 and 3.

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-07-24 11:07:04 -07:00
Eric Anholt 6b73a97f84 v3d: Implement a small immediates optimization, based on VC4's.
We can do one per instruction, and we have to be careful not to overwrite
raddr_b, but this greatly reduces the pressure on uniform loads
(particularly around ldvpm/stvpm instructions).

total instructions in shared programs: 90768 -> 88220 (-2.81%)
instructions in affected programs:     82711 -> 80163 (-3.08%)
2018-07-23 10:21:43 -07:00
Eric Anholt 79e0f042bc v3d: Return an invalid src number if asked for a missing implicit uniform.
Sometimes when iterating over sources, we might want to check if it's the
implicit one.  We wouldn't want to match on a non-implicit src using this
function.
2018-07-23 10:21:43 -07:00
Eric Anholt f2ea936f48 v3d: Skip emitting texture config parameter 2 if it's just the defaults.
shader-db:
total instructions in shared programs: 91275 -> 90768 (-0.56%)
instructions in affected programs:     20702 -> 20195 (-2.45%)
2018-07-23 10:21:43 -07:00
Eric Anholt 421e99d777 v3d: Update an XXX comment for a path we handled in HW on V3D 4.x. 2018-07-23 10:21:43 -07:00
Eric Anholt e7ae900341 v3d: Switch to using the new SFU instructions on V3D 4.x.
These instructions let us write directly to the phys regfile, instead of
just R4.  That lets us avoid moving out of R4 to avoid conflicting with
other SFU results, and to avoid conflicting with thread switches.

There is still an extra instruction of latency, which is not represented
in the scheduler at the moment.  If you use the result before it's ready,
the QPU will just stall, unlike the magic R4 mode where you'd read the
previous value.  That means that the following shader-db results aren't
quite representative (since we now cause some stalls instead of emitting
nops), but they're impressive enough that I'm happy with the change.

total instructions in shared programs: 95669 -> 91275 (-4.59%)
instructions in affected programs:     82590 -> 78196 (-5.32%)
2018-07-23 10:21:43 -07:00
Eric Anholt 58c1d3860f v3d: Add QPU pack/unpack for the new SFU instructions.
These instructions allow writing the result to any register, instead of a
special writeback to r4.
2018-07-23 10:21:43 -07:00
Eric Anholt cdfa99657d v3d: Fix the name of the "flpop" operation.
Noticed while trying to sort a new op into the appropriate place to match
the documentation.
2018-07-23 10:21:43 -07:00
Eric Anholt 91e24e5718 v3d: Print the instruction we're testing in the QPU disasm/pack round-trip.
If we fail initial disassembly, it's good to know what instruction it was
that failed.
2018-07-23 10:21:42 -07:00
Eric Anholt a1beb333d8 v3d: Drop unused vir_SAT() operation.
We lower saturates in NIR.
2018-07-23 10:21:42 -07:00
Eric Anholt 8dfc6ee317 v3d: Rotate through registers to improve post-RA scheduling options.
Similarly to VC4's implementation, by not picking r0 immediately upon
freeing it, we give the scheduler more of a chance to fit later writes in
earlier.  I'm not clear on whether there's any real cost to picking phys
over accumulators, so keep that behavior for now.

shader-db:
total instructions in shared programs: 96831 -> 95669 (-1.20%)
instructions in affected programs:     77254 -> 76092 (-1.50%)
2018-07-23 10:21:42 -07:00
Eric Anholt 1fb31819ae v3d: Allow reading from physical regs written in the previous instruction.
This restriction existed in V3D 2.x, but lifting it was a major change in
3.x.

shader-db results:
total instructions in shared programs: 98117 -> 96831 (-1.31%)
instructions in affected programs:     48520 -> 47234 (-2.65%)
2018-07-23 10:21:23 -07:00
Eric Anholt 229836fb37 v3d: Disable shader-db cycle estimates until we sort out TMU estimates.
I keep having to ignore these shader-db changes since I don't trust them,
so just disable the reports entirely.
2018-07-16 14:39:59 -07:00
Eric Anholt 2baab6bf2a v3d: Emit the lowered uniform just before its first use in a block.
total instructions in shared programs: 98578 -> 98119 (-0.47%)
instructions in affected programs:     27571 -> 27112 (-1.66%)

and it also eliminates most spills/fills on the CTS's randomized uniform
usage testcases.
2018-07-16 14:39:59 -07:00
Eric Anholt 26f830d9fc v3d: Add an assert that we don't provide an invalid texture return words.
The docs had an update noting this restriction, so reflect it in the code.
2018-07-16 14:39:59 -07:00
Eric Anholt d661d78464 v3d: Apply GFXH-1625 restriction on TMUWT in the end of the shader.
This doesn't affect us yet since we're not doing TMUWTs, but I think we
will for GLES 3.1.
2018-07-16 14:39:59 -07:00
Eric Anholt beeb94402f v3d: Implement noperspective varyings on V3D 4.x.
Fixes a bunch of piglit interpolation tests, and reduces my concern about
some MSAA blit shaders with noperspective varyings.
2018-07-09 11:48:32 -07:00
Eric Anholt 93f437d128 v3d: Fix typo in dither mode offset.
We weren't using the field yet, so it didn't affect anything.

Fixes: c0476d964a ("v3d: Express dithering mode in the same way that the CLIF parser does.")
2018-07-09 11:48:32 -07:00
Eric Anholt 5601ab3981 v3d: Add support for GL_SAMPLE_ALPHA_TO_ONE.
Fixes piglit ext_framebuffer_multisample-draw-buffers-alpha-to-one
2018-07-05 12:39:36 -07:00
Eric Anholt 7b63371420 v3d: Respect swap_color_rb for the f32_color_rb case.
We don't actually set the two flags together, but I want to use the
r/g/b/a reordered fields in the next commit.
2018-07-05 12:39:36 -07:00
Eric Anholt 49f7631c9f v3d: Emit a TF flush after each draw using TF.
This fixes GPU hangs on 7278 in transform feedback tests such as
GTF-GLES3.gtf.GL3Tests.transform_feedback2.transform_feedback2_basic
2018-07-02 10:05:14 -07:00
Eric Anholt a77cb724da v3d: Move GL shader state dumping out of per-version compilation.
It doesn't depend on V3D_VER, since it's just calling v3d_print_group.
2018-06-29 13:36:28 -07:00
Eric Anholt c2901ff80f v3d: Add missing Stream field to transform feedback specs on V3D 4.1.
Noticed when trying to CLIF parse a transform feedback job that hangs on
HW.
2018-06-29 13:36:28 -07:00
Eric Anholt 69efc1e025 v3d: Add missing "tri trip or fan" flag in Primitive List Format. 2018-06-29 13:36:28 -07:00
Eric Anholt b341b39db3 v3d: Fix the shader code address field widths on V3D 4.1+
We were overlapping it with the threadable/nan flags, resulting in
incorrect relocations (threadable/nan included in the offset) and wrong
ordering in the CLIF files.
2018-06-29 13:36:28 -07:00
Eric Anholt 6c3c11ba19 v3d: Add missing "no prim pack" field to the V3D4.1+ GL shader state.
It looks like we don't need this flag for anything (not that I'm clear on
what it does), but it makes our struct dumping line up with CLIF parsing.
2018-06-29 13:36:28 -07:00
Eric Anholt c0476d964a v3d: Express dithering mode in the same way that the CLIF parser does. 2018-06-29 13:36:28 -07:00
Eric Anholt 24d2f1347d v3d: Add missing "number of bin tile lists" field.
Noticed when trying to feed our dumps through the CLIF parser.  Since this
is a "minus one" field, we were already filling in the value we wanted (0).
2018-06-29 13:36:28 -07:00
Eric Anholt b65b61cefe v3d: Rewrite the color write masks to match CLIF format.
The render_target_* fields gave us pretty(ish) printing, but meant we were
incompatible with CLIF, and had much more verbose code generating them.
2018-06-29 13:36:28 -07:00
Eric Anholt 38172dcba9 v3d: Merge the V3D 4.1 and 4.2 XML into V3D 3.3'x XML.
The XML ends up noisier if you're only looking at one version, but from
the diffstat there's obvious wins in terms of deduplication.  This will
get even more significant if we ever support 3.2 or 4.0.
2018-06-29 13:36:28 -07:00
Eric Anholt 725561c0b6 v3d: Switch v3d_decoder.c to the XML's top min_ver/max_ver fields.
The XML zipper wants one XML per version for filling out its tables, but
we want to do more than one GPU version per XML now.  Assume that the
"gen" field will be the same as min_ver and look up our XML text assuming
that they're listed in increasing min_ver.
2018-06-29 13:36:28 -07:00
Eric Anholt f8af5c58c3 v3d: Create XML fields for min_ver and max_ver of a packet/struct/enum.
This will be used to merge together the V3D 3.3-4.1 XML with the variants
disabled based on the version.
2018-06-29 13:36:28 -07:00
Eric Anholt 6f7ad7ed11 v3d: Pass the version being generated to the pack generator script.
It turns out that most V3D versions change very few packets, so keeping
separate copies of the XML per version makes changing the XML a pain as
you have to replicate your changes to each one.  This is the start of
changing it so that one XML can generate headers for multiple versions.
2018-06-29 13:36:28 -07:00
Eric Anholt 9f80bcc2bc v3d: Convert a bunch of our "minus one" fields over to the new XML attr.
This fixes up their formatting for CLIF files and makes the code more
legible.
2018-06-27 09:13:48 -07:00
Eric Anholt 18b1bb0b63 v3d: Add pack/unpack/decode support for fields with a "- 1" modifier.
Right now, we name these fields as "field name minus one" so that your C
code obviously states what the value should be.  However, it's easy enough
to handle at the codegen level with another little XML attribute, meaning
less C code and easier-to-read values in CLIF dumping and gdb as well.

(The actual CLIF format for simulator and FPGA replay takes in
pre-minus-one values, so we need it there too).
2018-06-27 09:13:48 -07:00
Eric Anholt ee9a6a13fb v3d, vc4: Disable valgrind checking of CLE inputs when NDEBUG is set.
For a meson -Db_ndebug=true release build on x86_64, reduces text size of
libv3d.a from 53.0k to 51.6k.  Inspired by 0d5329d626 ("anv: Disable
__gen_validate_value if NDEBUG is set.")
2018-06-21 15:46:40 -07:00
Eric Anholt f49d112a01 v3d: Implement ALPHA_TO_COVERAGE.
There's a convenient "FTOC" instruction for generating the coverage now,
unlike vc4.  This fixes
dEQP-GLES3.functional.multisample.fbo_4_samples.proportionality_alpha_to_coverage
2018-06-20 09:30:46 -07:00
Eric Anholt 07b243674f v3d: Add missing always_flush debug flag.
The #define existed and was checked in the driver.
2018-06-19 09:42:20 -07:00
Eric Anholt 778594ae12 v3d: Limit shader threading according to our maximum TMU fifo usage.
Fixes simulator assertion failures in
dEQP-GLES3.functional.shaders.texture_functions.texture.samplercubeshadow_bias_fragment
and similar complicated cases.
2018-06-15 16:09:39 -07:00
Eric Anholt e130ada243 v3d: Fix shaders using pixel center W but no varyings.
The docs called this field "uses both center W and centroid W", but
actually it's "do you need center W even if varyings don't obviously call
for it?"

Fixes dEQP-GLES3.functional.shaders.builtin_variable.fragcoord_w
2018-06-15 16:09:39 -07:00
Eric Anholt d91e06a065 v3d: Fix configuration setup of mixed f32 and f16 render targets.
Fixes dEQP-GLES3.functional.fragment_out.random.26 and 6 others.
2018-06-14 16:52:25 -07:00
Eric Anholt 48011c42aa v3d: Remove unused QUNIFORM_STENCIL left over from vc4. 2018-06-14 16:52:25 -07:00
Eric Anholt a40bc33b11 v3d: Fix undefined results for a swap_color_rb RT from a float shader output.
Fixes segfaults and undefined behavior in
dEQP-GLES3.functional.fragment_out.basic.fixed.srgb8_alpha8_lowp_float
2018-06-14 16:52:25 -07:00
Eric Anholt 9d5860310d v3d: Enable the new NIR bitfield operation lowering paths.
These together get the GLSL 3.00 unorm/snorm pack functions and
MESA_shader_integer operations working.

v2: Fix commit message typo.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-06-06 13:44:28 -07:00
Eric Anholt 2b1b2cbf61 v3d: Be more explicit about include directory from our generated code.
You'd need src/broadcom/cle/ in the -I previously, for srcdir != builddir.
nir was fine at that, but automake didn't have it.

Bugzilla: https://github.com/anholt/mesa/issues/104
2018-06-05 12:44:49 -07:00
Eric Anholt 97894b1267 v3d: Add support for glSampleMask / glSampleCoverage. 2018-05-17 15:09:46 +01:00
Eric Anholt 9bbc3f8cf1 v3d: Enable NaN propagation in the VS and CS as well.
Fixes piglit vs-isnan-*.shader_test at the expense of gl-1.0-spot-light.
2018-05-17 15:09:12 +01:00
Eric Anholt 8c47ebbd23 v3d: Rename the driver files from "vc5" to "v3d". 2018-05-16 21:19:07 +01:00
Eric Anholt c4c488a2ae v3d: Rename the vc5_dri.so driver to v3d_dri.so.
This allows the driver to load against the merged kernel DRM driver.  In
the process, rename most of the build system variables and gallium
plumbing functions.
2018-05-16 21:19:07 +01:00
jenny.q.cao ff7521c9ba android: change include "cutils/log.h" to "log/log.h" on Android API >=26
There is a compile warning from Android 8 (API version 26) from "include cutils/log.h"
warning: "Deprecated: don't include cutils/log.h, use either android/log.h or log/log.h"-W#warnings,
Change to include "log/log.h" on Android 8 or later major version to avoid this warning

Signed-off-by: jenny.q.cao <jenny.q.cao@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2018-05-14 08:08:31 +03:00
Eric Anholt 76ee9edcb4 broadcom/vc5: Add support for centroid varyings.
It would be nice to share the flags packet emit logic with flat shade
flags, but I couldn't come up with a good way while still using our pack
macros.  We need to refactor this to shader record setup at compile time,
anyway.

Fixes ext_framebuffer_multisample-interpolation * centroid-*
2018-04-26 11:30:22 -07:00
Eric Anholt 77b4f30bae broadcom/vc5: Add validation that we don't violate GFXH-1633 requirements.
We don't use ldunifa yet, but we will eventually for UBOs.
2018-04-26 11:30:22 -07:00
Eric Anholt 089c32eefd broadcom/vc5: Add validation that we don't violate GFXH-1625 requirements.
We don't use TMUWT yet, but we will once we do SSBOs.
2018-04-26 11:30:22 -07:00
Eric Anholt dc4cb04ee5 broadcom/vc5: Add QPU validation for register writes after thrend.
The next shader gets to start writing the register file during these
slots, so make sure we don't stomp over them.

The only case of hitting this that I could imagine would be dead writes.
2018-04-26 11:30:22 -07:00
Eric Anholt 503716fa86 broadcom/vc5: Remove leftover vc4 MSAA lowering setup in the FS key. 2018-04-25 09:21:54 -07:00
Eric Anholt 5710532e9e broadcom/vc5: Fix tile load/store of MSAA surfaces on 4.x.
For single-sample we have to always program SAMPLE_0, but for multisample
we want to store all the samples.
2018-04-25 09:21:54 -07:00
Ian Romanick d76c204d05 util: Move util_is_power_of_two to bitscan.h and rename to util_is_power_of_two_or_zero
The new name make the zero-input behavior more obvious.  The next
patch adds a new function with different zero-input behavior.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2018-03-29 14:09:23 -07:00
Aaron Watry 1dae92f150 broadcom/vc4: Fix out-of-tree build with automake.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2018-03-28 17:48:41 -07:00
Eric Anholt 81f82ecc56 broadcom/vc5: Start using nir_opt_move_load_ubo().
In the absence of a general NIR or VIR-level scheduler, this at least
avoids spilling in
GTF-GLES3.gtf.GL3Tests.uniform_buffer_object.uniform_buffer_object_storage_layouts
2018-03-28 17:48:41 -07:00
Eric Anholt c2b13627d9 broadcom/vc5: Fix extraneous register index in QIR dumping of TLBU writes.
Just like TLB without a config uniform, we don't have a register index.
2018-03-26 17:46:23 -07:00
Eric Anholt d7a015cbc6 broadcom/vc5: Account for InstanceID/VertexID in VPM segment size.
Fixes failure in
GTF-GLES3.gtf.GL3Tests.draw_instanced.draw_instanced_attrib_size
2018-03-22 15:12:21 -07:00
Eric Anholt ba29b89dc7 broadcom/vc5: Set up a vertex position if the shader doesn't.
Our backend needs some sort of vertex position value to emit the scaled
viewport values and such.  Fixes potential segfaults in
KHR-GLES3.copy_tex_image_conversions.required.cubemap_negx_cubemap_negx
2018-03-22 15:12:21 -07:00
Eric Anholt baeb6a4b4a broadcom/vc5: Fix up the NIR types of FS outputs generated by NIR-to-TGSI.
Unfortunately TGSI doesn't record the type of the FS output like GLSL
does, but VC5's TLB writes depend on the output's base type.  Just record
the type in the key at variant compile time when we've got a TGSI input
and then fix it up.

Fixes KHR-GLES3.packed_pixels.pbo_rectangle.rgba32i/ui and apparently a
GPU hang that breaks most tests that come after it.
2018-03-21 14:02:34 -07:00
Eric Anholt 00910e3057 broadcom/vc5: Don't annotate dumps with stale live intervals.
As you're debugging register allocation, you may have changed the
intervals and not recomputed yet.  Just skip the dump in that case.
2018-03-19 16:44:20 -07:00
Eric Anholt facc3c6f58 broadcom/vc5: Add support for register spilling.
Our register spilling support is nice to have since vc4 couldn't at all,
but we're still very restricted due to needing to not spill during a TMU
operation, or during the last segment of the program (which would be nice
to spill a value of, when there's a long-lived value being passed through
with little modification from the start to the end).

We could do better by emitting unspills for the last-segment values just
before the last thrsw, since the last segment is probably not the maximum
interference area.

Fixes GTF uniform_buffer_object_arrays_of_all_valid_basic_types and 3
others.
2018-03-19 16:44:06 -07:00
Eric Anholt 271fc58ba1 broadcom/vc5: Remove redundant last_inst lookup.
The point was to get the MOV, which the MOV_dest already returned.
2018-03-19 16:42:59 -07:00
Eric Anholt 34dc64f627 broadcom/vc5: On QPU pack error, dump the instruction and return cleanly.
This is nice for debugging when you've made a bad instruction.
2018-03-19 16:42:59 -07:00
Eric Anholt d721348dcd broadcom/vc5: Add cursors to the compiler infrastructure, like NIR's.
This will let me do lowering late in compilation using the same
instruction builder as we use in nir_to_vir.
2018-03-19 16:42:59 -07:00
Eric Anholt c81d681742 broadcom/vc5: Move the umul macro to a header.
Anywhere we want to multiply, we probably want this.
2018-03-19 16:42:59 -07:00
Eric Anholt 9e28c18cd1 broadcom/vc5: Correct the arg count of TIDX/EIDX. 2018-03-19 16:42:59 -07:00
Eric Anholt 55bf298333 broadcom/vc5: Re-do live variables after removing thrsws.
Otherwise our start/ends ips won't line up with the actual instructions.
2018-03-19 16:42:59 -07:00
Eric Anholt c3a504f470 broadcom/vc5: Add a QPU helper for instructions using the TLB.
This will be used for detecting last thread segment in register spilling.
2018-03-19 16:42:59 -07:00
Eric Anholt 09c4dd1971 broadcom/vc5: Introduce v3d_qpu_reads_vpm()/v3d_qpu_writes_vpm().
These helpers will be used in register spilling to determine where to add
a last thrsw if needed, and might help refactor QPU scheduling.
2018-03-19 16:42:59 -07:00
Eric Anholt 407f21ef1b broadcom/vc5: The ldvpm signal also a case of using the VPM.
The QPU scheduling code calling this function already separately checked
this signal.
2018-03-19 16:42:59 -07:00
Eric Anholt 4760040c09 broadcom/vc5: Extract v3d_qpu_writes_tmu() helper.
This will be reused in register spilling.
2018-03-19 16:42:59 -07:00
Timothy Arceri a050ea60ee nir: add lower_ldexp to nir compiler options
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-02-28 09:23:49 +11:00
Eric Anholt e29988c908 broadcom/vc5: Fix "hardwrae" typo in a field name in XML. 2018-02-05 13:53:38 +00:00
Eric Anholt 8bb000f460 broadcom/vc5: Try to merge more than 2 QPU instructions together.
Obviously it would be good to have an ADD and a MUL and a signal together,
but we can even potentially have multiple signals merged, as well.

total instructions in shared programs: 100423 -> 97874 (-2.54%)
instructions in affected programs:     78812 -> 76263 (-3.23%)
2018-02-05 09:29:37 +00:00
Eric Anholt dc78643ace broadcom/vc5: Remove no-op MOVs after register allocation.
We emit some MOVs to track lifetimes of payload registers, but we don't
need there to be actual MOV instructions for them.

total instructions in shared programs: 101045 -> 100423 (-0.62%)
instructions in affected programs:     37083 -> 36461 (-1.68%)
2018-02-05 09:29:37 +00:00
Eric Anholt f3978a7380 broadcom/vc5: Add missing shader-db instruction counting.
I must have misplaced it in the instruction packing rework.
2018-02-05 09:29:37 +00:00
Eric Anholt 353b42ccc7 broadcom/vc5: Fix a segfault on mix of booleans.
We don't have a src1 to look up if the compare instruction is "i2b".
2018-02-01 11:02:29 -08:00
Timothy Arceri 9a2e085680 nir: add lower_all_io_to_temps flag
This will be used for freedreno and vc4 which require all inputs
and outputs to be copied to temps.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-31 09:14:08 +11:00
Eric Anholt 71c7e9bea1 broadcom/vc5: Enable CLIF dumping of V3D 4.2. 2018-01-27 19:04:21 +11:00
Eric Anholt 91f899cbc1 broadcom/vc5: Update the compiler for V3D 4.2. 2018-01-27 19:04:21 +11:00
Eric Anholt f2e41daac5 broadcom/vc5: Update QPU instruction pack/unpack for v4.2.
After the 4.1 spec, 4.2 retroactively renamed patchid to barrierid because
it's used for other barriers in compute.
2018-01-27 19:03:55 +11:00
Eric Anholt 96d3e8f134 broadcom/vc5: Add XML for V3D 4.2. 2018-01-27 18:57:58 +11:00
Eric Anholt b026063b16 broadcom/vc5: Fix a race between XML codegen build and CLIF build. 2018-01-27 18:57:58 +11:00
Eric Anholt de60ea4432 Android: Attempt to fix broadcom build after vc5 changes. 2018-01-27 18:03:58 +11:00
Dylan Baker 436ed65d38 autotools: include meson build files in tarball
This adds the meson.build, meson_options.txt, and a few scripts that are
used exclusively by the meson build.

v2: - Remove accidentally included changes needed to test make dist with
      LLVM > 3.9

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Acked-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2018-01-19 16:30:51 -08:00
Emil Velikov 393cf04fa4 broadcom: add missing headers to the tarball
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2018-01-18 11:21:35 +00:00
Eric Anholt 5bc0b63799 broadcom/vc5: Use MSF to ignore discards/non-dispatched channels in loops.
Prevents potential infinite loops when a non-dispatched or discarded
channel never triggers the loop break condition.
2018-01-12 21:58:24 -08:00
Eric Anholt 762dd52951 broadcom/vc5: Use XOR instead of SUB for execute flags comparisons.
I think this should be equivalent other than power, and it's the kind of
comparison we use for nir_op_ieq.
2018-01-12 21:58:18 -08:00
Eric Anholt 8e4cba9d92 broadcom/vc5: Also check the update flags for avoiding DCE.
I was trying to do a NULL-destination UF, and it got removed.
2018-01-12 21:58:11 -08:00
Eric Anholt aa77a9cf5a broadcom/vc5: Rename V3D 3.x Flat Shade Action to match v4.x naming.
Now that the actions are reused for centroid and nonperspective, give them
a more generic name.
2018-01-12 21:57:45 -08:00
Eric Anholt 368bab43fd broadcom/vc5: Add support for loading varyings in V3D 4.1.
The LDVARY signal now writes an arbitrary register, so I took out the
magic src register file and replaced it with an instruction with LDVARY
set so we have somewhere to hang a QFILE_TEMP destination for register
allocation.
2018-01-12 21:57:21 -08:00
Eric Anholt 5aaea3c4a0 broadcom/vc5: Add compiler support for V3D 4.x texturing. 2018-01-12 21:56:57 -08:00
Eric Anholt 028f6b327c broadcom/vc5: Add the new TMU write addresses for V3D 4.x (and r5rep).
The V3D 3.x series of TMU writes with meaning depending on the texture
type is replaced with writes to specific registers for each texture
argument semantic.
2018-01-12 21:56:48 -08:00
Eric Anholt 42a35da96d broadcom/vc5: Move V3D 3.3 texturing to a separate file.
V3D 4.x texturing changes enough that #ifdefs would just make a mess of
it.
2018-01-12 21:56:37 -08:00
Eric Anholt acf30e4916 broadcom/vc5: Move V3D 3.3 VPM write setup to a separate file.
For V4.1 texturing, I need the V4.1 XML, so the main compiler needs to
stop including V3.3 XML.
2018-01-12 21:56:24 -08:00
Eric Anholt 34898c8c45 broadcom/vc5: Add support for V3D 4.1 CLIF dumping. 2018-01-12 21:55:49 -08:00
Eric Anholt 409696b76e broadcom/vc5: Move the body of CLIF dumping to a per-version file.
I want the library's entrypoints to still be unversioned, but the actual
packet dumping needs to be per-version.
2018-01-12 21:55:38 -08:00
Eric Anholt 90269ba353 broadcom/vc5: Use THRSW to enable multi-threaded shaders.
This is a major performance boost on all of V3D, but is required on V3D
4.x where shaders are always either 2- or 4-threaded.
2018-01-12 21:55:30 -08:00
Eric Anholt 86a12b4d5a broadcom/vc5: Properly schedule the thread-end THRSW.
This fills in the delay slots of thread end as much as we can (other than
being cautious about potential TLBZ writes).

In the process, I moved the thread end THRSW instruction creation to the
scheduler.  Once we start emitting THRSWs in the shader, we need to
schedule the thread-end one differently from other THRSWs, so having it in
there makes that easy.
2018-01-12 21:55:23 -08:00
Eric Anholt a075bb6726 broadcom/vc5: Implement GFXH-1684 workaround.
Apparently the VPM writes need to be flushed out before we end the shader.
2018-01-12 21:55:15 -08:00
Eric Anholt f50d39ab49 broadcom/vc5: Add a test for .ifb in ADD ops.
I had a .ifb being decoded weird in sampid, so this is to check that .ifb
is fine.
2018-01-12 21:54:57 -08:00
Eric Anholt 267f13dbee broadcom/vc5: Add the new tesselation opcodes in V3D 4.1. 2018-01-12 21:54:50 -08:00
Eric Anholt edbd817c30 broadcom/vc5: Use a physical-reg-only register class for LDVPM.
This is needed for LDVPM on V3D 4.x, but will also be needed for keeping
values out of the accumulators across THRSW.
2018-01-12 21:54:42 -08:00
Eric Anholt 22a02f3e34 broadcom/vc5: Use the new LDVPM/STVPM opcodes on V3D 4.1.
Now, instead of a magic write register for VPM stores we have an
instruction to do them (which means no packing of other ALU ops into it),
with the ability to reorder the VPM stores due to the offset being baked
into the instruction.

VPM loads also gain the ability to be reordered by packing the row into
the A argument.  They also no longer write to the r3 accumulator, and
instead must be stored to a physical register.
2018-01-12 21:54:33 -08:00
Eric Anholt 55f8a01aca broadcom/vc5: Drop dead VC5_QPU_* defines from qpu_instr.c.
I had all the packing code in this file at one point, but these defines
now live in qpu_pack.c.
2018-01-12 21:54:27 -08:00
Eric Anholt 2bd378647b broadcom/vc5: Add support for QPU pack/unpack/disasm of small immediates. 2018-01-12 21:54:18 -08:00
Eric Anholt c81cc767e4 broadcom/vc5: Drop signal bit #defines.
Signals are more complicated than that, and tables ended up being better.
2018-01-12 21:53:53 -08:00
Eric Anholt dfee62eed3 broadcom/vc5: Add support for V3Dv4 signal bits.
The WRTMUC replaces the implicit uniform loads in the first two texture
instructions.  LDVPM disappears in favor of an ALU op.  LDVARY, LDTMU,
LDTLB, and LDUNIF*RF now write to arbitrary registers, which required
passing the devinfo through to a few more functions.
2018-01-12 21:53:45 -08:00
Eric Anholt 81ec2ba229 broadcom/vc5: Fix pack/unpack of vfmul input unpack flags. 2018-01-12 21:53:38 -08:00
Eric Anholt fb4face86a broadcom/vc5: Introduce v3dx_macros.h and v3dx_pack.h headers.
This will be used by vc5 for prefixing functions and including the pack
header in v3d-version-dependent code, following the model of anv.
2018-01-12 21:51:40 -08:00
Eric Anholt 7dedfd9660 broadcom/cle: Fix error path of missing a "type" in the XML.
We try to emit a #error and continue so that you can debug the missing
type at C compile time, but were missing a couple of definitions in that
path (sigh, python).
2018-01-12 21:51:34 -08:00
Eric Anholt 3d8ad50370 broadcom/vc5: Add XML for V3D v4.1 (BCM7278) 2018-01-12 21:48:07 -08:00
Dylan Baker 2083a14179 meson: Use dependencies for nir
This creates two new internal dependencies, idep_nir_headers and
idep_nir. The former encapsulates the generation of nir_opcodes.h and
nir_builder_opcodes.h and adding src/compiler/nir as an include path.
This ensures that any target that needs nir headers will have the
includes and that the generated headers will be generated before the
target is build. The second, idep_nir, includes the first and
additionally links to libnir.

This is intended to make it easier to avoid race conditions in the build
when using nir, since the number of consumers for libnir and it's
headers are quite high.

Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
2018-01-11 15:40:02 -08:00
Dylan Baker 60856a7b49 meson: don't use intermediate variables that are immediately discarded
For things like:
loop
    x = func()
    list += x
end

just do:
loop
    list += func()
end

Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
2018-01-11 15:40:02 -08:00
Dylan Baker 4ccb981673 meson: Use consistent style for tests
Don't use intermediate variables, use consistent whitespace.

Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
2018-01-11 15:40:02 -08:00
Dylan Baker fbf192a67e meson: Use consistent style
Currently the meosn build has a mix of two styles:
arg : [foo, ...
       bar],

and
arg : [
  foo, ...,
  bar,
]

For consistency let's pick one. I've picked the later style, which I
think is more readable, and is more common in the mesa code base.

v2: - fix commit message

Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
2018-01-11 15:40:02 -08:00
Eric Anholt e60e3a56a2 broadcom/vc5: Fix discard_if during control flow.
I want to do the SETMSF.IFA to discard only if execute == 0 and cond, so
our dest of the PUSHZ needs to be nonzero if execute or !cond are nonzero.

Fixes dEQP-GLES3.functional.shaders.discard.dynamic_loop_dynamic.
2018-01-03 14:31:36 -08:00
Eric Anholt 635131a238 broadcom/vc5: Don't emit component 3/4 F16 TLB writes for float/vec2.
Fixes a simulator assertion failure on
dEQP-GLES3.functional.fragment_out.array.fixed.r8_highp_float.
2018-01-03 14:31:28 -08:00
Eric Anholt 39811a2894 broadcom/vc5: Introduce enums for internal depth/type, with V3D prefixes. 2018-01-03 14:25:23 -08:00
Eric Anholt d3e8a4b96c broadcom/xml: Fix up safe name confusion with prefixing.
For enums we were doubling the underscore if the value had a numeric first
character of its name (which safe_name() adds an underscore to).  A little
helper function cleans up the other instance of prefixing while also
fixing this.
2018-01-03 14:25:23 -08:00
Eric Anholt 48cabc1e75 broadcom/vc5: Turn the decimate mode field into an enum in the XML. 2018-01-03 14:25:23 -08:00
Eric Anholt 17cb634b1c broadcom/vc5: Turn the output image format into an enum. 2018-01-03 14:25:23 -08:00
Eric Anholt 883a9b02c9 broadcom/vc5: Turn the CLE XML's memory format into an enum. 2018-01-03 14:25:23 -08:00
Eric Anholt 8e5a0ed953 broadcom/vc5: Emit flat shade flags for varying components > 24.
This means that with no flatshading we'll emit the single-byte
ZERO_ALL_FLAT_SHADE_FLAGS, and otherwise emit a set of FLAT_SHADE_FLAGS to
get all the bits we need set.

There's a _SET enum in the packet we could use to possibly set entire
ranges of the bitfield without using another packet, but this at least
fixes the conformance failure.
2018-01-03 14:25:23 -08:00
Eric Anholt 2056e4a777 broadcom/vc5: Emit proper flatshading code for glShadeModel(GL_FLAT).
In updating the simulator, behavior changed slightly so that our old code
wasn't getting glxgears's flatshading interpolated right.  Emit flat
shading code just like we would for a normal flat-shaded varying, by
passing a flag in the shader key for glShadeModel(GL_FLAT) state and
customizing the color inputs based on that.
2018-01-03 14:25:23 -08:00
Eric Anholt 4764699552 braodcom/vc5: Rely on OVRTMUOUT always being set.
It seems that the HW team has decided that it's the only supported mode,
and it's the mode I actually meant to be using but forgot.  Our table of
return_32_bit should have matched the default non-OVRTMUOUT behavior, so
this change should be invisible.

However, the change revealed that some my return_size checks for swizzling
were a bit confused in the shadow case, so I had to move them to draw time
once we have both the sampler and the view together.

Fixes assertion failures in the updated simulator, where the non-OVRTMUOUT
support has been removed.
2018-01-03 14:25:23 -08:00
Eric Anholt ba965084b6 broadcom/vc5: Move texture return channel setup into the compiler.
The compiler decides how many LDTMUs we're going to emit, and that must
match the P1 flags.  This brings the return channel counting to a single
place (so all that's passed into the compiler is "how many return channels
you may request from this texture's format), and was a necessary step for
shadow samplers once we stop using OVRTMUOUT=0.
2018-01-03 14:25:23 -08:00
Eric Anholt 22ceb1f99b broadcom/vc5: Add missing setting of the UIF XOR disable flag in textures.
Most piglit textures happened to work out by RGBW not changing in that
bit, but it did cause failures in RGBA16F fbo-generatemipmap-formats.
2017-12-19 15:55:14 -08:00
Eric Anholt 49e2586bfc broadcom/vc5: Fix a typo in memcmp for sig unpack checking.
This shockingly ended up working out, because only the first byte of *sig
is used and (sizeof(*sig) != 0) == 1.  Fixes a compiler warning.

Link: https://bugs.freedesktop.org/show_bug.cgi?id=104183
2017-12-14 14:36:24 -08:00
Eric Anholt 1171f1749d broadcom/vc5: Enable NIR txd lowering on all txd instructions.
Fixes almost all of piglit's arb_shader_texture_lod grad tests, except for
the base -texgrad/texgradcube ones which fail on what appear to be
precision problems.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-12-14 14:36:17 -08:00
Eric Anholt 52f024b052 broadcom/vc5: Fix shader input/outputs for gallium's new NIR linking. 2017-12-14 14:36:17 -08:00
Eric Engestrom 4cba39331d meson: add dep_thread to every lib that includes threads.h
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104141
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2017-12-07 17:29:42 +00:00
Eric Anholt fefff74b0d broadcom/vc4: Use the new enum functionality of the XML to decode better. 2017-12-01 15:37:28 -08:00
Eric Engestrom bb46111c01 broadcom: use NDEBUG to guard asserts
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-11-28 09:50:36 +00:00
Eric Anholt 6a78416dab broadcom/vc5: Fix BASE_LEVEL handling with txl.
The HW doesn't add the base level anywhere (the min/max lod clamping is
what does base level), so we need to add it manually in this case.

Fixes piglit tex-miplevel-selection *Lod 2D.
2017-11-22 10:56:31 -08:00
Eric Anholt 514db90448 broadcom/vc5: Fix up integer texture handling.
The original spec I had didn't expose integer textures and suggested that
you use unfiltered floats.  Now there are proper formats for them.

Fixes 16- and 32-bit texwrap integer tests in piglit, and
dEQP-GLES3.functional.fbo.completeness.renderable.renderbuffer.color0.rgb10_a2ui.
2017-11-19 10:12:30 -08:00
Eric Anholt 87391e23cf broadcom/vc5: Ensure that there is always a TLB write.
This should fix some GPU hangs in our (currently always single-threaded)
fragment shaders, and definitely fixes assertion failures in simulation.
2017-11-17 16:09:55 -08:00
Andreas Boll 4f29ed38f3 broadcom/vc5: Remove unused v3d_compiler.c
Unused since original import of VC5.

Fixes: ade416d023 ("broadcom: Add VC5 NIR compiler.")

Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-11-08 18:30:47 +00:00
Eric Anholt 50906e4583 broadcom/vc5: Do 16-bit unpacking of integer texture returns properly.
We were doing f16 unpacks, which trashed "1" values.  Fixes many piglit
texwrap GL_EXT_texture_integer cases.
2017-11-07 12:58:03 -08:00
Eric Anholt dfff9ce45e broadcom/vc5: Fix scheduling for a non-SFU R4 write after a dead R4 write.
The v3d_qpu_writes_r*() were only checking for fixed-function accumulator
writes, not normal ALU writes to those regs.

Fixes fs-discard-exit-2 on simulation (but not HW).
2017-11-07 12:57:49 -08:00
Eric Anholt 4f33344e7a broadcom/vc5: Add occlusion query support.
Fixes all of piglit's OQ tests.
2017-11-07 12:56:40 -08:00
Eric Anholt a266f78741 broadcom/vc5: Fix mipmap filtering enums.
The ordering of the values was even less obvious than I thought, with both
the mip filter and the min filter being in different bits depending on
whether the mip filter is none.

Fixes piglit fs-textureLod-miplevels.shader_test
2017-11-07 09:40:25 -08:00
Eric Anholt dd429cb2db broadcom/vc5: Fix missing enum decode for indexed primitives. 2017-11-07 09:19:48 -08:00
Eric Anholt bb6997e6a3 broadcom/vc5: Drop padding bits from the bottom of the TSDA address.
Fixes misaligned-looking addresses in decode.
2017-11-07 09:19:48 -08:00