272 Commits

Author SHA1 Message Date
Mathias Fröhlich 56088131d0 gallium: introduce PIPE_CAP_CLIP_HALFZ.
In preparation of ARB_clip_control. Let the driver decide if
it supports pipe_rasterizer_state::clip_halfz being set to true.

v3:
Initially enable on ilo.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
2014-10-24 19:21:21 +02:00
Eric Anholt 8c7ac377b7 vc4: Reuse uniform_data/contents indices when making uniforms.
This allows vc4_opt_cse.c to CSE-away operations involving the same
uniform values.

total instructions in shared programs: 37341 -> 36906 (-1.16%)
instructions in affected programs:     10233 -> 9798 (-4.25%)
total uniforms in shared programs: 10523 -> 10320 (-1.93%)
uniforms in affected programs:     2467 -> 2264 (-8.23%)
2014-10-24 18:04:26 +01:00
Eric Anholt 18ccda7b86 vc4: When asked to discard-map a whole resource, discard it.
This saves a bunch of extra flushes when texsubimaging a whole texture
that's been used for rendering, or subdataing a whole BO.  In particular,
this massively reduces the runtime of piglit texture-packed-formats (when
the probes have been moved out of the inner loop).
2014-10-24 18:04:26 +01:00
Eric Anholt a71c3b885a vc4: Refactor flushing before mapping a BO.
I'm going to want to make some other decisions here before flushing.
2014-10-24 18:04:26 +01:00
Eric Anholt 52824811b9 vc4: Allow dead code elimination of unused varyings.
total instructions in shared programs: 39022 -> 37341 (-4.31%)
instructions in affected programs:     26979 -> 25298 (-6.23%)
total uniforms in shared programs: 11242 -> 10523 (-6.40%)
uniforms in affected programs:     5836 -> 5117 (-12.32%)
2014-10-24 18:04:26 +01:00
Eric Anholt 5d32e26335 vc4: Add debug output to match shaderdb info to program dumps.
I'm going to be using VC4_DEBUG=shaderdb,norast to do shaderdb stats, but
when debugging regressions, I want to match shaderdb output to shader
disassembly.
2014-10-24 18:04:26 +01:00
Marek Olšák 5f5b83cbba gallium: add PIPE_SHADER_CAP_MAX_OUTPUTS and use it in st/mesa
With 5 shader stages and various combinations of enabled and disabled shaders,
the maximum number of outputs in one shader doesn't have to be equal to
the maximum number of inputs in the following shader.

v2: return 32 for softpipe and llvmpipe
2014-10-21 21:59:02 +02:00
Eric Anholt ef280c95f2 vc4: Fix SRC_ALPHA_SATURATE blending.
Fixes glean blendFunc.
2014-10-21 15:46:48 +01:00
Eric Anholt cc298023c9 vc4: Fix stencil writemask handling.
If the writemask doesn't compress, then we want to put in the uncompressed
writemask, not the compressed writemask failure value (all-on).

Fixes glean's stencil2 and fbo-clear-formats on stencil.
2014-10-21 15:16:41 +01:00
Eric Anholt 48f6351940 vc4: Don't look at back stencil state unless two-sided stencil is enabled.
Fixes regressions in the next bugfix, because gallium util stuff leaves
the back stencil state as 0 if !back->enabled.
2014-10-21 15:16:41 +01:00
Eric Anholt 6212d2402d vc4: Translate 4-byte index buffers to 2 bytes.
Fixes assertion failures in 14 piglit tests (half of which now pass).
2014-10-19 08:44:56 +01:00
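
The index translation named in 6212d2402d can be pictured with a minimal sketch like the one below. It is not the driver's code: the function name `narrow_indices` is hypothetical, and rejecting indices above 65535 is an assumption, since the commit message does not say how out-of-range values are handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy 4-byte indices into a 2-byte index array.
 * Returns 0 on success, or -1 if an index does not fit in 16 bits
 * (an assumed policy; the commit message does not describe one). */
static int narrow_indices(const uint32_t *src, uint16_t *dst, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (src[i] > UINT16_MAX)
            return -1;
        dst[i] = (uint16_t)src[i];
    }
    return 0;
}
```

A caller would allocate `count` 16-bit entries, run the helper once, and then bind the narrowed buffer in place of the original.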
Eric Anholt 572fba95e4 vc4: Add support for rebasing texture levels so firstlevel == 0.
GLES2 doesn't have GL_TEXTURE_BASE_LEVEL, so the hardware doesn't.  Fixes
piglit levelclamp, tex-miplevel-selection, and texture-storage/2D mipmap
rendering.
2014-10-19 08:42:33 +01:00
Eric Anholt 15eb4c59f6 vc4: Apply a Newton-Raphson step to improve RSQ
Fixes all the piglit built-in-functions/*sqrt tests, among others.
2014-10-18 10:08:59 +01:00
Eric Anholt 1fc124b80f vc4: Apply a Newton-Raphson step to improve RCP.
Fixes all the piglit floating-point *-op-div tests, among others.
2014-10-18 10:08:59 +01:00
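
The Newton-Raphson steps named in 15eb4c59f6 (RSQ) and 1fc124b80f (RCP) follow the textbook update rules sketched below. This is an illustration, not the driver's instruction sequence: `truncate_mantissa` is a hypothetical stand-in for a coarse hardware estimate, and the precision it models is an assumption.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a low-precision estimate: keep only the
 * top 7 mantissa bits of a float (assumed precision, for illustration). */
static float truncate_mantissa(float v)
{
    uint32_t bits;
    memcpy(&bits, &v, sizeof(bits));
    bits &= 0xffff0000u;
    memcpy(&v, &bits, sizeof(v));
    return v;
}

/* One Newton-Raphson step for 1/a, starting from estimate x0:
 * x1 = x0 * (2 - a * x0). */
static float refine_rcp(float a, float x0)
{
    return x0 * (2.0f - a * x0);
}

/* One Newton-Raphson step for 1/sqrt(a), starting from estimate y0:
 * y1 = y0 * (1.5 - 0.5 * a * y0 * y0). */
static float refine_rsq(float a, float y0)
{
    return y0 * (1.5f - 0.5f * a * y0 * y0);
}
```

Each step roughly squares the relative error of the estimate, which is why a single step is often enough to turn a coarse approximation into a much tighter one.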
Eric Anholt 0fdc5111b4 vc4: Add a little bit more packet parsing to make dump reading easier.
Probably should have done this *before* staring at all those render lists
today.
2014-10-18 10:08:59 +01:00
Eric Anholt 9ebfb3014e vc4: Make some assertions about how many flushes/EOFs the simulator sees.
This caught the previous commit's bug in the kernel validator.
2014-10-17 13:13:43 +01:00
Eric Anholt 1f7048419e vc4: Fix accidental dropping of the low bits of the store tilebuffer packet.
Notably this included the EOF flag (the other bits are the full buffer
dump selection, but we don't do full dumps), which caused the kernel
checking for frame completion to trigger.
2014-10-17 13:09:29 +01:00
Eric Anholt afc3aa373d vc4: Set the primitive list format at the start of rendering.
The other driver does this manually before calling into each tile, but we
can just let it get binned into the tiles (saving repeated kernel
validation on the packet).

Fixes simulator assertion failures on polygon-mode and non-auto texwrap.
2014-10-17 13:09:28 +01:00
Eric Anholt 895c904103 vc4: Replace the FLUSH_ALL with FLUSH.
We don't need to emit all of our current state at the end of each bin
list.  We're going to be smashing it all at the start of the next tile's
bin list, anyway.
2014-10-17 13:09:28 +01:00
Eric Anholt 000976ed99 vc4: Add some comments about state management. 2014-10-17 13:09:28 +01:00
Eric Anholt 135287db17 vc4: Make sure there's exactly 1 tile store per tile coords packet.
It's not documented that I can see, but the other driver does it (check
vg_hw_4.c), and one of the HW guys confirmed that you really do need to do
it.
2014-10-17 13:09:25 +01:00
Emil Velikov 79d09a4b12 vc4: correctly include the source files
The kernel files are built into a separate static library and
all the functions that require it are already wrapped in ifdef
USE_VC4_SIMULATOR. Don't forget the header file :)

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2014-10-16 10:00:14 +01:00
Eric Anholt 57de9bbb63 vc4: Fix the uniform debug output.
I dropped the shader index when moving to the compiled shader struct, but
didn't update the format string here.
2014-10-15 18:12:03 +01:00
Eric Anholt 201d4c0b2a vc4: Add support for user clip plane and gl_ClipVertex.
Fixes about 15 piglit tests about interpolation and clipping.
2014-10-15 18:11:46 +01:00
Eric Anholt 6a0bf67048 vc4: Move the output semantics setup to a helper.
I want to reuse it elsewhere to set up outputs that aren't in the TGSI.
2014-10-15 18:11:46 +01:00
Eric Anholt a2d8b6dbd5 vc4: Fix render target NPOT alignment at small miplevels.
The texturing hardware takes the POT level 0 width/height and minifies
those.  This is different from what we were doing, for example, for
273-wide's level 5: POT(273>>5) == 8, while POT(273)>>5 == 16.

Fixes piglit-depthstencil-render-miplevels 273.
2014-10-14 14:57:50 +01:00
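
The arithmetic in a2d8b6dbd5 can be checked with a short sketch. The helper names are hypothetical and are not taken from the driver; the code only reproduces the two orderings the message compares.

```c
#include <stdint.h>

/* Round up to the next power of two. Assumes 0 < v <= 2^31. */
static uint32_t next_pot(uint32_t v)
{
    uint32_t p = 1;
    while (p < v)
        p <<= 1;
    return p;
}

/* Minify first, then align: POT(width >> level). */
static uint32_t pot_of_minified(uint32_t width, unsigned level)
{
    uint32_t w = width >> level;
    return next_pot(w ? w : 1);
}

/* Align level 0 first, then minify: POT(width) >> level. */
static uint32_t minified_pot(uint32_t width, unsigned level)
{
    uint32_t w = next_pot(width) >> level;
    return w ? w : 1;
}
```

For a 273-wide texture at level 5, `pot_of_minified(273, 5)` gives 8 while `minified_pot(273, 5)` gives 16, matching the figures in the commit message.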
Eric Anholt b5fc9d5664 vc4: Add support for having 0 vertex elements used.
You have to load at least 1, according to the simulator.  Fixes 4 piglit
tests and even more ES2 conformance tests.
2014-10-14 11:29:48 +01:00
Eric Anholt 615bbf0ca6 vc4: Write the VPM read setup multiple times to queue all the inputs.
There's a 4-element fifo, and the size (number of dwords per vertex) field
is just 4 bits.

Fixes glsl-routing on sim.
2014-10-13 17:16:05 +01:00
Eric Anholt e1d1c39626 vc4: Add support for the TXL opcode.
There's a bit at the bottom of cube map stride (which has some formatting
bugs in the docs) which flips the bias coordinate to being an absolute
LOD.
2014-10-13 17:15:47 +01:00
Eric Anholt 5bc91b6e32 vc4: Improve the accuracy of SIN and COS.
This gets them to pass glsl-sin/cos.  There was an obvious problem that I
was using the FRC code on the scaled input value, which means that we had
a range in [0, 1], while our Taylor series is most accurate across [-0.5, 0.5].
We can just slide things over, but that means flipping the sign of the
coefficients.  After that, it was just a matter of stuffing more
coefficients in.
2014-10-13 17:15:47 +01:00
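
The range shift described in 5bc91b6e32 can be illustrated as follows. This is a sketch under assumptions, not the shader code the driver emits: the function names and the number of Taylor terms are chosen for illustration. It relies on the identity sin(2*pi*f) = -sin(2*pi*(f - 0.5)), which is where the flipped coefficient signs come from.

```c
/* Fractional part in [0, 1), written without libm.
 * Assumes |t| is small enough to fit in a long. */
static float frac(float t)
{
    float whole = (float)(long)t;
    if (whole > t)
        whole -= 1.0f;
    return t - whole;
}

/* Illustrative sine: reduce to [-0.5, 0.5) turns, then evaluate the
 * Taylor series of sin with every coefficient negated. */
static float sin_sketch(float x)
{
    const float two_pi = 6.2831853f;
    float g = frac(x / two_pi) - 0.5f;  /* shifted range: [-0.5, 0.5) */
    float y = two_pi * g;               /* angle in [-pi, pi) */
    float y2 = y * y;

    /* Horner evaluation of -(y - y^3/3! + y^5/5! - ... + y^13/13!). */
    float p = -1.0f / 6227020800.0f;
    p = p * y2 + 1.0f / 39916800.0f;
    p = p * y2 - 1.0f / 362880.0f;
    p = p * y2 + 1.0f / 5040.0f;
    p = p * y2 - 1.0f / 120.0f;
    p = p * y2 + 1.0f / 6.0f;
    p = p * y2 - 1.0f;
    return p * y;
}
```

Adding terms tightens the result near the ends of the reduced range, which matches the message's remark about stuffing more coefficients in.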
Eric Anholt 5d72a1c956 vc4: Match VS outputs to FS inputs.
If the VS doesn't output a value that the FS needs, we still need to read
the right contents for the remaining FS inputs, by emitting padding.  And
if the VS outputs something the FS doesn't need, we shouldn't put it in
the VPM at all (so the code producing it can get DCEed).

Fixes 77 piglit tests.
2014-10-13 13:23:48 +01:00
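
The matching rule in 5d72a1c956 can be sketched as a lookup from each FS input to a VS output. The names below (`match_outputs`, `SLOT_PADDING`) and the use of plain integers as semantic identifiers are assumptions for illustration, not the driver's data structures.

```c
#include <stddef.h>

/* Marks an FS input that no VS output provides (hypothetical constant). */
#define SLOT_PADDING (-1)

/* For each FS input, store the index of the VS output with the same
 * semantic id, or SLOT_PADDING when there is none. VS outputs that no
 * FS input references never appear in slots[], so nothing reads them. */
static size_t match_outputs(const int *vs_outputs, size_t num_vs,
                            const int *fs_inputs, size_t num_fs,
                            int *slots)
{
    for (size_t i = 0; i < num_fs; i++) {
        slots[i] = SLOT_PADDING;
        for (size_t j = 0; j < num_vs; j++) {
            if (vs_outputs[j] == fs_inputs[i]) {
                slots[i] = (int)j;
                break;
            }
        }
    }
    return num_fs;
}
```

With VS outputs {10, 20, 30} and FS inputs {30, 40, 10}, the slots come out as {2, padding, 0}; output 20 is unreferenced, which is what would let the code producing it be eliminated.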
Eric Anholt 83365a5b57 vc4: Add support for the CEIL opcode.
Not as big of a deal as SSG, but still +9 piglit tests.
2014-10-13 08:06:48 +01:00
Eric Anholt 926eaa9af4 vc4: Add support for the SSG opcode. 2014-10-13 08:06:48 +01:00
Eric Anholt 070b2c2efc vc4: Use the fnv1 hash function instead of gallium util's crc32.
Improves simulated norast performance on a little benchmark by 13.4012%
+/- 2.08459% (n=13).
2014-10-10 15:49:34 +02:00
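
For reference, the FNV-1 hash named in 070b2c2efc is small enough to show in full. The sketch below is the standard 32-bit variant; whether the driver uses this width, and the function name `fnv1_hash32`, are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard 32-bit FNV-1: multiply by the FNV prime, then XOR in the
 * next byte. (FNV-1a performs the same two steps in the other order.) */
static uint32_t fnv1_hash32(const void *data, size_t size)
{
    const uint8_t *bytes = data;
    uint32_t hash = 2166136261u;    /* FNV offset basis */

    for (size_t i = 0; i < size; i++) {
        hash *= 16777619u;          /* FNV prime */
        hash ^= bytes[i];
    }
    return hash;
}
```

The loop does one multiply and one XOR per byte with no table lookups, which makes it cheap for short keys.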
Eric Anholt d09509da2a vc4: Don't look up the compiled shaders unless state has changed.
Improves simulated norast performance on a little benchmark by 38.0965%
+/- 3.27534% (n=11).
2014-10-10 15:49:22 +02:00
Eric Anholt c6f50c4086 vc4: Actually clear the context's dirty flags.
I was trying to skip state updates when !dirty, and suspiciously
everything was always dirty.
2014-10-10 15:03:13 +02:00
Eric Anholt 7c474f9f2e vc4: Optimize the other case of SEL_X_Y with a 0 -> SEL_X_0(a).
Cleans up some output to be more obvious in a piglit test I'm looking at.
2014-10-10 15:03:12 +02:00
Eric Anholt 7e67ea994c vc4: Optimize out adds of 0. 2014-10-09 21:47:06 +02:00
Eric Anholt 0401f55fff vc4: Optimize fmul(x, 0) and fmul(x, 1).
This was being generated frequently by matrix multiplies of 2 and
3-channel vertex attributes (which have the 0 or 1 loaded in the shader).
2014-10-09 21:47:06 +02:00
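
The algebraic rewrites in 0401f55fff and 7e67ea994c, together with the turn-it-into-a-mov helper from 1cd8c1aab0, can be pictured on a toy instruction format. Everything below (`struct inst`, `simplify`, `make_mov`) is hypothetical and much simpler than the driver's IR; it also ignores NaN, infinity, and signed-zero corner cases.

```c
#include <stdbool.h>

enum op { OP_MOV, OP_FADD, OP_FMUL };

/* Toy operand: either a float constant or a register number. */
struct operand {
    bool is_const;
    float value;
    int reg;
};

struct inst {
    enum op op;
    struct operand src[2];
};

static bool is_const(const struct operand *o, float v)
{
    return o->is_const && o->value == v;
}

/* Rewrite an instruction in place as a MOV of a single source. */
static void make_mov(struct inst *inst, struct operand src)
{
    inst->op = OP_MOV;
    inst->src[0] = src;
    inst->src[1] = (struct operand){ 0 };
}

/* Returns true if the instruction was simplified. */
static bool simplify(struct inst *inst)
{
    if (inst->op != OP_FADD && inst->op != OP_FMUL)
        return false;

    for (int i = 0; i < 2; i++) {
        struct operand c = inst->src[i];
        struct operand other = inst->src[1 - i];

        if (inst->op == OP_FMUL && is_const(&c, 0.0f)) {
            make_mov(inst, c);      /* fmul(x, 0) -> mov 0 */
            return true;
        }
        if ((inst->op == OP_FMUL && is_const(&c, 1.0f)) ||
            (inst->op == OP_FADD && is_const(&c, 0.0f))) {
            make_mov(inst, other);  /* fmul(x, 1) or fadd(x, 0) -> mov x */
            return true;
        }
    }
    return false;
}
```

Once an instruction becomes a plain MOV, later passes such as copy propagation and dead code elimination can usually remove it.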
Eric Anholt 1cd8c1aab0 vc4: Factor out the turn-it-into-a-mov in opt_algebraic.
This will be used more in the next commits.
2014-10-09 21:47:06 +02:00
Eric Anholt 40748cf8d9 vc4: Eliminate unused texture instructions. 2014-10-09 21:47:06 +02:00
Eric Anholt b73cab6826 vc4: Dead code eliminate unused SF instructions. 2014-10-09 21:47:06 +02:00
Eric Anholt 93cac2637b vc4: Prevent copy propagating out the MOVs from r4.
Copy propagating these might result in reading the r4 after some other
instruction has written r4.  Just prevent all copy propagation of this for
now.

Fixes bad rendering with upcoming indirect register access support, where
the copy propagation was consistently happening across another read.
2014-10-09 21:47:06 +02:00
Eric Anholt c4b0dd5356 vc4: Split the coordinate shader to its own vc4_compiled_shader.
Merging VS and CS into the same struct wasn't winning us anything except
for not allocating a separate BO (but if we want to pack programs into
BOs, we should pack not just those 2 programs together).  What it was
getting us was a bunch of code duplication about hash table lookups and
propagating vc4_compile contents into a vc4_compiled_shader.

I was about to make the situation worse with indirect uniform buffer
access.
2014-10-09 21:47:06 +02:00
Eric Anholt 5c72d7706c vc4: Add #defines for the texture uniform fields.
I wanted to make another set of texture uploads for handling reladdr
constants, and duplicating all the bitshifting looked like a terrible
idea.  In the process, this fixes a swap of the s/t texture wrap modes.
2014-10-09 21:47:06 +02:00
Eric Anholt 5cfab07639 vc4: Initialize undefined temporaries to 0.
Under the simulator, reading registers before writing them triggers an
assertion failure.  c->undef gets treated as r0, which will usually be
written, but not if it's used in the first instruction.  We should
definitely not be aborting in this case, and return some sort of undefined
value instead.

Fixes glsl-user-varying-ff.
2014-10-09 21:47:06 +02:00
Eric Anholt 5a13522898 vc4: Optimize SF(ITOF(x)) -> SF(x).
This is a common production of st_glsl_to_tgsi, because CMP takes a float
argument.
2014-10-09 11:01:18 +02:00
Eric Anholt 00a9aebfe0 vc4: Add some optimization of FADD(FSUB(0, x)).
This is a common production of st_glsl_to_tgsi, which uses negate flags on
source arguments to handle subtraction.
2014-10-09 11:01:18 +02:00
Eric Anholt 67aea92964 vc4: Mostly fix offset calculation for NPOT mipmap levels.
The non-base NPOT levels are stored as POT-aligned images.  We get that
POT alignment by minifying the POT-aligned base level.

This means that level strides are also POT aligned, so we have to tell the
rendering mode config that our resource is larger than the actual
requested area.

Fixes the fbo-generatemipmap-formats NPOT cases.  Regresses
depthstencil-render-miplevels 273 * -- the texture presentation now works
(where it was completely broken before), but it looks like there's some
overflow of image bounds happening at the lower miplevels.
2014-10-09 11:01:09 +02:00
Eric Anholt 0b96a086cb vc4: Move the mirrored kernel code to a kernel/ directory.
Now this whole setup matches the kernel's file layout much more closely.
2014-10-09 09:46:39 +02:00