
65775 Commits

Author SHA1 Message Date
Jason Ekstrand d25aaf1cb1 i965/fs: Use the GRF for UNTYPED_ATOMIC instructions
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 10:29:15 -07:00
Jason Ekstrand 65ddf6f404 i965/fs: Add a function for getting a component of an 8- or 16-wide register
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 10:29:15 -07:00
Jason Ekstrand 30d718c2fb i965/fs: Use the instruction execution size directly for texture generation
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 10:29:15 -07:00
Jason Ekstrand 48ddd2889e i965/fs: Use exec_size instead of force_uncompressed in dump_instruction
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 10:29:15 -07:00
Jason Ekstrand b18fd234da i965/fs: Use instruction execution sizes instead of heuristics
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 10:29:15 -07:00
Jason Ekstrand 894ec5a1d8 i965/fs: Use instruction execution sizes to set compression state
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 10:29:14 -07:00
Jason Ekstrand 8f1adb5965 i965/fs: Remove unneeded uses of force_uncompressed
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 10:29:14 -07:00
Jason Ekstrand 2999f83bd9 i965/fs: Derive force_uncompressed from instruction exec_size
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 10:29:14 -07:00
Jason Ekstrand 5f41d052bf i965/fs: Make fs_reg::effective_width take fs_inst* instead of fs_visitor*
Now that we have execution sizes, we can use that instead of the
   dispatch width.  This way it also works for 8-wide instructions in
   SIMD16.

i965/fs: Make effective_width a variable instead of a function

i965/fs: Preserve effective width in constant propagation

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 10:29:14 -07:00
Jason Ekstrand 6ba31cc000 i965/fs: Better guess the width of LOAD_PAYLOAD
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 10:29:14 -07:00
Jason Ekstrand 071ac3a467 i965/fs: Add an exec_size field to fs_inst
This will, eventually, allow us to manage execution sizes of
   instructions in a much more natural way from the fs_visitor level.

i965/fs: Explicitly set instruction execute size a couple of places

i965/blorp: Explicitly set instruction execute sizes

   Since blorp is all 16-wide and is not, in general, very careful
   about register width, we'll just set it all explicitly.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 10:29:14 -07:00
Jason Ekstrand fbc0a798ee i965/fs: Determine partial writes based on the destination width
Now that we track both halves of a 16-wide vgrf, we no longer need to worry
about force_sechalf or force_uncompressed.  The only real issue is if the
destination is too small.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 10:29:14 -07:00
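The rule above can be sketched in C. This is a simplified illustration, not Mesa's code: struct sketch_inst, struct sketch_dst, sketch_is_partial_write(), and the 32-byte REG_SIZE constant are assumptions for the example, and treating a predicated write as partial is an extra assumption beyond what the commit states.

```c
#include <assert.h>
#include <stdbool.h>

#define REG_SIZE 32 /* assumed bytes in one hardware GRF */

/* Hypothetical, simplified destination and instruction (not fs_reg/fs_inst). */
struct sketch_dst {
   unsigned width;     /* channels the destination register holds */
   unsigned type_size; /* bytes per channel */
};

struct sketch_inst {
   bool predicated;    /* assumption: predication also makes a write partial */
   struct sketch_dst dst;
};

/* A write is partial when it may leave part of a hardware register
 * untouched. With the width stored on the register itself, only the
 * destination's size matters, not force_sechalf/force_uncompressed. */
static bool sketch_is_partial_write(const struct sketch_inst *inst)
{
   assert(inst->dst.width > 0 && inst->dst.type_size > 0);
   return inst->predicated ||
          inst->dst.width * inst->dst.type_size < REG_SIZE;
}
```

Under these assumptions, an 8-wide float write fills exactly one register and is not partial, while an 8-wide 16-bit (UW) write covers only half a register and is.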
Jason Ekstrand 27d7ef094a i965/fs: Fix a bug in register coalesce
This commit fixes a bug in register coalesce that happens when one register
is moved to another the proper number of times but the channels are
re-arranged.  When this happens, the previous code would happily coalesce
the registers regardless of the fact that the channel mappings were wrong.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 10:29:14 -07:00
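The condition the fix enforces can be sketched as follows: counting the MOVs is not enough; each one must also copy a register to the same offset it came from. This is a hypothetical, simplified model (struct sketch_mov, sketch_can_coalesce(), and the 16-register cap are invented for illustration), not the actual register_coalesce code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical record of one MOV between two vgrfs. */
struct sketch_mov {
   unsigned dst_offset; /* register offset written in the destination vgrf */
   unsigned src_offset; /* register offset read from the source vgrf */
};

/* True only if the MOVs copy every register of a size-`size` vgrf exactly
 * once AND each one keeps its offset. MOVs that swap offsets 0 and 1 pass
 * a count check but must not be coalesced. */
static bool sketch_can_coalesce(const struct sketch_mov *movs, unsigned count,
                                unsigned size)
{
   bool seen[16] = { false };
   assert(size <= 16);
   if (count != size)
      return false;
   for (unsigned i = 0; i < count; i++) {
      unsigned d = movs[i].dst_offset, s = movs[i].src_offset;
      if (d >= size || s != d || seen[d])
         return false; /* out of range, re-arranged, or duplicated */
      seen[d] = true;
   }
   return true;
}
```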
Jason Ekstrand 16819b48ab i965/fs: Rework GEN5 texturing code to use fs_reg and offset()
Now that offset() can properly handle MRF registers, we can use an MRF
fs_reg and let offset() handle incrementing it correctly for different
dispatch widths.  While this doesn't have any noticeable effect currently,
it does ensure that the destination register is 16-wide which will be
necessary later when we start detecting execution sizes based on source and
destination registers.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 10:29:14 -07:00
Jason Ekstrand 7210583eb8 i965/fs_reg: Allocate double the number of vgrfs in SIMD16 mode
This is actually the squash of a bunch of different changes.  Individual
commit titles follow:

i965/fs: Always 2-align registers SIMD16 for gen <= 5

i965/fs: Use the register width when applying offsets

   This reworks both byte_offset() and offset() to be more intelligent.
   The byte_offset() function now supports offsets bigger than 32 bytes. The
   offset() function uses the byte_offset() function together with the
   register width and the type size to offset the register by the correct
   amount.

i965/fs: Change regs_read to be in hardware registers

i965/fs: Change regs_written to be actual hardware registers

i965/fs: Properly handle register widths in LOAD_PAYLOAD

   The LOAD_PAYLOAD instruction is a bit special because it collects a
   bunch of registers (with possibly different widths) into a single
   payload block.  Once the payload is constructed, it's treated as a
   single block of data and most of the information such as register widths
   doesn't matter anymore.  In particular, the offset of any particular
   source register is the accumulation of the sizes of the previous source
   registers.

i965/fs: Properly set writemasks in LOAD_PAYLOAD

i965/fs: Handle register widths in demote_pull_constants

i965/fs: Get rid of implicit register doubling in the allocator

i965/fs: Reserve enough registers for PLN instructions

i965/fs: Make sources and destinations interfere in 16-wide

i965/fs: Properly handle register widths in CSE

i965/fs: Properly handle register widths in register_coalesce

i965/fs: Properly handle widths in copy propagation

i965/fs: Properly handle register widths in VARYING_PULL_CONSTANT_LOAD

i965/fs: Properly handle register widths and odd register sizes in spilling

i965/fs: Don't waste a register on texture lookups for gen >= 7

   Previously, we were wasting a register in SIMD16 mode because we could
   only allocate registers in pairs.  Now that we can allocate and address
   odd-sized registers, let's get rid of this special-case.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 10:29:14 -07:00
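A minimal sketch of the width-aware offset arithmetic described above. Everything here is hypothetical and simplified (struct sketch_reg, sketch_byte_offset(), sketch_offset(), sketch_payload_src_reg(), and REG_SIZE = 32 bytes); it is not Mesa's fs_reg code. It assumes an offset is kept as whole registers plus a byte offset within a register and, for the LOAD_PAYLOAD part, that each source is padded to whole registers.

```c
#include <assert.h>

#define REG_SIZE 32 /* assumed bytes in one hardware GRF */

/* Hypothetical, simplified virtual register (not Mesa's fs_reg). */
struct sketch_reg {
   unsigned nr;            /* virtual GRF number */
   unsigned width;         /* SIMD channels: 1, 8 or 16 */
   unsigned type_size;     /* bytes per channel */
   unsigned reg_offset;    /* whole hardware registers into the vgrf */
   unsigned subreg_offset; /* bytes into that register, < REG_SIZE */
};

/* Advance by a raw byte count; offsets of REG_SIZE or more carry
 * into reg_offset instead of overflowing subreg_offset. */
static struct sketch_reg sketch_byte_offset(struct sketch_reg reg, unsigned delta)
{
   unsigned total = reg.subreg_offset + delta;
   reg.reg_offset += total / REG_SIZE;
   reg.subreg_offset = total % REG_SIZE;
   return reg;
}

/* Advance by `delta` logical components. One component of a width-N
 * register occupies N * type_size bytes. */
static struct sketch_reg sketch_offset(struct sketch_reg reg, unsigned delta)
{
   assert(reg.width > 0 && reg.type_size > 0);
   return sketch_byte_offset(reg, delta * reg.width * reg.type_size);
}

/* LOAD_PAYLOAD-style layout: source i starts where sources 0..i-1 end,
 * each padded to whole registers (an assumption of this sketch). */
static unsigned sketch_payload_src_reg(const struct sketch_reg *srcs, unsigned i)
{
   unsigned regs = 0;
   for (unsigned j = 0; j < i; j++) {
      unsigned bytes = srcs[j].width * srcs[j].type_size;
      regs += (bytes + REG_SIZE - 1) / REG_SIZE;
   }
   return regs;
}
```

For example, advancing a SIMD16 float register by one component moves it two hardware registers, while a SIMD8 float register moves one.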
Jason Ekstrand 4232a776a6 i965/fs: Handle printing of registers better.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 10:29:14 -07:00
Jason Ekstrand 5390ca8ce9 i965: Explicitly set widths on gen5 math instruction destinations.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 10:29:14 -07:00
Jason Ekstrand 004fbd5375 i965/fs: Make half() divide the register width by 2 and use it more
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 10:29:14 -07:00
Jason Ekstrand 24d023b9fe i965/fs: Add a concept of a width to fs_reg
Every register in i965 assembly implicitly has a concept of a "width".
Usually, this is derived from the execution size of the instruction.
However, when writing a compiler, it turns out that it is frequently
useful to have the width explicitly in the register and derive the
execution size of the instruction from the widths of the registers used in
it.

This commit adds a width field to fs_reg along with an effective_width()
helper function.  The effective_width() function tells you how wide the
register effectively is when used in an instruction.  For example, uniform
values have width 1 since the data is not actually repeated, but when used
in an instruction they take on the width of the instruction.  However, for
some instructions (LOAD_PAYLOAD being the notable exception), the width is
not the same.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 10:29:14 -07:00
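The width/effective-width distinction can be illustrated with a small C sketch. The register files, struct sketch_reg, and both functions are hypothetical; in particular, the rule in sketch_derive_exec_size() (widest GRF operand, else the dispatch width) is an assumption for illustration, and it does not model exceptions such as LOAD_PAYLOAD.

```c
#include <assert.h>

/* Hypothetical register files for this sketch only. */
enum sketch_file { SKETCH_GRF, SKETCH_UNIFORM, SKETCH_IMM };

struct sketch_reg {
   enum sketch_file file;
   unsigned width; /* declared width; 1 for uniforms and immediates */
};

/* How wide a register behaves when read by an instruction running
 * exec_size channels: scalar values are broadcast to every channel. */
static unsigned sketch_effective_width(struct sketch_reg reg, unsigned exec_size)
{
   assert(exec_size > 0);
   if (reg.file == SKETCH_UNIFORM || reg.file == SKETCH_IMM)
      return exec_size;
   return reg.width;
}

/* Assumed rule: the execution size is the widest GRF operand's width,
 * or the dispatch width if every operand is scalar. */
static unsigned sketch_derive_exec_size(const struct sketch_reg *regs, unsigned n,
                                        unsigned dispatch_width)
{
   unsigned exec_size = 0;
   for (unsigned i = 0; i < n; i++)
      if (regs[i].file == SKETCH_GRF && regs[i].width > exec_size)
         exec_size = regs[i].width;
   return exec_size ? exec_size : dispatch_width;
}
```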
Jason Ekstrand 1030ee6e9b i965/fs: A little harmless refactoring of register_coalesce
Just pass the visitor into is_copy_payload() and is_coalesce_candidate()
instead of a register size and the virtual_grf_sizes array.  Among other
things, this makes the code more obvious because you don't have to figure
out where src_size came from.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 10:29:14 -07:00
Jason Ekstrand f91b566f55 i965/brw_reg: Add a firsthalf function and use it in the generator
Right now, this function is a no-op but it indicates that we intend to only
use the first half of the 16-wide register.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 10:29:14 -07:00
Jason Ekstrand 1728e74957 i965/fs: Copy propagate partial reads.
This commit reworks copy propagation a bit to support propagating the
copying of partial registers.  This comes up every time we have pull
constants because we do a pull constant read immediately followed by a move
to splat the one component of the output to 8- or 16-wide.  This allows us to
eliminate the copy and simply use the one component of the register.

Shader DB results:

total instructions in shared programs: 5044937 -> 5044428 (-0.01%)
instructions in affected programs:     66112 -> 65603 (-0.77%)
GAINED:                                0
LOST:                                  0

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 10:29:14 -07:00
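A hypothetical C sketch of why the splatting MOV can be skipped: every channel it writes holds the same scalar, so a later read of its destination can read that component directly as a stride-0 region. struct sketch_src, struct sketch_copy, and sketch_propagate_splat() are illustrative only; the sketch ignores the type, predication, and liveness checks a real copy-propagation pass needs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical source operand: vgrf number, byte offset, and stride in
 * channels (stride 0 means every channel reads the same scalar). */
struct sketch_src {
   unsigned nr;
   unsigned subreg_offset;
   unsigned stride;
};

/* Hypothetical copy record for "MOV vgrf<dst_nr>, src". */
struct sketch_copy {
   unsigned dst_nr;
   struct sketch_src src;
};

/* If `use` reads the result of a splat (src has stride 0), every channel it
 * sees holds the same scalar, so redirect the read to that component. */
static bool sketch_propagate_splat(struct sketch_src *use,
                                   const struct sketch_copy *copy)
{
   assert(use && copy);
   if (use->nr != copy->dst_nr || copy->src.stride != 0)
      return false;
   *use = copy->src;
   return true;
}
```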
Jason Ekstrand 4d5f0eb048 i965/fs: Refactor fs_inst::is_send_from_grf()
A switch statement is much easier to read/edit than one giant chain of OR
conditions.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 10:29:14 -07:00
Jason Ekstrand 54688cd03b i965/fs: Clean up emit_fb_writes
This splits emit_fb_writes into two functions: emit_fb_writes and
emit_single_fb_write.  This reduces the amount of duplicated code in
emit_fb_writes and makes the register number fiddling less arcane.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 10:29:14 -07:00
Jason Ekstrand 72a3780f26 i965/fs: Print BAD_FILE registers in dump_instruction
Sometimes these show up in LOAD_PAYLOAD instructions and it's nice to be
able to see them.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 10:29:14 -07:00
Jason Ekstrand 2af4b0aeaf i965/fs: Make compact_virtual_grfs an optimization pass
Previously we disabled compact_virtual_grfs when dumping optimizations.
The idea here was to make it easier to diff the dumped shader because you
didn't have a sudden renaming.  However, sometimes a bug is affected by
compact_virtual_grfs and, when this happens, you want to keep dumping
instructions with compact_virtual_grfs enabled.  By turning it into an
optimization pass and dumping it along with the others, we retain the
ability to diff because you can just diff against the compact_virtual_grf
output.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 10:29:13 -07:00
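The renaming that disrupts a diff comes from renumbering. A minimal sketch, assuming vgrfs are identified by dense integer indices and that a `used` flag per vgrf has already been computed; sketch_compact_vgrfs() is hypothetical and omits rewriting the instructions with the new numbers.

```c
#include <assert.h>
#include <stdbool.h>

/* Renumber the vgrfs that are still referenced to 0..n-1 with no holes.
 * remap[i] receives the new number of vgrf i, or -1 if it is unused.
 * Returns the new vgrf count. */
static int sketch_compact_vgrfs(const bool *used, int count, int *remap)
{
   assert(count >= 0);
   int next = 0;
   for (int i = 0; i < count; i++)
      remap[i] = used[i] ? next++ : -1;
   return next;
}
```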
Jason Ekstrand a25db10c12 i965/fs: Make immediate fs_reg constructors explicit
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 10:29:13 -07:00
Jason Ekstrand 1c89e098e8 i965/fs: Make null_reg_* const members of fs_visitor instead of globals
We also set the register width equal to the dispatch width.  Right now,
this is effectively a no-op since we don't do anything with it.  However,
it will be important once we add an actual width field to fs_reg.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 10:29:13 -07:00
Jason Ekstrand ab7234c852 i965/fs: Use the var_from_vgrf helper function instead of doing it manually
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 10:29:13 -07:00
Jason Ekstrand c24dd54f97 i965/fs: Fix a bug with dead_code_eliminate on large writes
Previously, if an instruction wrote to more than one register, we
implicitly assumed that it filled the entire register.  We never hit this
before because the only time we did multi-register writes was things like
texturing which always wrote to all of the registers.  However, with the
upcoming ability to do 16-wide instructions in SIMD8 and things of that
nature, we can have multi-register writes at offsets and we'll hit this.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 10:29:13 -07:00
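The fix amounts to tracking liveness per hardware register and considering only the registers an instruction actually writes, rather than assuming it fills the whole vgrf. A hypothetical sketch: sketch_write_is_dead() and sketch_kill_written() are invented names, `live` holds one flag per register of a single vgrf, and regs_written counts hardware registers.

```c
#include <assert.h>
#include <stdbool.h>

/* A write of regs_written registers starting at reg_offset is dead only if
 * none of those registers is read later. */
static bool sketch_write_is_dead(const bool *live, unsigned vgrf_size,
                                 unsigned reg_offset, unsigned regs_written)
{
   assert(reg_offset + regs_written <= vgrf_size);
   for (unsigned i = reg_offset; i < reg_offset + regs_written; i++)
      if (live[i])
         return false;
   return true;
}

/* Walking backwards, a full write kills only the registers it covers,
 * not the whole vgrf. */
static void sketch_kill_written(bool *live, unsigned reg_offset,
                                unsigned regs_written)
{
   for (unsigned i = reg_offset; i < reg_offset + regs_written; i++)
      live[i] = false;
}
```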
Jason Ekstrand 1385a4b706 i965/fs: Use the UW type for the destination of VARYING_PULL_CONSTANT_LOAD instructions
Using a floating-point type doesn't usually cause hangs on my HSW, but the
simulator complains about it quite a bit.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 10:29:13 -07:00
Jason Ekstrand f0d43c09b2 i965/fs: Use offset a lot more places
We have this wonderful offset() function for advancing registers, but we're
not using it.  Using offset() allows us to do some sanity checking and
avoid manually touching fs_reg::reg_offset.  In a few commits, we will make
offset do even more nifty things for us.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 10:29:13 -07:00
Jason Ekstrand 0089d025aa i965/fs: fix a comment in compact_virtual_grfs
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 10:29:13 -07:00
Jason Ekstrand 3dc3fccb75 i965/fs: Rewrite fs_visitor::split_virtual_grfs
The original vgrf splitting code was written with the assumption that vgrfs
came in two types: those that can be split into single registers and those
that can't be split at all.  It was very conservative and bailed as soon as
more than one element of a register was read or written.  This won't work
once we start allowing a regular MOV or ADD operation to operate on
multiple registers.  This rewrite allows for the case where a vgrf of size
5 may appropriately be split into one register of size 1 and two registers
of size 2.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
2014-09-30 10:29:13 -07:00
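The splitting rule can be sketched as follows: start by allowing a split before every register, then forbid splits inside any multi-register read or write; the surviving boundaries define the pieces. This is a simplified, hypothetical C sketch (struct sketch_access, sketch_split_vgrf(), and the 16-register cap are assumptions), not the code from the commit.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_VGRF_SIZE 16 /* assumed cap for this sketch */

/* Hypothetical access: one instruction reads or writes `size`
 * registers starting at `offset` within the vgrf. */
struct sketch_access {
   unsigned offset, size;
};

/* Returns the number of pieces and writes each piece's size to sizes[]. */
static unsigned sketch_split_vgrf(unsigned vgrf_size,
                                  const struct sketch_access *accesses,
                                  unsigned n, unsigned *sizes)
{
   /* can_split[i]: a boundary may be placed before register i (i > 0). */
   bool can_split[MAX_VGRF_SIZE];
   assert(vgrf_size > 0 && vgrf_size <= MAX_VGRF_SIZE);
   for (unsigned i = 0; i < vgrf_size; i++)
      can_split[i] = true;

   /* An access spanning several registers keeps them together. */
   for (unsigned a = 0; a < n; a++) {
      assert(accesses[a].offset + accesses[a].size <= vgrf_size);
      for (unsigned i = 1; i < accesses[a].size; i++)
         can_split[accesses[a].offset + i] = false;
   }

   /* Emit one piece per run between surviving boundaries. */
   unsigned pieces = 0, start = 0;
   for (unsigned i = 1; i <= vgrf_size; i++) {
      if (i == vgrf_size || can_split[i]) {
         sizes[pieces++] = i - start;
         start = i;
      }
   }
   return pieces;
}
```

With accesses covering registers {0}, {1, 2}, and {3, 4}, a size-5 vgrf splits into pieces of size 1, 2, and 2, matching the example above.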
Jason Ekstrand f9da0740e2 i965/fs_live_variables: Use var_from_vgrf instead of repeating the calculation
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-09-30 10:29:13 -07:00
Jason Ekstrand 75afe17b79 i965/fs: Manually generate the meta fast-clear shader
Previously, we were generating the fast-clear shader from GLSL.  The
problem is that fast clears require that we use a replicated write rather
than a regular write instruction.  In order to get this we had a
complicated and somewhat fragile optimization pass that looked for places
where we can use a replicated write and used it.  Since replicated writes
have a lot of restrictions, we only ever use them for fast-clear
operations.

This commit replaces the optimization pass with a function that just
generates the shader we want.  This is a) less code, b) less fragile than
the optimization pass, and c) generates a more efficient shader.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2014-09-30 10:29:13 -07:00
Michel Dänzer 61128d7507 radeonsi: Pass the slice size to si_dma_copy_buffer
Otherwise some parts of tiled slices can be missed.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2014-09-30 18:55:48 +09:00
Michel Dänzer 74aeccd701 radeonsi: Catch more cases that can't be handled by si_dma_copy_buffer/tile
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2014-09-30 18:55:48 +09:00
Michel Dänzer d17b85524d radeonsi: Fix si_dma_copy(_tile) for compressed formats
Fixes GPUVM faults when running the piglit test "getteximage-formats
init-by-rendering" with R600_DEBUG=forcedma on SI.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2014-09-30 18:55:48 +09:00
Michel Dänzer 761d80ddab radeonsi: Fix tiling mode index for stencil resources
We are currently only dealing with depth-only or stencil-only resources
here, not with resources having both depth and stencil[0]. In both cases,
the tiling mode index is in the tile_mode field, not in the
stencil_tile_mode field.

[0] Add an assertion for that.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2014-09-30 18:55:48 +09:00
Chia-I Wu 594e1a2f4b ilo: fix format of edge flag pointer
The VE format of edge flag pointers was changed in
780ce576bb.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
2014-09-30 16:41:32 +08:00
Chia-I Wu 2d13b5ac81 ilo: add a pass to finalize ilo_ve_state
Add finalize_vertex_elements() to finalize ilo_ve_state.  This fixes a
potential issue with URB entry allocation for VS and moves the complexity of
gen6_3DSTATE_VERTEX_ELEMENTS() to the new function.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
2014-09-30 16:41:32 +08:00
Chia-I Wu 2b4c8ffc30 ilo: precalculate aligned depth buffer size
To replace the hacky zs_align_surface().

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
2014-09-30 16:41:31 +08:00
Chia-I Wu 343b014b57 ilo: use dynamic bo for rectlist vertices
The size is always 24 bytes.  We can upload them to the dynamic buffer.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
2014-09-30 16:41:31 +08:00
Thomas Hellstrom 46537f1d03 st/xa: Fix regression in xa_yuv_planar_blit()
Commit "st/xa: scissor to help tilers" broke xa_yuv_planar_blit() and vmwgfx
textured video. Fix this by implementing scissors also in the yuv draw path.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Cc: Rob Clark <robclark@freedesktop.org>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
2014-09-30 08:31:33 +02:00
Kenneth Graunke 68627235f2 i965: Delete intel_chipset.h.
Unused; it was replaced by include/pci_ids/i965_pci_ids.h long ago.

Acked-by: Matt Turner <mattst88@gmail.com>
2014-09-29 20:10:00 -07:00
Alex Henrie 3bea907797 driconf: Correct and update Catalan translation
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2014-09-29 17:45:41 -07:00
Alex Henrie 33a7d0d040 driconf: Update Spanish translation
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2014-09-29 17:45:26 -07:00
Alex Henrie 3b34b876f4 driconf: Synchronize po files
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2014-09-29 17:45:10 -07:00
Eric Anholt 4ceaad14ff vc4: Don't try to do stores to buffers that aren't bound.
The code was kind of mixed up about which buffers were getting stored in the
case that a resolve bit was unset (resolve bits are set based on the GL state
at draw time) and the buffer wasn't actually bound.  In particular,
depth-only rendering would store the color buffer contents, which happen to
be pointing at the depth buffer.

Thanks to clearing out the resolve bits for things we really can't
resolve, now I can drop the safety checks for buffer presence around the
actual stores.

Fixes 42 piglit tests.
2014-09-29 17:44:15 -07:00
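A hypothetical sketch of the "clear out the resolve bits for things we can't resolve" step: once the bits for unbound buffers are dropped, the store code can trust them without separate buffer-presence checks. The flag names, struct sketch_fb, and sketch_sanitize_resolve() are invented for illustration and are not vc4's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical resolve flags and framebuffer state (not vc4's). */
#define SKETCH_RESOLVE_COLOR (1u << 0)
#define SKETCH_RESOLVE_ZS    (1u << 1)

struct sketch_fb {
   bool color_bound;
   bool zs_bound;
};

/* Drop resolve bits for buffers that are not bound, so a depth-only
 * framebuffer never triggers a color store. */
static uint32_t sketch_sanitize_resolve(uint32_t resolve, const struct sketch_fb *fb)
{
   assert(fb);
   if (!fb->color_bound)
      resolve &= ~SKETCH_RESOLVE_COLOR;
   if (!fb->zs_bound)
      resolve &= ~SKETCH_RESOLVE_ZS;
   return resolve;
}
```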