instead of ignoring the argument and always dumping to
standard output.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Using RMW on banked context registers is not safe. The value read
could be the wrong one. So if there has been a DRAW_IDX launched,
the RMW must be preceded by a WAIT_FOR_IDLE to ensure the read part
of RMW sees the correct value.
To avoid unnecessary WFI's, keep track if there is a need for WFI,
and only emit one if needed. Furthermore, keep track if we even
need to update the register in the first place.
And to cut down on the amount of RMW to avoid excessive WFI's, at the
tiling/GMEM level we can always overwrite RB_RENDER_CONTROL, as the
state at beginning of draw/clear cmds (which we IB to) is always
undefined. In the draw/clear commands, we always still use RMW (with
WFI if needed), but only if the register value actually changes. (At
points where the current value cannot be known, the saved value is
reset to ~0, which includes bits outside of RBRC_DRAW_STATE, so there
never is chance for confusion.)
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Actually assign VSC_PIPE's properly, which will be needed for tiling.
And introduce fd_tile for per-tile state (including the assignment of
tile to VSC_PIPE). This gives us the proper pipe setup that we'll
need for hw binning pass, and also cleans things up a bit by not having
to pass so many parameters around. And will also make it easier to
introduce different tiling patterns (since we may no longer render
tiles in a simple left-to-right top-to-bottom pattern).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Found while tracking down memory leaks in VDPAU playback
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
CC: "10.0" <mesa-stable@lists.freedesktop.org>
Prevents a potential memory leak found when tracking down something else.
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
CC: "10.0" <mesa-stable@lists.freedesktop.org>
Previously we were creating a new LLVMContext every time that we called
radeon_llvm_parse_bitcode, which caused us to leak the context every time
that we compiled a CL program.
Sadly, we can't dispose of the LLVMContext at the point that it was being
created because evergreen_launch_grid (and possibly the SI equivalent) was
assuming that the context used to compile the kernels was still available.
Now, we'll create a new LLVMContext when creating EG/SI compute state, store
it there, and pass it to all of the places that need it.
The LLVM Context gets destroyed when we delete the EG/SI compute state.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
CC: "10.0" <mesa-stable@lists.freedesktop.org>
This fixes another case of faulting when freeing a pipe_sampler_view
that belongs to a previously destroyed context.
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jonathan Liu <net147@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes a crash where old_view->context was already freed in the
pipe_sampler_view_reference function contained in
src/gallium/auxiliary/utils/u_inlines.h. As a result, the
sampler_view_destroy function pointer contained 0xfeeefeee indicating
freed heap memory.
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jonathan Liu <net147@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Similar to 556a47a262, without this reading from
gl_FragData[0] would cause a software fallback.
Bugzilla: https://bugs.winehq.org/show_bug.cgi?id=33964
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
Cc: 10.0 9.2 9.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Every driver supports it. All current and future Gallium drivers always
support it, and all existing classic drivers support it.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Note that ARB_occlusion_query was previously enabled twice.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Useful in its own right, but also needed for adaptive vsync.
No regressions in the piglit glx-oml-sync-control-getmscrate test.
Signed-off-by: Lauri Kasanen <cand@gmx.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
It is the maximum number of back buffers, but the name is confusing and is
easily read as the maximum back buffer index. Chage to DRI3_NUM_BACK to make
the intended usage a bit clearer.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Just copying code from the dri2 path to set up the fast color clear state.
This also removes a couple of bogus intel_region_reference calls.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The buffer-object is the persistent thing passed through the loader, so when
updating an image buffer, check to see if it is already bound to the provided
bo. The region, on the other hand, is allocated separately for the miptree,
and so will never be the same as that passed back from the loader.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Move the depth field up with width and height.
Remove unused previous_time and frames fields.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
libxshmfence v1.0 foolishly used 'int32_t *' for the fence type, which
works when the fence is a linux futex. However, version 1.1
changes the exported datatype to 'struct xshmfence *'
Require libxshmfence version 1.1 and switch the API around.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
While looking through the documentation, I found this in the Sandybridge
PRM (Volume 4, Part 1, Page 140):
"Use of sample_c with SURFTYPE_CUBE surfaces is undefined with the
following surface formats: I24X8_UNORM, L24X8_UNORM, A24X8_UNORM,
I32_FLOAT, L32_FLOAT, A32_FLOAT."
I haven't observed this to be true, but it suggests that we may want to
use other formats.
We already perform DEPTH_TEXTURE_MODE swizzling in the shaders, and
don't rely on the surface format to splat things appropriately. So
using RED should work just as well as INTENSITY.
A few notes about the formats:
- R24_UNORM_X8_TYPELESS has the exact same properties as I24X8_UNORM.
- R16_UNORM and R32_FLOAT are additionally supported as a render target,
while the old I16_UNORM/I32_FLOAT formats are not.
- R32_FLOAT_X8X24_TYPELESS is not supported as a render target, while
the old format (R32G32_FLOAT) was. However, it shares the same
properties as the formats we use for Z24, so it should suffice.
This makes translate_tex_format and brw_blorp_surface_info::set
a bit more similar.
No Piglit changes on Sandybridge or Ivybridge. No oglconform changes on
Sandybridge.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Emitting flushes before depth and hiz resolves at the top of blorp's
state emission fixes the hang. Marchesin and I found the fix
experimentally, as opposed to adhering to a documented hardware
workaround. A more minimal fix likely exists, but this gets the job
done.
Fixes HiZ hangs in the new WebGL Google maps on Sandybridge Chrome OS.
Tested by zooming in and out continuously for 2 hours.
This patch is based on
8bc07bb701
CC: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70740
Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Broadwell allows us to specify an arbitrary value for QPitch, rather
than baking a specific formula into the hardware and requiring software
to lay things out to match. The only restriction is that the software
provided QPitch needs to be large enough so successive array slices do
not overlap.
In order to support this flexibility, software needs to specify QPitch
in a bunch of packets. Storing QPitch makes that easy, and allows us to
adjust it in a single place should we wish to change it in the future.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Broadwell introduces support for Q, UQ, and HF types. It also extends
DF support to allow immediate values.
Irritatingly, although HF and DF both support immediates, they're
represented by a different value depending on the register file.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Ivybridge, Baytrail, and Haswell support double float register types,
but do not support them as immediate values.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
On released hardware, values 4-6 are overloaded. For normal registers,
they mean UB/B/DF. But for immediates, they mean UV/VF/V.
Previously, we just created #defines for each name, reusing the same
value. This meant we could directly splat the brw_reg::type field into
the assembly encoding, which was fairly nice, and worked well.
Unfortunately, Broadwell makes this infeasible: the HF and DF types are
represented as different numeric values depending on whether the
source register is an immediate or not.
To preserve sanity, I decided to simply convert BRW_REGISTER_TYPE_* to
an abstract enum that has a unique value for each register type, and
write translation functions. One nice benefit is that we can add
assertions about register files and generations.
I've chosen not to convert brw_reg::type to the enum, since converting
it caused a lot of trouble due to C++ enum rules (even though it's
defined in an extern "C" block...).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Three-source instructions use a different encoding for register types
(and have a much more limited set to choose from).
Previously, we translated those into BRW_REGISTER_TYPE_* values, then
reused the existing reg_encoding mapping.
Doing it directly is more straightforward and actually less code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
UB types have never been supported as immediates. On Gen4-5, register
encoding 4 is "Reserved." On Gen6+, it means UV.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Sandybridge added support for packed unsigned vectors.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
When adding geometry shader support, we accidentally reversed the size
and offset parameters.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Calling the local variables flat_enable and point_sprite_enable is
clearer than dw16 and such. It also matches the names used in
calculate_attr_overrides, which computes them.
v2: Add /* dw16 */ and /* dw10 */ comments, requested by Jordan.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
calculate_attr_overrides is responsible for computing the point sprite
and flat-shading enable bitfields. It does so by OR'ing in a bunch of
bits. However, it relied on the caller to set the initial value to
zero. This is pretty fragile - if the caller neglects to zero out those
variables, then the enable bitfields end up full of garbage, which shows
up as random things being flat-shaded.
This patch moves the zero-initialization into calculate_attr_overrides,
so that the computation is completely in one place.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
git blame ascribes this to the initial commit of the driver.
No released hardware has ever supported half float, according to the
documentation for SrcType in the ISA reference.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
If no function signature is found for a function name, report that the
function is not found instead of printing an empty list of candidates.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch changes the error reporting behavior for incorrect function
invocation (triggered by match_function_by_name() unable to find a
matching function call) from using the line number information
associated to the function name term to using the line number
information of the entire function expression. Fixes bug #72264.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72264
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Stop searching for a driver after success.
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
Reviewed-By: Gong, Zhigang <zhigang.gong@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
For GLSL 1.50 we can get frag shaders with primitive id as an
input, add support to the translator for this.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
In load_texunit_bumpmap tc_array is asserted so lets assert
rot_mat_0 and rot_mat_1 also which are coming from same path.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Check for malloc() returning null to fix Klocwork warnings.
Minor clean-ups by BrianP.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Change save_attrib_data() to return true/false depending on success.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The GL_RGB32F, GL_RGB32UI and GL_RGB32I texture buffer formats are
only supposed to be allowed if the GL_ARB_texture_buffer_object_rgb32
extension is supported. Note that the texture buffer extensions
require a core profile. This patch adds those checks.
Fixes the soon-to-be-added
arb_clear_buffer_object-negative-bad-internalformat piglit test.
Add function to test if the buffer is already mapped and if so,
if the mapped range overlaps the given range.
Modify the _mesa_InvalidateBufferSubData function to use
the new function.
Enable buffer_object_subdata_range_good() to use bufferobj_range_mapped
Reviewed-by: Brian Paul <brianp@vmware.com>
alpha, lumincance and intensity formats are illegal in a core context.
Add a check to return MESA_FORMAT_NONE if one of those is requested within
a core context.
Reviewed-by: Brian Paul <brianp@vmware.com>
- change storage class from static to extern
- rename validate_texbuffer_format to _mesa_validate_texbuffer_format
Reviewed-by: Brian Paul <brianp@vmware.com>
- add xml file for extension
- add reference in gl_API.xml
- add pointer to device driver function table (dd.h)
- update dispatch_sanity.cpp
Reviewed-by: Brian Paul <brianp@vmware.com>
Specs say it's legal for implementations to use internal copies, and
the write synchronization seems to work. Fixes clCreateBuffer
(together with previous patches) and buffer-flags piglits.
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Acked-by: Francisco Jerez <currojerez@riseup.net>
v2: Use fewer if statements and functional tricks instead of single-use method,
suggested by Francisco Jerez.
Squash two small patches into one.
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The dri2 state tracker is checking for driver support before enabling
dri2ImageExtension version 7. This commit adds a check that also the
kernel driver supports fd sharing through prime.
Note that this adds a libdrm dependency on dri2.c.
v2: Removed unnecessary clamping of bool expression
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
This will avoid compiler warnings in the patch that follows. There
should be no user-visible effect because the change only affects the
behaviour when an invalid enum is passed to
_mesa_shader_type_to_index(), and that can only happen if there is a
bug elsewhere in Mesa.
Reviewed-by: Brian Paul <brianp@vmware.com>
All this cruft was ported from r600g and isn't needed on SI and later
according to hw docs. If we implemented HiS, we would set it to 0.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Replicate some of the gallium pipe transfer functionality.
Also bump minor to signal availability of this feature.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
This fixes MSAA resolving for 32-bit integer colorbuffers, which isn't
implemented by the hardware.
It also fixes VM protection faults when resolving MSAA 2D array textures.
This may be a CB bug, because shader-based resolving works fine.
It may also be faster for upside-down and scaled blits.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
For scaled resolve. The filter is only good for magnification.
If somebody has an idea how to implement a good filter for minification,
I'm all ears. I'd have to use derivatives probably.
Reviewed-by: Brian Paul <brianp@vmware.com>
We need this for integer formats and upside-down blits, which Radeons don't
support for MSAA resolving.
It can be used by calling util_blitter_blit.
Reviewed-by: Brian Paul <brianp@vmware.com>
The argument is a i8 pointer not a i32 pointer (even though the value actually
stored/loaded IS i32). Older llvm versions didn't care but 3.2 and newer do
leading to crashes.
Reviewed-by: Zack Rusin <zackr@vmware.com>
Didn't really work as well as hoped (in particular it was not generally
more accurate), will solve this differently.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This code was always problematic, and with 64bit rasterization we no longer
need it at all.
Reviewed-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The target parameter to _mesa_get_tex_image() is a target enum, not an index.
When we're setting up faces for a cubemap, it should be
CUBE_MAP_POSITIVE_X .. CUBE_MAP_NEGATIVE_Z; for all other targets it
should be the same as the texobj's target.
Fixes broken cubemaps [had only +X face but claimed to have all] produced by
glTextureView, which then caused various crashes in the driver when we
tried to use them.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: - add assert so we don't run into trouble on Gen6.
- adjust for Tapani's rearrangement of ir_variable
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The GL_ARB_shadow spec says the shadow compare mode should have no
effect when sampling a color texture. As it was, it was up to
drivers to check for that (softpipe, llvmpipe, svga and probably
the rest don't do that). Note: it looks like DX10 allows shadow
compare with some non-depth formats, so this case really should be
handled in the state tracker.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Depending on the depth texture format, we may or may not have to
emit explicit fs code to do the shadow comparison. Before, we
were emitting it more often than needed.
v2: check the actual texture format rather than the screen->depth.z16
field. The screen->depth.z16, x8z24, s8z24 fields may not all be set
to a consistent set of depth formats.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Call TextureView helper function to set TextureView state
appropriately for the TexStorage calls.
Misc updates from review feedback.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Add helper function to set texture_view state from TexStorage calls.
Include review feedback.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Add Mesa TextureView logic.
Incorporate feedback on ARB_texture_view:
- Add S3TC VIEW_CLASSes to compatibility table
- Use existing _mesa_get_tex_image
- Clean up error strings
- Use bool instead of GLboolean for internal functions
- Split compound level & layer test into individual tests
- eliminate helper macro for VIEW_CLASS table
- do not call driver if ptr null.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Refactor to make next_mipmap_level_size defined in mipmap.c a
_mesa_ helper function that can then be used by texture_view
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Add support for ARB_texture_view get parameters:
GL_TEXTURE_VIEW_MIN_LEVEL
GL_TEXTURE_VIEW_NUM_LEVELS
GL_TEXTURE_VIEW_MIN_LAYER
GL_TEXTURE_VIEW_NUM_LAYERS
Incorporate feedback regarding when to allow query of
GL_TEXTURE_IMMUTABLE_LEVELS.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Add state needed by glTextureView to the gl_texture_object.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Stub in glTextureView API call to go with the
glTextureView API xml definition.
Includes dispatch test for glTextureView
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This patch changes the error condition to satisfy below statement
from OpenGL 4.3 core specification:
"An INVALID_OPERATION error is generated if id is the name of a query
object with a target other SAMPLES_PASSED, ANY_SAMPLES_PASSED, or
ANY_SAMPLES_PASSED_CONSERVATIVE, or if id is the name of a query
currently in progress."
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The driverPrivate pointer is opaque to the driver and we can't assume
it's a struct gl_context in dri_util.c. Instead provide a helper function
to set the struct gl_context flags from the incoming DRI context flags.
v2 (idr): Modify the other classic drivers to also use
driContextSetFlags. I ran all the piglit GLX_ARB_create_context tests
with i965 and classic swrast without regressions.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu> [v1 on Gallium nouveau]
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
This patches add MESA_copy_sub_buffer support to the dri sw loader and
then to gallium state tracker, llvmpipe, softpipe and other bits.
It reuses the dri1 driver extension interface, and it updates the swrast
loader interface for a new putimage which can take a stride.
I've tested this with gnome-shell with a cogl hacked to reenable sub copies
for llvmpipe and the one piglit test.
I could probably split this patch up as well.
v2: pass a pipe_box, to reduce the entrypoints, as per Jose's review,
add to p_screen doc comments.
v3: finish off winsys interfaces, add swrast classic support as well.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
swrast: add support for copy_sub_buffer
Required for glClearBuffer, which only clears one colorbuffer attachment.
Example:
If the first colorbuffer is float and the second one is int:
pipe->clear(pipe, PIPE_CLEAR_COLOR0, float_clear_color, ...);
pipe->clear(pipe, PIPE_CLEAR_COLOR1, int_clear_color, ...);
This doesn't need any driver changes yet, because all drivers just use:
if (flags & PIPE_CLEAR_COLOR) ..
The drivers which support GL 3.0 will have to implement it properly though.
This corresponding piglit tests supported this incorrect behavior instead of
pointing at it.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: 10.0 9.2 9.1 <mesa-stable@lists.freedesktop.org>
This adds 2 optimizations for radeonsi:
- handling of DISCARD_RANGE
- mapping an uninitialized buffer range is automatically UNSYNCHRONIZED
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christoph Brill <egore911@gmail.com>
v2: Renamed r600_buffer.c to r600_buffer_common.c. The stupid build system
doesn't allow 2 files of the same name in different directories.
If we assume that all buffers allocated by the DDX are scanout, a new flag
that says "this is not scanout" has to be added to support the non-scanout
buffers and maintain backward compatibility.
This fixes bad rendering on Wayland.
The flag is defined as:
#define RADEON_TILING_R600_NO_SCANOUT RADEON_TILING_SWAP_16BIT
AFAIK, RADEON_TILING_SWAP_16BIT is not used on SI.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Patch copies the whole data structure at once instead of
assigning individual variables.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This patch moves following bitfields and variables to the data
structure:
explicit_location, explicit_index, explicit_binding, has_initializer,
is_unmatched_generic_inout, location_frac, from_named_ifc_block_nonarray,
from_named_ifc_block_array, depth_layout, location, index, binding,
max_array_access, atomic
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This patch moves following bitfields in to the data structure:
used, assigned, how_declared, mode, interpolation,
origin_upper_left, pixel_center_integer
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Data section helps serialization and cloning of a ir_variable. This
patch includes the helper bits used for read only ir_variables.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Newer virtual HW versions support smooth/stipple/wide lines.
Use that instead of 'draw' fallbacks when possible.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
With this patch llvmpipe will adhere to the ARB_depth_clamp enabled state when
clamping the fragment's zw value. To support this, the variant key now includes
the depth_clamp state. key->depth_clamp is derived from pipe_rasterizer_state's
(depth_clip == 0), thus depth clamp is only enabled when depth clip is disabled.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
On evergreen we have to reserve 1 stack element in some additional cases
besides the ones mentioned in the docs, but stack size computation was
recently reimplemented exactly as described in the docs by the patch that
added workarounds for stack issues on EG/CM, resulting in regressions
with some apps (Serious Sam 3).
This patch fixes it by restoring previous behavior.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=72369
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Tested-by: Andre Heider <a.heider@gmail.com>
Disabled by default, but it's very useful when needed.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Caching in the vbuf module meant that once a vertex has been
emitted it was cached, but it's possible for a vertex at the
same location to be emitted again, but this time with a different
front-face semantic. Caching was causing the first version of the
vertex to be emitted, which resulted in the renderer getting
incorrect front-face attributes. By reseting the vertex_id (which
is used for caching) we make sure that once a front-face info
has been injected the vertex will endup getting emitted.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The fact that we flush denorms to zero breaks our half-float
conversion and blending. This patches enables denorms for
blending. It's a little tricky due to the llvm bug that makes
it incorrectly reorder the mxcsr intrinsics:
http://llvm.org/bugs/show_bug.cgi?id=6393
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Zack Rusin <zackr@vmware.com>
This needs a prime-aware vmwgfx kernel module to work properly.
(With additions by Christopher James Halse Rogers <raof@ubuntu.com>)
Signed-off-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
v2: Fix transliteration of lseek arguments
Ignore busy return from RADEON_GEM_BUSY ioctl; we're only after the domain
Signed-off-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
It's a map of GEM name->bo, so identify it as such
Signed-off-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
v2: Fix up queryImage return for ATTRIB_FD
Use driver_descriptor.configuration to determine whether the driver
supports DMA-BUF import/export.
v3: Really, truly, fix up queryImage return for ATTRIB_FD
Signed-off-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Otherwise the default is TYPE_SHARED, which will flink the bo. This seems
rather unnecessary for a simple stride query.
Signed-off-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
v2: Pick out the correct gl_context pointer
v3: Don't leak pipe_resources on error path
Set img->dri_format correctly
Signed-off-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
First off, nv50_program only has 16 in/out varyings. However reporting
16 makes 'm' become 68 in nv50_fp_linkage_validate with the
varying-packing-simple piglit test. (Subverting the assert makes it
compile but fail.) With this patch, varying-packing-simple passes.
See: https://bugs.freedesktop.org/show_bug.cgi?id=69155
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "9.2 10.0" <mesa-stable@lists.freedesktop.org>
To help the transition period when DRI loaders are being updated
to support the newer __driDriverExtensions_foo mechanism,
we populate __driDriverExtensions with the extensions returned
by __driDriverExtensions_foo during a library contructor
function.
We find the driver foo's name by using the dladdr function
which gives the path of the dynamic library's name that
was being loaded.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Keith Packard <keithp@keithp.com>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
The _EGLSurface struct which is embedded into dri2_egl_surface also contains a
swap interval member so the other member is redundant. Nothing was using it as
far as I can tell.
On Gen4+, OUT_RELOC_FENCED is equivalent to OUT_RELOC; libdrm silently
ignores the fenced flag:
/* We never use HW fences for rendering on 965+ */
if (bufmgr_gem->gen >= 4)
need_fence = false;
Thanks to Eric for noticing this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Now that loop_controls no longer creates normatively bound loops,
there is no need for ir_loop::normative_bound or the
lower_bounded_loops pass.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, when loop_controls analyzed a loop and found that it had a
fixed bound (known at compile time), it would remove all of the loop
terminators and instead set the loop's normative_bound field to force
the loop to execute the correct number of times.
This made loop unrolling easy, but it had a serious disadvantage.
Since most GPU's don't have a native mechanism for executing a loop a
fixed number of times, in order to implement the normative bound, the
back-ends would have to synthesize a new loop induction variable. As
a result, many loops wound up having two induction variables instead
of one. This caused extra register pressure and unnecessary
instructions.
This patch modifies loop_controls so that it doesn't set the loop's
normative_bound anymore. Instead it leaves one of the terminators in
the loop (the limiting terminator), so the back-end doesn't have to go
to any extra work to ensure the loop terminates at the right time.
This complicates loop unrolling slightly: when deciding whether a loop
can be unrolled, we have to account for the presence of the limiting
terminator. And when we do unroll the loop, we have to remove the
limiting terminator first.
For an example of how this results in more efficient back end code,
consider the loop:
for (int i = 0; i < 100; i++) {
total += i;
}
Previous to this patch, on i965, this loop would compile down to this
(vec4) native code:
mov(8) g4<1>.xD 0D
mov(8) g8<1>.xD 0D
loop:
cmp.ge.f0(8) null g8<4;4,1>.xD 100D
(+f0) if(8)
break(8)
endif(8)
add(8) g5<1>.xD g5<4;4,1>.xD g4<4;4,1>.xD
add(8) g8<1>.xD g8<4;4,1>.xD 1D
add(8) g4<1>.xD g4<4;4,1>.xD 1D
while(8) loop
(notice that both g8 and g4 are loop induction variables; one is used
to terminate the loop, and the other is used to accumulate the total).
After this patch, the same loop compiles to:
mov(8) g4<1>.xD 0D
loop:
cmp.ge.f0(8) null g4<4;4,1>.xD 100D
(+f0) if(8)
break(8)
endif(8)
add(8) g5<1>.xD g5<4;4,1>.xD g4<4;4,1>.xD
add(8) g4<1>.xD g4<4;4,1>.xD 1D
while(8) loop
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This value is now redundant with
loop_variable_state::limiting_terminator->iterations and
ir_loop::normative_bound.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The old logic of loop_unroll_visitor::visit_leave(ir_loop *) was:
heuristics to skip unrolling in various circumstances;
if (loop contains more than one jump)
return;
else if (loop contains one jump) {
if (the jump is an unconditional "break" at the end of the loop) {
remove the break and set iteration count to 1;
fall through to simple loop unrolling code;
} else {
for (each "if" statement in the loop body)
see if the jump is a "break" at the end of one of its forks;
if (the "break" wasn't found)
return;
splice the remainder of the loop into the other fork of the "if";
remove the "break";
complex loop unrolling code;
return;
}
}
simple loop unrolling code;
return;
These tasks have been moved to their own functions:
- splice the remainder of the loop into the other fork of the "if"
- simple loop unrolling code
- complex loop unrolling code
And the logic has been flattened to:
heuristics to skip unrolling in various circumstances;
if (loop contains more than one jump)
return;
if (loop contains no jumps) {
simple loop unroll;
return;
}
if (the jump is an unconditional "break" at the end of the loop) {
remove the break;
simple loop unroll with iteration count of 1;
return;
}
for (each "if" statement in the loop body) {
if (the jump is a "break" at the end of one of its forks) {
splice the remainder of the loop into the other fork of the "if";
remove the "break";
complex loop unroll;
return;
}
}
This will make it easier to modify the loop unrolling algorithm in a
future patch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, the sole responsibility of loop_analysis was to find all
the variables referenced in the loop that are either loop constant or
induction variables, and find all of the simple if statements that
might terminate the loop. The remainder of the analysis necessary to
determine how many times a loop executed was performed by
loop_controls.
This patch makes loop_analysis also responsible for determining the
number of iterations after which each loop terminator will terminate
the loop, and for figuring out which terminator will terminate the
loop first (I'm calling this the "limiting terminator").
This will allow loop unrolling to make use of information that was
previously only visible from loop_controls, namely the identity of the
limiting terminator.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Patches to follow will introduce code into the loop_terminator
constructor. Allocating loop_terminator using new(mem_ctx) syntax
will ensure that the constructor runs.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
When loop_control_visitor::visit_leave(ir_loop *) is analyzing a loop
terminator that acts on a certain ir_variable, it doesn't need to walk
the list of induction variables to find the loop_variable entry
corresponding to the variable. It can just look it up in the
loop_variable_state hashtable and verify that the loop_variable entry
represents an induction variable.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
These fields were part of some planned optimizations that never
materialized. Remove them for now to simplify things; if we ever get
round to adding the optimizations that would require them, we can
always re-introduce them.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch replaces the ir_loop fields "from", "to", "increment",
"counter", and "cmp" with a single integer ("normative_bound") that
serves the same purpose.
I've used the name "normative_bound" to emphasize the fact that the
back-end is required to emit code to prevent the loop from running
more than normative_bound times. (By contrast, an "informative" bound
would be a bound that is informational only).
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, all of the back-ends (ir_to_mesa, st_glsl_to_tgsi, and the
i965 fs and vec4 visitors) had nearly identical logic for handling
bounded loops. This replaces the duplicate logic with an equivalent
lowering pass that is used by all the back-ends.
Note: on i965, there is a slight increase in instruction count. For
example, a loop like this:
for (int i = 0; i < 100; i++) {
total += i;
}
would previously compile down to this (vec4) native code:
mov(8) g4<1>.xD 0D
mov(8) g8<1>.xD 0D
loop:
cmp.ge.f0(8) null g8<4;4,1>.xD 100D
(+f0) break(8)
add(8) g5<1>.xD g5<4;4,1>.xD g4<4;4,1>.xD
add(8) g8<1>.xD g8<4;4,1>.xD 1D
add(8) g4<1>.xD g4<4;4,1>.xD 1D
while(8) loop
After this patch, the "(+f0) break(8)" turns into:
(+f0) if(8)
break(8)
endif(8)
because the back-end isn't smart enough to recognize that "if
(condition) break;" can be done using a conditional break instruction.
However, it should be relatively easy for a future peephole
optimization to properly optimize this.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, loop analysis would set
this->conditional_or_nested_assignment based on the most recently
visited assignment to the variable. As a result, if a vaiable was
assigned to more than once in a loop, the flag might be set
incorrectly. For example, in a loop like this:
int x;
for (int i = 0; i < 3; i++) {
if (i == 0)
x = 10;
...
x = 20;
...
}
loop analysis would have incorrectly concluded that all assignments to
x were unconditional.
In practice this was a benign bug, because
conditional_or_nested_assignment is only used to disqualify variables
from being considered as loop induction variables or loop constant
variables, and having multiple assignments also disqualifies a
variable from being considered as either of those things.
Still, we should get the analysis correct to avoid future confusion.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, when visiting an ir_call, loop analysis would only mark
the innermost enclosing loop as containing a call. As a result, when
encountering a loop like this:
for (i = 0; i < 3; i++) {
for (int j = 0; j < 3; j++) {
foo();
}
}
it would incorrectly conclude that the outer loop ran three times.
(This is not certain; if foo() modifies i, then the outer loop might
run more or fewer times).
Fixes piglit test "vs-call-in-nested-loop.shader_test".
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, when visiting a variable dereference, loop analysis would
only consider its effect on the innermost enclosing loop. As a
result, when encountering a loop like this:
for (int i = 0; i < 3; i++) {
for (int j = 0; j < 3; j++) {
...
i = 2;
}
}
it would incorrectly conclude that the outer loop ran three times.
Fixes piglit test "vs-inner-loop-modifies-outer-loop-var.shader_test".
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This function is about to get more complex.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fast color clears of MSAA buffers work just like fast color clears
with non-MSAA buffers, except that the alignment and scaledown
requirements are different.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This will make it easier to add fast color clear support to MSAA
buffers, since they have different alignment and scaling requirements.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, we didn't do multisample blorp clears because we couldn't
figure out how to get them to work. The reason for this was because
we weren't setting the brw_blorp_params num_samples field consistently
with dst.num_samples. Now that those two fields have been collapsed
down into one, we can do multisample blorp clears.
However, we need to do a few other pieces of bookkeeping to make them
work correctly in all circumstances:
- Since blorp clears may now operate on multisampled window system
framebuffers, they need to call
intel_renderbuffer_set_needs_downsample() to ensure that a
downsample happens before buffer swap (or glReadPixels()).
- When clearing a layered multisample buffer attachment using UMS or
CMS layout, we need to advance layer by multiples of num_samples
(since each logical layer is associated with num_samples physical
layers).
Note: we still don't do multisample fast color clears; more work needs
to be done to enable those.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, brw_blorp_params contained two fields for determining
sample count: num_samples (which determined the multisample
configuration of the rendering pipeline) and dst.num_samples (which
determined the multisample configuration of the render target
surface). This was redundant, since both fields had to be set to the
same value to avoid rendering errors.
This patch eliminates num_samples to avoid future confusion.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch renames the enum that's used to keep track of fast clear
state from "mcs_state" to "fast_clear_state", and it removes the enum
value INTEL_MCS_STATE_MSAA (which previously meant, "this is an MSAA
buffer, so we're not keeping track of fast clear state"). The only
real purpose that enum value was serving was to prevent us from trying
to do fast clear resolves on MSAA buffers, and it's just as easy to
prevent that by checking the buffer's msaa_layout.
This paves the way for implementing fast clears of MSAA buffers.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The hardware blitter doesn't understand multisampled layouts, so
there's no way this could possibly succeed.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The "layer" parameters used in blorp, and the
intel_renderbuffer::mt_layer field, represent a physical layer rather
than a logical layer. This is important for 2D multisample arrays on
Gen7+ because the UMS and CMS multisample layouts use N physical
layers to represent each logical layer, where N is the number of
samples.
Also add an assertion to blorp to help catch bugs if we fail to follow
these conventions.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Clarify the fact that we only optimize full buffer clears using fast
color clear, and why.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Based on comments by Benjamin Morris <bmorris@nvidia.com> in
http://lists.freedesktop.org/archives/nouveau/2013-December/015328.html
This adds setting of is_long_term, and updates a few field names we were
unclear about.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Create the ref_bo without any storage type flags set for now. The issue
probably arises from our use of the additional buffer space at the end
of the ref_bo. It should probably be split up in the future.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Martin Peres <martin.peres@labri.fr>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
With this patch, generate_fs_loop will clamp any fragment shader depth writes
to the viewport's min and max depth values. Viewport selection is determined
by the geometry shader output for the viewport array index. If no index is
specified, then the default viewport index is zero. Semantics for this path
can be found in draw_clamp_viewport_idx and lp_clamp_viewport_idx.
lp_jit_viewport was created to store viewport information visible to JIT code,
and is validated when the LP_NEW_VIEWPORT dirty flag is set.
lp_rast_shader_inputs is responsible for passing the viewport_index through
the rasterizer stage to fragment stage (via lp_jit_thread_data).
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The Wayland EGL platform now respects the eglSwapInterval value. The value is
clamped to either 0 or 1 because it is difficult (and probably not useful) to
sync to more than 1 redraw.
The main change is that if the swap interval is 0 then Mesa won't install a
frame callback so that eglSwapBuffers can be executed as often as necessary.
Instead it will do a sync request after the swap buffers. It will block for
sync complete event in get_back_bo instead of the frame callback. The
compositor is likely to send a release event while processing the new buffer
attach and this makes sure we will receive that before deciding whether to
allocate a new buffer.
If there are no buffers available then instead of returning with an error,
get_back_bo will now poll the compositor by repeatedly sending sync requests
every 10ms. This is a last resort and in theory this shouldn't happen because
there should be no reason for the compositor to hold on to more than three
buffers. That means whenever we attach the fourth buffer we should always get
an immediate release event which should come in with the notification for the
first sync request that we are throttled to.
When the compositor is directly scanning out from the application's buffer it
may end up holding on to three buffers. These are the one that is is currently
scanning out from, one that has been given to DRM as the next buffer to flip
to, and one that has been attached and will be given to DRM as soon as the
previous flip completes. When we attach a fourth buffer to the compositor it
should replace that third buffer so we should get a release event immediately
after that. This patch therefore also changes the number of buffer slots to 4
so that we can accomodate that situation.
If DRM eventually gets a way to cancel a pending page flip then the compositors
can be changed to only need to hold on to two buffers and this value can be
put back to 3.
This also moves the vblank configuration defines from platform_x11.c to the
common egl_dri2.h header so they can be shared by both platforms.
Consider a typical game-style main loop which might be like this:
while (1) {
draw_something();
eglSwapBuffers();
}
In this case the game is relying on eglSwapBuffers to throttle to a sensible
frame rate. Previously this game would end up using three buffers even though
it should only need two. This is because Mesa decides whether to allocate a
new buffer in get_back_bo which would be before it has tried to read any
events from the compositor so it wouldn't have seen any buffer release events
yet.
This patch just moves the block for the frame callback to get_back_bo.
Typically the compositor will send a release event immediately after one of
the attaches so if we block for the frame callback here then we can be sure to
have completed at least one roundtrip and received that release event after
attaching the previous buffer before deciding whether to allocate a new one.
dri2_swap_buffers always calls get_back_bo so even if the client doesn't
render anything we will still be sure to block to the frame callback. The code
to create the new frame callback has been moved to after this call so that we
can be sure to have cleared the previous frame callback before requesting a
new one.
This patch fixes this MinGW build error.
CC glapi_gentable.lo
glapi_gentable.c:47:19: fatal error: dlfcn.h: No such file or directory
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Drivers will need to look at this to decide if they need to do
per-sample fragment shader dispatch.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
My understanding is that Broadwell retains the same SCS mechanism
that Haswell has, so even if the underlying issue with this format
is not fixed, the w/a will be applied in SCS rather than needing
shader code.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that all the pieces are in place, this should provide
a nice performance boost for apps using multisample textures.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We need to emit extra shader code in this case to sample the
MCS surface first; we can't just blindly do this all the time
since IVB will sometimes try to access the MCS surface even if
disabled.
V3: Use actual MSAA layout from the texture's mt, rather
then computing what would have been used based on the format.
This is simpler and less fragile - there's at least one case where
we might want to have a texture's MSAA layout change based on what
the app does (CMS SINT falling back to UMS if the app ever attempts
to render to it with a channel disabled.)
This also obsoletes V2's 1/10 -- compute_msaa_layout can now remain
an implementation detail of the miptree code.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This gives us correct behavior for both renderbuffers (which previously
worked) and multisample textures (which would never get an MCS surface
allocated, even if CMS layout was selected)
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously this was only done for render targets.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The bspec says:
"SW must program the sample mask value in this field so that it matches
with 3DSTATE_SAMPLE_MASK"
I haven't observed this to actually fix anything, but stumbled across it
while adding the rest of the support for CMS layout for multisample
textures.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Haswell needs a copy of the sample mask in 3DSTATE_PS; this makes that
convenient.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The intention is that things like
int;
will generate a warning. However, we were also accidentally emitting
the same warning for things like
struct Foo { int x; };
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68838
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: Aras Pranckevicius <aras@unity3d.com>
Cc: "9.2 10.0" <mesa-stable@lists.freedesktop.org>
For some reason this was left out when the version was changed...
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Before this patch, the following code would not be optimized even though
the first two instructions were common to the then and else blocks:
(+f0) IF
MOV dst0 ...
MOV dst1 ...
MOV dst2 ...
ELSE
MOV dst0 ...
MOV dst1 ...
MOV dst3 ...
ENDIF
This commit extends the peephole to handle this case.
No shader-db changes.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>