There are 3 changes:
1) stride is specified for each buffer, not just one, so that drivers don't
have to derive it from the outputs
2) new per-output property dst_offset, which specifies the offset
into the buffer in dwords where the output should be stored,
so that drivers don't have to compute the offsets manually;
this will also be useful for gl_SkipComponents
from ARB_transform_feedback3
3) register_mask is removed, instead, there is start_component
and num_components; register_mask with non-consecutive 1s
doesn't make much sense (some hardware cannot do packing of components)
Christoph Bumiller: fixed nvc0.
v2: resolve merge conflicts in Draw and clean it up
Virtual address space put the userspace in charge of their GPU
address space. It's up to userspace to bind bo into the virtual
address space. Command stream can them be executed using the
IB_VM chunck.
This patch add support for this configuration. It doesn't remove
the 64K ib size limit thought this limit can be extanded up to
1M for IB_VM chunk.
v2: fix rendering
v3: fix rendering when using index buffer
v4: make vm conditional on kernel support add basic va management
v5: catch the case when we already have va for a bo
v6: agd5f: update on top of ioctl changes
v7: agd5f: further ioctl updates
v8: indentation cleanup + fix non cayman
v9: rebase against lastest mesa + improvement from Marek & Michel
v10: fix cut/paste bug
v11: don't rely on updated radeon_drm.h
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Make the comments precise. Explain why each branch is needed and correct.
Document the potential pitfall in the true-branch.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
When using Mesa with a GLES API, calling _mesa_FramebufferRenderbuffer
with GL_DRAW_FRAMEBUFFER will report a 'user error' because
get_framebuffer_target validates that this enum from the framebuffer
blit extension is only used on GL. To work around it this patch makes
it use the GL_FRAMEBUFFER enum instead in that case.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43418
Note: This is a candidate for the 8.0 branch.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The gl_renderbuffer::Format field wasn't always set properly. This
didn't matter much in the past but with the recent swrast/renderbuffer
mapping changes, core Mesa will be directly touching OSMesa colorbuffers
so using the right MESA_FORMAT_x value is important.
Unfortunately, there aren't MESA_FORMATs for all the possible OSmesa
format/type combinations, such as GL_FLOAT / OSMESA_ARGB. If anyone
runs into these we can add new Mesa formats.
v2: add warnings for unsupported formats, fix ARGB_REV mix-up.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
We always access pull constant buffers using the message types "OWord
Block Read" or "OWord Dual Block Read". According to the Sandy Bridge
PRM, Vol 4 Part 1, pages 214 and 218, when using these messages:
"the surface pitch is ignored, the surface is treated as a
1-dimensional surface. An element size (pitch) of 16 bytes is
used to determine the size of the buffer for out-of-bounds
checking if using the surface state model."
Previously we were setting the pitch for pull constant buffers to the
size of the whole constant buffer--this made no sense and would have
led to incorrect behavior if it were not for the fact that the pitch
is ignored.
For clarity, this patch sets the pitch for pull constant buffers to 16
bytes, consistent with the hardware's behavior.
v2: Clarify the meaning of the ignored values by writing them as (16 - 1).
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 9bdc44a528 (i965: Replace struct
with bit shifting for WM pull constant surfaces) accidentally
introduced off-by-one errors into the calculation of the surface
width, height, and depth. This patch restores the correct
computation.
The reason this wasn't noticed by Piglit tests is that the size of our
constant surfaces is always less than 2^20, therefore the off-by-one
error was causing the "depth" field of the surface to be set to all
1's. The hardware interpreted this as an extremely large surface, so
overflow checking was effectively disabled.
No Piglit regressions on Sandy Bridge.
NOTE: This is a candidate for the 7.11 and 8.0 branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Flat SHADE_MODEL still overrides any non-flat interpolation
qualifier, but pulling that state out of the rasterizer cso
isn't really worth the effort, is it ?
NOTE: This is a candidate for the 8.0 branch.
Needed to implement the Map/UnmapRenderbuffer() driver hooks.
This fixes glRead/Draw/CopyPixels, etc.
See https://bugs.freedesktop.org/show_bug.cgi?id=44723
Note: This is a candidate for the 8.0 branch.
Tested-by: Kevin Hobbs <hobbsk@ohiou.edu>
This fixes accum buffer operations. The accumulation buffer is the
only malloc-based renderbuffer for the intel drivers.
v2: apply x/y offset to returned pointer
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes piglit EXT_framebuffer_multisample/negative-copypixels.
Reviewed-by: Brian Paul <brianp@vmware.com>
NOTE: This is a candidate for the 8.0 branch.
Fixes piglit EXT_framebuffer_multisample/negative-copyteximage.
Reviewed-by: Brian Paul <brianp@vmware.com>
NOTE: This is a candidate for the 8.0 branch.
Fixes piglit EXT_framebuffer_multisample-negative-readpixels.
Reviewed-by: Brian Paul <brianp@vmware.com>
NOTE: This is a candidate for the 8.0 branch.
Fixes piglit EXT_framebuffer_multisample/renderbuffer-samples.
Reviewed-by: Brian Paul <brianp@vmware.com>
NOTE: This is a candidate for the 8.0 branch.
Previously, we were saying that everything from the starting tile to
region width+height was part of the limits of our depthbuffer, even if
the tile was near the bottom of the depthbuffer. This mean that our
range was not clipping to buffer buonds if the start tile was anything
but the start of the buffer.
In bebc91f0f3, this was changed to
saying that we're just rendering to a region of the size of the
renderbuffer. This is great -- we get a range that should actually
match what we want. However, the hardware's range checking occurs
after the X/Y offset addition, so we were clipping out rendering to
small depth mip levels when an X/Y offset was present. Just add
tile_x/y to the width in that case -- the WM won't produce negative
x/y values pre-offset, so we just need to get the left/bottom sides of
the region to cover our buffer.
Fixes the following Piglit regressions on gen7:
spec/ARB_depth_buffer_float/fbo-clear-formats
spec/ARB_depth_texture/fbo-clear-formats
spec/EXT_packed_depth_stencil/fbo-clear-formats
NOTE: This is a candidate for the 8.0 branch.
The array holds GLuint values so remove the float cast.
Note, however, that to compute the average of four GLuints we really
want to do (a+b+c+d)/4 but that could overflow. This change doesn't
address that for now.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
In the first case, the newImage[] array contains GLuint values.
In the second case, the parameter type is GLuint, but the maxDepth
value is never used in this case (GL_FLOAT_32_UNSIGNED_INT_24_8_REV).
Pass ~OU just to be safe.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
We include both imports.h and u_math.h in the state tracker. This
leads to multiple, conflicting definitions of ffs() with MSVC.
Use FFS_DEFINED to skip the ffs() in u_math.h.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
include mesa headers before gallium headers to avoid problem with
ffs() being defined in u_math.h and then again in imports.h
The next commit will add some #ifdefs to prevent multiple definitions
of ffs().
Call ffs() and ffsll() everywhere. Define our own ffs(), ffsll()
functions when the platform doesn't have them.
v2: remove #ifdef _WIN32, __IBMC__, __IBMCPP_ tests inside ffs()
implementation. The #else clause was recursive.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Alexander von Gluck <kallisti5@unixzen.com>
Some hardware versions rely on it to render correctly.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Introduce vbo_get_minmax_indices() function to handle the min/max index
computation for nr_prims(>= 1). The old code just compute the first
prim's min/max index; this would results an error rendering if user
called functions like glMultiDrawElements(). This patch servers as
fixing this issue.
As when nr_prims = 1, we can pass 1 to paramter nr_prims, thus I made
vbo_get_minmax_index() static.
v2: per Roland's suggestion, put the indices address compuation into
vbo_get_minmax_index() instead.
Also do comination if possible to reduce map/unmap count
v3: per Brian's suggestion, use a pointer for start_prim to avoid
structure copy per loop.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
glDrawBuffer(GL_FRONT_AND_BACK) results in to segmentation fault if
intel->is_front_buffer_rendering is not enabled with GL_FRONT_AND_BACK.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44153
Reported-by: Yi Sun <yi.sun@intel.com>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Instead, do the uniform setting and input / output mapping directly in
brw_link_shader. Hurray for not generating Mesa IR! However, once
the i965 driver stops calling _mesa_ir_link_shader, UsesClipDistance
and UsesKill are no longer set.
Ideally gen6_upload_vs_push_constants should use the
gl_shader_program, but I don't see a way to propagate the information
there. The other alternative, since this is the only usage, is to
move gl_vertex_program::UsesClipDistance to brw_vertex_program.
The compile (and precompile) stages use UsesKill to determine the
cache key for the shader. This is then used to determine whether or
not to compile the shader. Calculating this data during compilation
is too late.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
This previously enabled some optimizations in the fragment shader
(interpolation, etc.) if some input components were always 0.0 or
1.0. However, this data was generated by analyzing Mesa IR. The
next patch in this series removes generation of Mesa IR for GLSL
paths. When we detect that case, just set the used mask to ~0 and
circumvent the optimizations.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It used to be done in ir_to_mesa, and that was kind of a bad place.
I didn't change st_glsl_to_tgsi because there is some strange stuff
happening in the code that generates glDrawPixels shaders. It looked
like this would break horribly if I touched anything.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Track the calculated data in gl_shader_program instead of the
individual assembly shaders.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Rather than looking at the settings in individual assembly programs,
look at the settings in the top-level uniform values. The old code
was flawed because examining each shader stage in isolation could
allow inconsitent usage across stages (e.g., bind unit 0 to a
sampler2D in the vertex shader and sampler1DShadow in the fragment
shader).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously the fixed-function fragment shader was tracked as a
gl_program. This means that it shows up in the driver as a Mesa IR
program instead of as a GLSL IR program. If a driver doesn't generate
Mesa IR from the GLSL IR, that program is empty. If the program is
empty there is either no rendering or a GPU hang.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Poking directly at the backing resources works only by luck. Core
Mesa code should only know about the gl_uniform_storage structure.
Soon other code that looks at samplers will use the gl_uniform_storage
structures instead of the data in the gl_program.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
The gen7_urb atom depends on CACHE_NEW_VS_PROG and CACHE_NEW_GS_PROG,
causing gen7_upload_urb() to be called when switching to a new VS
program.
In addition to partitioning the URB space between the VS and GS,
gen7_upload_urb() also allocated space for VS and PS push constants.
Unfortunately, this meant that whenever CACHE_NEW_VS was flagged, we'd
reallocate the space for the PS push constants. According to the BSpec,
after sending 3DSTATE_PUSH_CONSTANT_ALLOC_PS, we must reprogram
3DSTATE_CONSTANT_PS prior to the next 3DPRIMITIVE.
Since our URB allocation for push constants is entirely static, it makes
sense to split it out into its own atom that only subscribes to
BRW_NEW_CONTEXT. This avoids reallocating the space and trashing
constants.
Fixes a rendering artifact in Extreme Tuxracer, where instead of a snow
trail, you'd get a bright red streak (affectionately known as the
"bloody penguin bug").
This also explains why adding VS-related dirty bits to gen7_ps_state
made the problem disappear: it made 3DSTATE_CONSTANT_PS be emitted after
every 3DSTATE_PUSH_CONSTANT_ALLOC_PS packet.
NOTE: This is a candidate for the 7.11 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38868
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Both dri2_create_context_attribs and drisw_create_context_attribs call
dri2_convert_glx_attribs, expecting it to fill in *api on success.
However, when num_attribs == 0, it was returning true without setting
*api, causing the caller to use an uninitialized value.
Tested-by: Vadim Girlin <vadimgirlin@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
svga_sampler_view contains a pointer to a pipe_resource (base class of
svga_texture) and svga_texture contains a pointer to an svga_sampler_view.
This circular dependency prevented the objects from ever being freed when
they pointed to each other. Make the svga_sampler_view::texture pointer
a "weak reference" (no reference counting) to break the dependency.
This is safe to do because the pipe_resource/texture always has a longer
lifespan than the sampler view so when svga_sampler_view stops referencing
the texture, the texture's refcount never hits zero.
Fixes a memory leak seen with google earth and other apps.
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
We were naively emitting each component at a time, even if we were
emitting the same value to multiple channels. Improves on a codegen
regression from the old VS to the new VS on some unigine shaders
(because we emit constant vecs/matrices as immediates instead of
loading them as push constants, so we had over 4x the instructions for
using them).
shader-db results:
Total instructions: 58594 -> 58540
11/870 programs affected (1.3%)
765 -> 711 instructions in affected programs (7.1% reduction)
WGL_ARB_extensions_string states that wglGetExtensionsStringARB should
return NULL for invalid HDCs. And some applications rely on it.
Reviewed-By: "Keith Whitwell" <keithw@vmware.com>
It caused an X protocol error in some (rare) situations.
This is a follow-on to the previous commits which fixes a bug reported
by Wayne E. Robertz.
NOTE: This is a candidate for the 7.11 branch.
Reviewed-by: Adam Jackson <ajax@redhat.com>
This is the same fix as the previous commit, except it's for the gallium
glx/xlib state tracker.
NOTE: This is a candidate for the 7.11 branch.
Reviewed-by: Adam Jackson <ajax@redhat.com>
as we do in Fake_glXChooseVisual(). This registers the MesaGLX
extension on the display so we can clean up buffers, etc. when
the display connection is closed.
Fixes a bug reported by Wayne E. Robertz.
NOTE: This is a candidate for the 7.11 branch.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Prevent any state from carrying over to a new translation in cases
where we assume that data is still zero from initial calloc (these
would require us to do individual zeroing before translation which
would be more code).
When creating an EGLImage from a struct wl_buffer * this ensures
that we create an XRGB8888 image if the wayland buffer doesn't have an
alpha channel. To determine if a wl_buffer has a valid alpha channel
this patch adds an internal wayland_drm_buffer_has_alpha() function.
It's important to get the internal format for an EGLImage right so that
if a GL texture is later created from the image then the GL driver will
know if it should sample the alpha from the texture or flatten it to
a constant of 1.0.
This avoids needing fragment program workarounds in wayland compositors
to manually ignore the alpha component of textures created from wayland
buffers.
krh: Edited to use wl_buffer_get_format() instead of wl_buffer_has_alpha().
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
No functional change. In the function
__indirect_glAreTexturesResident(), the variable cmdlen is only used
if USE_XCB is not defined. This patch avoids a compile warning in the
event that USE_XCB is defined.
v2: just move cmdlen declaration inside the #else part.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previous to this patch, we didn't do the limit check for
MAX_TRANSFORM_FEEDBACK_INTERLEAVED_COMPONENTS until the end of the
store_tfeedback_info() function, *after* storing all of the transform
feedback info in the gl_transform_feedback_info::Outputs array. This
meant that the limit check wouldn't prevent us from overflowing the
array and corrupting memory.
This patch moves the limit check to the top of tfeedback_decl::store()
so that there is no risk of overflowing the array. It also adds
assertions to verify that the checks for
MAX_TRANSFORM_FEEDBACK_INTERLEAVED_COMPONENTS and
MAX_TRANSFORM_FEEDBACK_SEPARATE_COMPONENTS are sufficient to avoid
array overflow.
Note: strictly speaking this patch isn't necessary, since the maximum
possible number of varyings is MAX_VARYING (16), whereas the size of
the Outputs array is MAX_PROGRAM_OUTPUTS (64), so it's impossible to
have enough varyings to overflow the array. However it seems prudent
to do the limit check before the array access in case these limits
change in the future.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
On drivers that set gl_shader_compiler_options::LowerClipDistance (for
example i965), we need to handle transform feedback of gl_ClipDistance
specially, to account for the fact that the hardware represents it as
an array of vec4's rather than an array of floats.
The previous way this was accounted for (translating the request for
gl_ClipDistance[n] to a request for a component of
gl_ClipDistanceMESA[n/4]) doesn't work when performing transform
feedback on the whole unsubscripted array, because we need to keep
track of the size of the gl_ClipDistance array prior to the lowering
pass. So I replaced it with a boolean is_clip_distance_mesa, which
switches on the special logic that is needed to handle the lowered
version of gl_ClipDistance.
Fixes Piglit tests "EXT_transform_feedback/builtin-varyings
gl_ClipDistance[{1,2,3,5,6,7}]-no-subscript".
Reviewed-by: Eric Anholt <eric@anholt.net>
The function tfeedback_decl::num_components() was not correctly
accounting for transform feedback of whole arrays and gl_ClipDistance.
The bug was hard to notice in tests, because it only affected the
checks for MAX_TRANSFORM_FEEDBACK_SEPARATE_COMPONENTS and
MAX_TRANSFORM_FEEDBACK_INTERLEAVED_COMPONENTS.
This patch fixes the computation, and adds an assertion to verify
num_components() even when MAX_TRANSFORM_FEEDBACK_SEPARATE_COMPONENTS
and MAX_TRANSFORM_FEEDBACK_INTERLEAVED_COMPONENTS are not exceeded.
The assertion requires keeping track of components_so_far in
tfeedback_decl::store(); this will be useful in a future patch to fix
non-multiple-of-4-sized gl_ClipDistance.
Reviewed-by: Eric Anholt <eric@anholt.net>
This just fixes up the enables for native integers and EXT_texture_integer
support in st/mesa.
It also set the MaxClipPlanes to 8.
We should consider exposing caps for MCP vs MCD, but since core
mesa doesn't care yet maybe we can wait for now.
v2: use 32-bit formats as per Marek's mail.
v3: add calim's fix for INT_DIV_TO_MUL_RCP disabling.
Signed-off-by: Dave Airlie <airlied@redhat.com>
It doesn't look like the GLSL compiler will produce sign op
for an unsigned anyways (seems insane anyways).
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds integer version of SSG that GLSL 1.30 can produce.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This enables fragment clamping in softpipe, it passes more
tests than it did previously with no regressions, There are still
a couple of failures in the SNORM types to investigate.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes a number of texelFetch swizzle tests, and consoldiates
the swizzle handling in a new function.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Add support for using the clipdistance instead of clip plane.
Passes all piglit clipdistance tests.
v2: fixup some comments from Brian in review.
Signed-off-by: Dave Airlie <airlied@redhat.com>
softpipe always clipped using the position vector, however for unclipped
vertices it stored the position in window coordinates, however when position
and clipping are separated, we need to store the clip-space position and
the clip-space vertex clip, so we can interpolate both separately.
This means we have to take the clip space position and store it to use later.
This allows softpipe to pass all the clip-vertex piglit tests.
v2: fix llvm draw regression, the structure being passed into llvm needed
updating, remove some hardcoded ints that should have been enums while there.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This required changing the system value semantics, so we stored
a system value per vertex, instance id is the only other system
value we currently support, so I span it across the channels.
This passes the 3 vertexid-* piglit tests + lots of instanceid tests.
Signed-off-by: Dave Airlie <airlied@redhat.com>
If draw isn't using llvm we can support vertex texture and integers,
These will be fixed up later, but for now allow this check to happen
at run-time.
v2: since 3e22c7a253 we can ask draw for a non-llvm
context. Just track if ask and set the vars accordingly. This probably isn't perfect but should cover the cases we care about.
v3: use debug option, restructure to store in screen, as suggested by Jakob.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Mesa shouldn't call into the drivers if there are no renderbuffers
bound to the attachments for the buffers to be cleared.
Fixes a number of the clearbuffer-* tests on softpipe.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes the test to allow cube/depth combinations on GL3
or EXT_gpu_shader4.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We're not quite ready to actually support it in the implementation,
but at least this allows GL 3.0 API-reliant applications to hopefully
run successfully, though they won't get multisampling.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The EXT_texture_array required only 64, but GL 3.0 required 256.
Since we're already exposing values that can get us way beyond our
ability to map the single object directly, go ahead and expose all the
way to hardware limits.
Tested with new piglit EXT_texture_array/maxlayers on gen7.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were doing the kill of the updated channels, then adding our copy
to the list of available stuff to copy. But if the copy was updating
its own source channels, we didn't notice, breaking this code:
R0.xyzw = arg0 + arg1;
R0.xyzw = R0.wwwx;
gl_FragColor.xyzw = clamp(R0.xyzw, 0.0, 1.0);
Fixes piglit glsl-copy-propagation-self-2.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch modifies all batches needed for HiZ. The batch length for
3DSTATE_HIER_DEPTH_BUFFER is also corrected from 4 to 3.
Performance +6.7% on Citybench.
num-frames: 400
resolution: 1918x1031
avg-hiz-off: 127.90 fps
avg-hiz-on: 136.50 fps
kernel: git://people.freedesktop.org/~anholt/linux.git branch=gen7-reset-sol sha=23360e4
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
It is unwise to use a stencil region's size to determine its
renderbuffer's size, because at region creation we fudge the width and
height to accomodate interleaved rows. (See the comment for MESA_FORMAT_S8
in intel_miptree_create()). Most users of stencil_region->{width,height}
should be converted to use stencil_rb->{Width,Height}.
We have already done the replacement in several locations. This patch
continues the replacement in {brw,gen7}_emit_depthbuffer(). To make those
functions look consistent, I've also done the equivalent replacement for
the depth buffer.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
It was named GEN6_WM_DEPTH_RESOLVE. Luckily, this caused no conflict,
because the value is identical for gen6 and gen7.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Don't create clip outputs if no clip planes are enabled.
Move clip validation after program validation: we were calling
linkage validation in case the VP needed rebuilding before the
FP was validated.
The vertex program needs to be built first because when
ClipDistance is used we'll want to only enable those outputs that
are also written.
These initialization functions weren't initializing all the fields so
some had undefined values. The callers of these functions sometimes use
a structure assignment to initialize new objects from these templates
so we'd just propagate the undefined values. That made for some confusing
info when debugging, plus it could lead to bugs.
v2: fix surf pointer mix-up: "&surf" -> "surf"
Jakob Bornecrantz <jakob@vmware.com>
This code isn't used anymore in preference for DRI2 client side swap buffers
throttling or throttling done inside the xa or xorg driver.
Signed-off-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by Brian Paul <brianp@vmware.com>
This peice of code has been here since the inital commit (c5c5cd71) and the
code that used instance_id_index was removed in (caede752) by José.
Signed-off-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by Brian Paul <brianp@vmware.com>
So the targets can drop the sw_wrapper winsys when no sw driver is being used.
Signed-off-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by Brian Paul <brianp@vmware.com>
This replaces the current code with an implementation compatible with
the new gallium interface. I've left some of the remains of the interface
intact so llvmpipe keeps building correctly, and I'll take a look at fixing
llvmpipe up later.
v2: fixup as per Brian's review
Signed-off-by: Dave Airlie <airlied@redhat.com>
This introduces an unspecified interpolation paramter that is only allowed for
color semantics, so a specified GLSL interpolation will override the ShadeModel
specified interpolation, but not vice-versa.
This fixes a lot of the interpolation tests in piglit.
v2: rename from unspecified to color
Signed-off-by: Dave Airlie <airlied@redhat.com>
This could lead to incorrect code when fixed regs are involved.
Surprisingly, the increased freedom actually leads to lower
register usage in some cases. Still want to find a better way
to treat constraints though ...
Always set position to insert before the current instruction,
the previous behaviour led to confusion (bug in checkPredicate
for BBs with only a single conditional branch).
Conflicts:
src/gallium/auxiliary/tgsi/tgsi_strings.c
src/mesa/state_tracker/st_atom_clip.c
commit d919791f2742e913173d6b335128e7d4c63c0840
Author: Christoph Bumiller <e0425955@student.tuwien.ac.at>
Date: Fri Jan 6 17:59:22 2012 +0100
d3d1x: adapt to new clip state
commit cfec82bca3fefcdefafca3f4555285ec1d1ae421
Author: Christoph Bumiller <e0425955@student.tuwien.ac.at>
Date: Fri Jan 6 14:16:51 2012 +0100
gallium/docs: update for clip state changes
commit c02bfeb81ad9f62041a2285ea6373bbbd602912a
Author: Christoph Bumiller <e0425955@student.tuwien.ac.at>
Date: Fri Jan 6 14:21:43 2012 +0100
tgsi: add TGSI_PROPERTY_PROHIBIT_UCPS
commit d4e0a785a6a23ad2f6819fd72e236acb9750028d
Author: Brian Paul <brianp@vmware.com>
Date: Thu Jan 5 08:30:00 2012 -0700
tgsi: consolidate TGSI string arrays in new tgsi_strings.h
There was some duplication between the tgsi_dump.c and tgsi_text.c
files. Also use some static assertions to help catch errors when
adding new TGSI values.
v2: put strings in tgsi_strings.c file instead of the .h file.
Reviewed-by: Dave Airlie <airlied@redhat.com>
commit c28584ce0d8c62bd92c8f140729d344f88a0b3cd
Author: Christoph Bumiller <e0425955@student.tuwien.ac.at>
Date: Fri Jan 6 12:48:09 2012 +0100
gallium: extend user_clip_plane_enable to apply to clip distances
commit f1d5016c07f786229ed057effbe55fbfd160b019
Author: Marek Olšák <maraeo@gmail.com>
Date: Fri Jan 6 02:39:09 2012 +0100
nvfx: adapt to new clip state
commit 6f6fa1c26bd19f797c1996731708e3569c9bfe24
Author: Marek Olšák <maraeo@gmail.com>
Date: Fri Jan 6 01:41:39 2012 +0100
st/mesa: fix DrawPixels with GL_DEPTH_CLAMP
commit c86ad730aa1c017788ae88a55f54071bf222be12
Author: Christoph Bumiller <e0425955@student.tuwien.ac.at>
Date: Tue Jan 3 23:51:30 2012 +0100
nv50: adapt to new clip state
commit 3a8ae6ac243bae5970729dc4057fe02d992543dc
Author: Christoph Bumiller <e0425955@student.tuwien.ac.at>
Date: Tue Jan 3 23:32:36 2012 +0100
nvc0: adapt to new clip state
commit 6243a8246997f8d2fcc69ab741a2c2dea080ff11
Author: Marek Olšák <maraeo@gmail.com>
Date: Thu Dec 29 01:32:51 2011 +0100
draw: initalize pt.user.planes in draw_init
This fixes a crash in glean/fpexceptions.
commit e3056524b19b56d473f4faff84ffa0eb41497408
Author: Marek Olšák <maraeo@gmail.com>
Date: Mon Dec 26 06:26:55 2011 +0100
svga: adapt to new clip state
commit c5bfa8b37d6d489271df457229081d6bbb51b4b7
Author: Marek Olšák <maraeo@gmail.com>
Date: Sun Dec 25 14:11:51 2011 +0100
r600g: adapt to new clip state
commit f11890905362f62627c4a28a8255b76eb7de7df2
Author: Marek Olšák <maraeo@gmail.com>
Date: Sun Dec 25 14:10:26 2011 +0100
r300g: adapt to new clip state
commit e37465327c79a01112f15f6278d9accc5bf3103f
Author: Marek Olšák <maraeo@gmail.com>
Date: Sun Dec 25 12:39:16 2011 +0100
draw: adapt to new clip state
This adds a regression in the LLVM clipping path. Can anybody see anything
wrong with the code? It works for every other case, just glean/fpexceptions
crashes when doing the "Infinite clip plane test".
commit b474d2b18c72d965eefae4e427c269cba5ce6ba2
Author: Marek Olšák <maraeo@gmail.com>
Date: Sun Dec 25 13:14:59 2011 +0100
u_blitter: don't save/set/restore clip state
commit 9dd240ea91f523a677af45e8d0adb9e661e28602
Author: Marek Olšák <maraeo@gmail.com>
Date: Sun Dec 25 13:11:56 2011 +0100
gallium: don't cso_save/set/restore clip state
The enable bits are in the rasterizer state.
commit a4f7031179f5f4ad524b34b394214b984ac950f6
Author: Marek Olšák <maraeo@gmail.com>
Date: Sun Dec 25 12:58:55 2011 +0100
gallium: default depth_clip to 1
depth_clip = !depth_clamp
commit fe21147a00ab90e549d63fe12ee4625c9c2ffcc3
Author: Marek Olšák <maraeo@gmail.com>
Date: Mon Dec 26 06:14:19 2011 +0100
trace,util: update state logging to new clip state
Also dump the other missing flags.
commit 2a3b96e84ac872dcc5bc1de049fe76bb58d64b23
Author: Marek Olšák <maraeo@gmail.com>
Date: Sun Dec 25 10:43:43 2011 +0100
st/mesa: adapt to new clip state
commit b7b656a42fca19d7c85267f42649a206a85a2c72
Author: Marek Olšák <maraeo@gmail.com>
Date: Sat Dec 17 15:45:19 2011 +0100
gallium: move state enable bits from clip_state to rasterizer_state
This brings the code in sync with gen6_sf_state.c; presumably the
mistake was a botched rebase on initial Ivybridge bring-up patches.
Found by diffing batch buffer dumps and noticing the random values.
Thanks to Eric for catching the obvious mistake.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
In f3e9ccb3b, I renamed gen6_upload_wm_constants to
gen6_upload_wm_push_constants, but neglected to update this comment.
I don't think there ever was a gen7_prepare_wm_constants function; it
was probably a search and replace error. Of course, "prepare" functions
died a while back as well.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
CACHE_NEW_SAMPLER doesn't cover max_wm_threads, but it does cover
brw->sampler.count. BRW_NEW_PS_BINDING_TABLE is obvious, but it's
probably worth adding a comment anyway.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The dirty bit was already correctly in place, but there was no comment.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Also, annotate the use of _NEW_POINT as long as we're adding a comment.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
According to a comment in gen6_sf_state, calls to get_attr_override need
both _NEW_PROGRAM and _NEW_LIGHT. Since Gen7 reuses the same function,
the same dirty bits should apply.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The BRW_NEW_CURBE_OFFSETS dirty bit is only flagged by the
brw_curbe_offsets state atom which is only used on Gen4-5.
Since it's never flagged, there's no reason to depend on it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The BRW_NEW_URB_FENCE dirty bit is only flagged by the
brw_recalculate_urb_fence state atom which isn't used on Gen6+.
Since it's never flagged, there's no reason to depend on it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This brings the dirty bits in line with the comments.
This does /not/ need to be cherry-picked to stable branches because the
access requiring _NEW_BUFFERS was added in master as part of HiZ.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The trick was to produce an assignment in the IR along the lines of:
(assign (xyzw) (var_ref R0) (swiz wwww (var_ref R0) ))
which occurs only rarely even in code that looks like it should do
this, because of the assignment temporaries generated in ast_to_hir.
From the IR above, this optimization pass would then propagate
references of R0 into R0.wwww (seems reasonable), but without this
patch, a later reference of R0.wwww would see R0 first, turning that
into R0.wwww.wwww, which triggered opt_swizzle_swizzle, and then we
looped back to this code to do it again. Avoid that by skipping over
the usual ir_rvalue visitor's ir_swizzle hook, so that we get
handle_rvalue() on the ir_swizzle itself, not its referenced value.
Looking at only the swizzle will always optimize away at least as much
as looking at the swizzle's refererenced value.
We now still claim to propagate r0.w into r0.w, but at least we don't
trigger the loop.
v2: Rewrite commit message (changes by anholt)
Fixes piglit glsl-copy-propagation-self-1
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=34006
The r300 driver requires LLVM when building and other drivers that
depend on it for all TNL, like i915g will be a lot slower without it.
Signed-off-by: Jakob Bornecrantz <wallbraker@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The optimization was supposed to turn an attribute component that was
always 1.0 into a mov of 1.0. But by leaving loop this patch removes
out of that test, we applied the projection correction to the 1.0 and
got some other value, breaking openarena once it was converted to
using the new compiler backend.
Originally this hunk was separate from the former loop to make the
generated instructions slightly better pipelined. We now have
automatic instruction scheduling to handle that, and the generated
instruction sequence looked the same to me after this change (except
for the bugfix).
Previous to this patch, if the client requested transform feedback
using a subscript, but the variable was not an array
(e.g. "gl_FrontColor[0]"), we would produce a bogus error message like
"Transform feedback varying gl_FrontColor[0] found, but it's an array
([] expected)".
Changed the error message to e.g. "Transfrorm feedback varying
gl_FrontColor[0] requested, but gl_FrontColor is not an array."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We were only comparing the number of depth and stencil bits but the
extension spec actually says the formats must match:
The error INVALID_OPERATION is generated if BlitFramebufferEXT is
called and <mask> includes DEPTH_BUFFER_BIT or STENCIL_BUFFER_BIT
and the source and destination depth or stencil buffer formats do
not match.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Based on patches from Maarten Lankhorst <m.b.lankhorst@gmail.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
Acked-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
Commit acf82657f4 supposedly enabled
SIMD16 dispatch, but neglected to set the "16 Pixel Dispatch Enable"
bit, so nothing actually got enabled.
Furthermore, it neglected to set up the Dispatch GRF Start Register for
kernel 2, which is the SIMD16 program.
Increases performance in Nexuiz by ~15% at 800x600 (n=3).
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
$ dict invarient
No definitions found for "invarient", perhaps you mean:
gcide: Invariant
wn: invariant
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Replace target, level parameters with gl_texture_image.
Add gl_renderbuffer parameter to indicate source buffer for the copy.
This removes some redundant code in the drivers to find the source
renderbuffer and the destination texture image (which we already had
in _mesa_CopyTexSubImage).
Signed-off-by: Brian Paul <brianp@vmware.com>
We were comparing 32-bit Z buffer values against 16-bit fragment values.
Need to do scaling like for the 24-bit case.
Triangle Z testing was OK since it didn't hit this code path.
If the assertion was hit, it probably meant that we were unable to allocate
or map a vertex buffer. Instead of dying in a debug build, issue a warning
and continue.
We need to pass the pre-projection matrix clip planes into the driver,
instead of the post for the case we have a vertex shader that writes clip
vertex.
Signed-off-by: Dave Airlie <airlied@redhat.com>