For a blit-uploaded temporary, it's faster on current hardware to memcpy
the data into a linear CPU mapping than to go through the GTT.
v2: Turn the not-fully-supported mask into 3 supported enum values.
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Reviewed-by: Paul Berry <stereotype441@gmail.com> (v2)
Reviewed-by: Chad Versace <chad.versace@linux.intel.com> (v2)
This is just in case someone else trips over this due to our weird reuse
of this code in glBlitFramebuffer().
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
If the hw is pre-gen5 and can't blit depth, it'll cleanly error out.
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
I think we've measured no performance difference from this in the past,
except that the blorp code can do things like multisample resolves.
Prevents piglit regression in the next commit when a testcase started
trying to do a multisampled resolve through the old glCopyTexSubImage()
path.
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
We were protected for a long time by the fact that depth was Y tiled and
you couldn't blit Y. Now that we can blit Y, we were failing to resolve
depth in glCopyPixels().
Note in the comment about swrast, that the swrast map path does resolves
appropriately already.
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
I had previously asserted that it was hard to write a useful, simpler
blit function, but I think this might be it.
This has the side effect of extending the 32k pitch check to a few more
places that were missing it.
v2: Update comment for being moved inside intel_miptree_blit().
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
This makes it more consistent with intel_miptree_get_tile_offsets().
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
Right now, the callers in i965 don't expect a nonzero page offset to
actually occur (since that's being handled elsewhere), but it seems
like a trap to leave it this way.
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Paul Berry <stereotype441@gmail.com>
It appears that `sizeof(Class::member)` is either non-standard or
merely unsupported in MSVC.
So use `sizeof(instance->member)` instead, which is guaranteed to work
everywhere.
Also promote the assert to a static assert.
Trivial.
The problem is the sampler units are allocated from the same pool for all
shader stages, so if a vertex shader uses 12 samplers (0..11), the fragment
shader samplers start at index 12, leaving only 4 sampler units
for the fragment shader. The main cause is probably the fact that samplers
(texture unit -> sampler unit mapping, etc.) are tracked globally
for an entire program object.
This commit adapts the GLSL linker and core Mesa such that the sampler units
are assigned to sampler uniforms for each shader stage separately
(if a sampler uniform is used in all shader stages, it may occupy a different
sampler unit in each, and vice versa, an i-th sampler unit may refer to
a different sampler uniform in each shader stage), and the sampler-specific
variables are moved from gl_shader_program to gl_shader.
This doesn't require any driver changes, and it fixes piglit/max-samplers
for gallium and classic swrast. It also works with any number of shader
stages.
v2: - converted tabs to spaces
- added an assertion to _mesa_get_sampler_uniform_value
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
to match the size of ctx->Texture.Unit, and it will also fix
piglit/max-samplers with the following commit.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Some Gallium drivers were crashing, because the array was not large enough.
v2: clamp the per-shader maximum in st/mesa, then sum them all up
NOTE: This is a candidate for the stable branches.
Set up CB_SHADER_MASK register according to pixel shader exports, and enable
some minimal state for colour buffer 1 in case dual source blending is used.
TGSI_TEXTURE_BUFFER is one-dimensional. Assert that exec_tex() is never
called with TGSI_TEXTURE_BUFFER.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This patch improves handling of unconditional KILL instructions inside
the conditional blocks, uncovering more opportunities for if-conversion.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
PRED_SET instructions that update exec mask should be scheduled immediately
prior to the "if-then-else" block, because any instruction that is
inserted after alu clause with PRED_SET and before conditional block is
also conditionally executed by hw (exec mask is already updated at that
moment).
Propbably it's better to make PRED_SET a part of conditional
"if-then-else" block in the IR to handle this more cleanly,
but for now this temporary solution should prevent the problem.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
For compute shaders we need to let the backend know that
GPRs 0 and 1 are preloaded with some compute-specific input
values, otherwise any use of these regs without previous
definition is considered as undefined value and usually
is simply replaced with 0.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
This allows GVN rewrite pass to propagate non-const (register)
values to FETCH source operands, helping to eliminate unnecessary
copies in some cases.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
We have to assume that all GPRs in compute shader can be indirectly
addressed because LLVM backend doesn't provide any indirect array info.
That's why for compute shaders GPR array is created that covers all used
GPRs (0..r600_bytecode::ngpr-1), but this seriously restricts register
allocation in sb.
This patch checks for actual use of indirect access in the shader and
if it's not used then GPR array is not created, so that regalloc is not
unnecessarily restricted.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
It turns out the MI_LOAD_REGISTER_IMM approach doesn't work on Haswell,
and regressed essentially all the transform feedback Piglit tests.
This morally reverts eaa6fbe6d5. However,
the code is still simpler than it was. On BeginTransformFeedback, we
simply flush the batch and set the SOL reset flag so that the next batch
will start with zeroed offsets. There's still no software counting.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=64887
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>