Even though luminance formats don't have alpha, we still want the alpha
output to go to the blender. This fixes the luminance blending tests.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
(originally part of previous patch, split out to separate patch by Rob)
v2: squash in some fixes from Eric
v3: Another fix from Eric for point coords.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This avoids exceeding the size of the .index bitfield since it got
truncated, and should make our NIR look more like the NIR that the rest of
the NIR developers are working on.
v2: split out vc4 updates, first patch uses varying_slot_to_tgsi_semantic()
helper, and second patch does the actual conversion.
v3: add frag_result_to_tgsi_semantic() helper and don't try to map
frag_results to semantic name/index as if they were varying_slot's
v4: use VERT_ATTRIB_ for VS inputs
v5: Fix vc4 build.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This is what the hardware supports, there never was any sort of 64K
limit.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
This fixes failures with the newly-submitted max-size texture buffer
piglit test for GPUs exposing >= 128M max texels.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
v2: split out moving of FILE *fp into state structure into it's own
(more complete patch) to reduce the noise in this one
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Rename print_var_state to print_state, and stuff FILE ptr into the state
object. This avoids passing around an extra parameter everywhere.
v2: even more extensive conversion.. use state *everywhere* instead of
FILE ptr, and convert nir_print_instr() to use state as well
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Similar to fee0686c21, but in this case to
ensure that drm_gralloc and libGLES_mesa are sharing a single screen.
Bumps libdrm_freedreno version dependency, as it requires the new
fd_device_fd() API.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
When preparing the barrier payload, the instructions should operate in
simd8 mode since we only use 1 payload register.
fs_inst::regs_read is also updated to indicate that it only reads one
register for SHADER_OPCODE_BARRIER.
These issues were flagged by:
commit cadd7dd384
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date: Thu Jul 2 15:41:02 2015 -0700
i965/fs: Add a very basic validation pass
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
C++ gets cranky if we take references of temporaries. This isn't a problem
yet in master because nir_builder is never used from C++. However, it will
be in the future so we should fix it now.
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Both use the same layout for the buffer containing border-color values,
so rather than duplicating the logic in a4xx, split it out into a
helper.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Now that we have a replicating fdot instruction, we can actually coalesce
into the destinations of vec4 instructions. We couldn't really do this
before because, if the destination had to end up in .z, we couldn't
reswizzle the instruction. With a replicated destination, the result ends
up in all channels so we can just set the writemask and we're done.
Shader-db results for vec4 programs on Haswell:
total instructions in shared programs: 1747753 -> 1746280 (-0.08%)
instructions in affected programs: 143274 -> 141801 (-1.03%)
helped: 667
HURT: 0
It turns out that dot-products matter...
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Fortunately, nir_constant_expr already auto-splats if "dst" never shows up
in the constant expression field so we don't need to do anything there.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
The old pass blindly inserted a bunch of moves into the shader with no
concern for whether or not it was really needed. This adds code to try and
coalesce into the destination of the instruction providing the value.
Shader-db results for vec4 shaders on Haswell:
total instructions in shared programs: 1754420 -> 1747753 (-0.38%)
instructions in affected programs: 231230 -> 224563 (-2.88%)
helped: 1017
HURT: 2
This approach is heavily based on a different patch by Eduardo Lima Mitev
<elima@igalia.com>. Eduardo's patch did this in a separate pass as opposed
to integrating it into nir_lower_vec_to_movs.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Previously, we did this thing with keeping track of a separate start_idx
which was different from the iteration variable. I think this was a relic
of the way that GLSL IR implements writemasks. In NIR, if a given bit in
the writemask is unset then that channel is just "unused", not missing. In
particular, a vec4 operation with a writemask of 0xd will use sources 0, 2,
and 3 and leave source 1 alone. We can simplify things a good deal (and
make them correct) by removing this "compacting" step.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We made this switch in the FS backend some time ago and it seems to make a
number of things a bit easier. In particular, supporting SSA values takes
very little work in the backend and allows us to take advantage of the
majority of the SSA information even after we've gotten rid of Phi nodes.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
v2 (Jason Ekstrand):
- Use nir_instr_rewrite_dest
- Pass the impl directly into lower_vec_to_movs_block
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Previously, we were passing the shader around, we were just calling it
"mem_ctx". However, the nir_shader is (and must be for the purposes of
mark-and-sweep) the mem_ctx so we might as well pass it around explicitly.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Currently the validation pass only validates that regs_read and
regs_written are consistent with the sizes of VGRF's. We can add more as
we find it to be useful.
Reviewed-by: Matt Turner <mattst88@gmail.com>
In certain conditions, we have to do bounds-checking in the shader for
image_load_store. The way this works for image loads is that we do a
predicated load and then emit a series of selects, one per component,
that gives us 0 or the loaded value depending on whether or not you're
in bounds. However, we were hard-coding 4 components which may not be
correct. Instead, we should be using size which is the number of
components read.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
According to the extensions table and our glext headers,
OES_compressed_ETC1_RGB8_texture is only supported in
GLES1 and GLES2. Since we may give users a GLES3 context
when a GLES2 context is requested, we also allow this
extension for GLES3 as well.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Driver vendors do this as well. The extension specification
lists GLES 1.1 or 2.0 as requirements.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
It's extensively used by XA for a8- and planar yuv component surfaces.
This fixes broken XA yuv blits using vgpu10 contexts.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Currently the check was incorrect as it did not consider the (unlikely)
case of fd == 0. In order to fix this we should first correctly
initialize it to -1, as the swrast implementations leave it set to zero
(props to calloc()).
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Boyan Ding <boyan.j.ding@gmail.com>
Move the fcntl(dupfd_cloexec) to the else branch where it belongs.
Otherwise it's not immediately obvious that the code is hit, only when
an existing device is used.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Boyan Ding <boyan.j.ding@gmail.com>
v2: [Emil Velikov]
Rework the error path to a common goto, close only if we own the fd.
v3; [Emil Velikov]
Always close the fd (we either opened the device or dup'd) (Boyan, Ian)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Boyan Ding <boyan.j.ding@gmail.com>
At the moment if a gbm buffer is imported and the gbm buffer
has an old-style GBM_BO_FORMAT format, the import will crash,
since it's passed directly to DRI functions that expect
a fourcc format (as provided by the newer GBM_FORMAT
definitions)
This commit addresses the problem in two ways:
1) it prevents invalid formats from leading to a crash by
returning EINVAL if the image couldn't be created
2) it translates GBM_BO_FORMAT formats into the comparable
GBM_FORMAT formats.
Reference: https://bugzilla.gnome.org/show_bug.cgi?id=753531
CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
We're trying to avoid a libdrm dependency in the core compiler, so let's
move the perf_debug code one level up from the brw_*_emit() helpers to
the brw_codegen_*_prog() helpers.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
All other precompile functions live in the brw_<stage>.c files, make fs
follow the convention.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
This moves the compute shader code around in order to make the way the
code is split up more consistent. There should be no functional changes.
Typically we have a few files per stage:
brw_vs.c, brw_wm.c brw_gs.c:
code to drive code generation and implement precompiling and
cache search.
genX_<stage>_state.c
gen specific implementation of the state emission for the shader
stage.
The brw_*_emit() functions are all in the same files as the visitor
classes they use (with the exception of VS, which may use either vec4 or
fs).
To make compute follow this convention, we move the brw_cs_emit()
function into brw_fs.cpp. We can then rename brw_cs.cpp to brw_cs.c and
do this in C like the other similar files. Finally, move state setup
and atoms to gen7_cs_state.c.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
See similar fix for Readpixels in mesa commit 0d20790. Jason suggested
we need that for TexSubImage as well.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Curiously this has no actual effect. I think it's because the first 8
textures are bound in multiple slots for some reason. However seems
prudent to use these the same way as regular texturing, esp in the case
where there are more than 8 textures bound.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>