At least i965 hardware does not have native support for floor on doubles.
v2 (Sam):
- Improve the lowering pass to remove one bcsel (Jason)
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
At least i965 hardware does not have native support for truncating doubles.
v2:
- Simplified the implementation significantly.
- Fixed the else branch, that was not doing what we wanted.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2: Move to compiler/nir (Iago)
v3: Use nir_imm_int() to load the constants (Sam)
v4 (Sam):
- Undo line-wrap (Jason).
- Fix comment (Jason).
- Improve generated code for get_signed_inf() function (Connor).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2:
- Group num_components and bit_size together (Jason)
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes a Coverity defect by adding checks to see if a value is negative
before using it to index an array. By checking the value first it makes
the code a bit safer but overall should not have a big impact.
CID: 1355598
Signed-off-by: Jakob Sinclair <sinclair.jakob@openmailbox.org>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
(Re-pushing previous fix for clang SVN r265359, which was reverted in
the meantime)
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Return statements in conditional blocks were not having their
output varyings lowered correctly.
This patch fixes the following piglit tests:
/spec/glsl-1.10/execution/vs-float-main-return
/spec/glsl-1.10/execution/vs-vec2-main-return
/spec/glsl-1.10/execution/vs-vec3-main-return
Signed-off-by: Lars Hamre <chemecse@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Previously, these were functions which took a callback. This meant that
the per-block code had to be in a separate function, and all the data
that you wanted to pass in had to be a single void *. They walked the
control flow tree recursively, doing a depth-first search, and called
the callback in a preorder, matching the order of the original source
code. But since each node in the control flow tree has a pointer to its
parent, we can implement a "get-next" and "get-previous" method that
does the same thing that the recursive function did with no state at
all. This lets us rewrite nir_foreach_block() as a simple for loop,
which lets us greatly simplify its users in some cases. This does
require us to rewrite every user, although the transformation from the
old nir_foreach_block() to the new nir_foreach_block() is mostly
trivial.
One subtlety, though, is that the new nir_foreach_block() won't handle
the case where the current block is deleted, which the old one could.
There's a new nir_foreach_block_safe() which implements the standard
trick for solving this. Most users don't modify control flow, though, so
they won't need it. Right now, only opt_select_peephole needs it.
The old functions are reimplemented in terms of the new macros, although
they'll go away after everything is converted.
v2: keep an implementation of the old functions around
v3 (Jason Ekstrand): A small cosmetic change and a bugfix in the loop
handling of nir_cf_node_cf_tree_last().
v4 (Jason Ekstrand): Use the _safe macro in foreach_block_reverse_call
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes the OpenGLES 3.1 CTS:
* ESEXT-CTS.draw_elements_base_vertex_tests.invalid_mapped_bos
Because this is triggering the error message after the normal API
validation phase, we don't have the API function name available, and
therefore we generate an error message without the draw call name:
Mesa: User error: GL_INVALID_OPERATION in draw call (vertex buffers are mapped)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95142
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fixes some failures in dEQP-VK.api.info.image_format_properties.* and
enables the test group to execute without assert failing.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94896
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Sampling from an ETC2 texture is supported on Bay Trail and
from Gen8 onwards. While ASTC_LDR is supported on Gen9, the
logic to handle such formats has not yet been implemented in
the driver.
Fixes dEQP-VK.api.info.format_properties.compressed_formats.
v2: Enable ETC2 for Bay Trail (Kenneth Graunke)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94896
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This commit adds a validator that ensures that all expressions passed
through nir_algebraic are 100% non-ambiguous as far as bit-sizes are
concerned. This way it's a compile-time error rather than a hard-to-trace
C exception some time later.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This is consistent with the rename done for the rest of NIR. Currently,
"bool" is the only type specifier used in nir_opt_algebraic.py so this is
really a no-op.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Previously, if an exception was encountered anywhere, nir_algebraic would
just die in a fire with no indication whatsoever as to where the actual bug
is. This commit makes it print out the particular search-and-replace
expression that is causing problems along with the exception. Also, it
will now report all of the errors it finds and then exit at the end like a
standard C compiler would do.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The test was failing to build with "undefined reference to `roundf'" errors,
so Make check on mesa was failing.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes ES31-CTS.gtf.GL31Tests.texture_stencil8.texture_stencil8_multisample.
Current logic divides given layer of one by number of samples (four)
trashing the layer to zero. Layer adjustment is only to be used with
non-interleaved msaa surfaces where samples for particular layer are
in multiple slices.
I copy-pasted a bit of documentation from
brw_blorp.c::brw_blorp_compute_tile_offsets().
Also took the opportunity to fix the comment regarding sampling
as 2D, cube textures are the only exception.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
A semantic error was introduced in a past refactoring that caused the bind
parameter to be passed into the use_reusable_pool parameter of buffer_create.
Since this clearly makes no sense, and there is no clear reason why the
cache _shouldn't_ be used, just use the cache always.
Cc: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
A piglit test (arb_texture_multisample-stencil-clear) has been sent.
This problem was discovered analyzing
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93767
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Mostly for consistency with the other decompress functions, but note that
in the non-DCC decompress case, the function can now early-out in slightly
more (albeit probably rare) cases.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This happens to "fix" a rendering bug in KotOR2, because it avoids a still
not quite understood bug with MSAA fast stencil clear decompress. For the
stencil clear bug, I have sent a piglit test (arb_texture_multisample-stencil-clear).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93767
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
There are some undefined behavior subtleties, so having a function to match
the u_bit_scan_consecutive_range makes sense.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Turns out that this is needed after all to satisfy some strengthened
coherency tests. Depends on support in LLVM, added in r267729.
v2: updated to reflect changes to the LLVM intrinsic
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
BackendPixelRate should be easier to read/maintain now hopefully.
Small perf bump by moving some of the pfn's to inline functions
without template params.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>