Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
We already do those checks in filter_tiling. There's no good reason to
repeat them in choose_msaa_layout. If anything they should have been
asserts and not "return false" checks. Also, this check was causing us to
outright reject multisampled HiZ surfaces which wasn't intended.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
The HiZ and CCS tiling formats are always used for HiZ and CCS surfaces
respectively. There's no reason why we should go through filter_tiling and
it's much easier to always get HiZ and CCS right if we just handle them
directly.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
HiZ buffers can be multisampled and, on Broadwell and earlier, simply using
interleaved multisampling with a compression block size of 8x4 samples
yields the correct HiZ surface size calculations. Unfortunately,
choose_msaa_layout was rejecting multisampled HiZ buffers because of format
checks. Now that we have a simple helper for determining if a format
supports multisampling, that's an easy enough issue to fix.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Compressed 1-D textures are not well-defined thing in either GL or Vulkan.
However, auxiliary surfaces are treated as compressed textures in ISL and
we can do HiZ and CCS with 1-D so we need to be able to create them. In
order to prevent actually using them (the docs say no), we assert in the
state setup code.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
The assertion that a format is uncompressed in the multisample layouts
isn't quite right. What we really want to assert is that the format
supports multisampling which is a bit more complicated query. We also want
to assert that it has a block size of 1x1 since we do nothing with the
block size in the phys_level0_sa assignment.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Previous fundamental change in stats gathering added a temporary
SwrWaitForIdle to begin_query and end_query. Code has been reworked to
remove stall.
Reviewed-by: George Kyriazis <george.kyriazis@intel.com>
Create worker pool now computes number of worker threads based on
things like topologies, etc. and creates the pool but doesn't actually
launch the threads. Instead there is a separate start thread pool
function. This allows thread resources to be constructed first before
threads start.
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
- Immediately sleep threads until thread data is initialized
- Fix some compile bugs with AR enabled
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
- Move most jitter functionality into SwrJit namespace
- Avoid global "using namespace llvm" in headers
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
gcc-4 and earlier don't allow compound literals where a constant
is required in -std=c99/gnu99 mode, so we can't use ISL_SWIZZLE()
when populating the anv_formats[] array. There are a few ways around
it: First one would be -std=c89/gnu89, but the rest of the code
depends on c99 so it's not really an option. The second option
would be to upgrade to gcc-5+ where the compiler behaviour was relaxed
a bit [1]. And the third option is just to avoid using compound
literals. I chose the last option since it keeps gcc-4 and earlier
working.
[1] https://gcc.gnu.org/gcc-5/porting_to.html
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Topi Pohjolainen <topi.pohjolainen@intel.com>
Fixes: 7ddb21708c ("intel/isl: Add an isl_swizzle structure and use it for isl_view swizzles")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This fixes a crash in egl-create-msaa-pbuffer-surface Piglit test
and same crash in many dEQP EGL tests.
I also found that some Qt example did a workaround because of this
crash: https://bugreports.qt.io/browse/QTBUG-47509
v2: Ian pointed out that v1 removed support for all multisample
configs, including window ones. This one removes pbuffer bit
when adding configs, now only pbuffer+msaa gets rejected and
window+msaa continues to work. Fixed also comment (Emil)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
While the current CFG code is valid in the case where a switch break also
happens to be a loop continue, it's a bit suboptimal. Since hardware is
capable of handling the continue as a direct jump, it's better to use a
continue instruction when we can than to bother with all of the nasty
switch break lowering.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
It is possible that the break block of a switch is actually the continue of
the loop containing the switch. In this case, we need to identify the
break block as a continue and break out of current level of CFG handling.
If we don't, the continue portion of the loop will get handled twice, once
by following after the break and a second time by the loop handling code
handling it explicitly.
This fixes 6 of the new Vulkan CTS tests:
- dEQP-VK.spirv_assembly.instruction.graphics.opphi.out_of_order*
- dEQP-VK.spirv_assembly.instruction.graphics.selection_block_order.out_of_order*
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
These structs will be written to disk as part of the shader cache
so use uint32_t just to be safe.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Almost all of the other drawing validation code is in api_validate.c
so put this function there as well.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Detect all of the CPUs in the system. Expose metrics
for min, max and current frequency in Hz.
Signed-off-by: Steven Toth <stoth@kernellabs.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Regression introduced by "gallium/radeon: zero all query buffers".
Cc: Michel Dänzer <michel@daenzer.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is only needed for r600 which doesn't have ARB_query_buffer_object and
therefore wouldn't really need the fences, but let's be optimistic about
filling in this feature gap eventually.
Cc: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We don't plan to use sub-allocated buffers with UVD, but just in case one
slips through, this increases the chances of things working out anyway.
Reviewed-by: Christian König <christian.koenig@amd.com>
We don't plan to use sub-allocated buffers with VCE, but just in case one
slips through, this increases the chances of things working out anyway.
Reviewed-by: Christian König <christian.koenig@amd.com>
Implement support for power based sensors, reporting units in
milli-watts and watts.
Also, minor cleanup - change the related if block to a switch.
Tested with two different power sensors, including the nouveau
'power1' sensors on a GTX950 card.
Signed-off-by: Steven Toth <stoth@kernellabs.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Commutativity is not allowed with SHLADD, but src2 can accept
loads. To allow the load propagation pass to do its job, add a
special case like for SUCLAMP because src1 is always an immediate.
This IMAD to SHLADD optimization helps a bunch of shaders from Tomb
Raider, Victor Vran, UE4 demos (+15% perf with Elemental) and Shadow
Warrior.
GF100/GK104:
total instructions in shared programs :2838045 -> 2834712 (-0.12%)
total gprs used in shared programs :396684 -> 396386 (-0.08%)
total local used in shared programs :34416 -> 34416 (0.00%)
local gpr inst bytes
helped 0 326 1105 1105
hurt 0 55 3 3
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Only and only if src1 is a power of 2 we can replace IMAD by SHLADD.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Unfortunately, we can't use the emit helpers for GF100/GK110
because src1 and src2 are swapped.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This instruction is available since SM20 (Fermi) and allow to do
(a << b) + c in one shot. In some situations, IMAD should be
replaced by SHLADD when b is a power of 2, and ADD+SHL should be
replaced by SHLADD as well.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
envyas now uses a much better representation for those control
codes and it displays the different flags instead of an
unreadable hex number.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Most of the time, even the 512 bytes that we now get is more than sufficient
(pipeline stats queries are the largest at 184 bytes per shot).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>