Commit Graph

117984 Commits

Author SHA1 Message Date
Ian Romanick ca353285cb nir/range_analysis: Make sure the table validation only occurs once
All of the tables are static const, so they only need to be validated
once.  As noted in the previous commit, the compiler should be able to
eliminate all of this code when the assertions would pass.  Even with
the help of the previous commit, this does not always occur.

-Og: -95.688 +/- 3.91935 (-24.9562% +/- 1.0222%) N=5
-O1: No difference proven at 95.0% confidence. N=5
-O2: -1.962 +/- 0.85001 (-0.860013% +/- 0.372589%) N=5

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-11-22 08:16:06 -08:00
Ian Romanick ccefce46cb nir/range-analysis: Add pragmas to help loop unrolling
I was pretty liberal with these assertions when I wrote this code
because I had assumed that GCC would unroll the loops, inline the look ups
of static const arrays with now constant indices, and then elmininate
all the actuall assertions.  It seems none of this happens even at -O3.

Adding the pragmas helps encourage loop unrolling at some optimization
levels.  I tested by running shader-db with NIR_VALIDATE=false on a Core
i7 Haswell desktop system.

-Og: No difference proven at 95.0% confidence. N=5
-O1: -48.304 +/- 1.221 (-16.3343% +/- 0.412888%) N=5
-O2: -49.94 +/- 1.23521 (-17.9634% +/- 0.444303%) N=5

v2: Add a _Pragma to an inner loop that was accidentally dropped during
a rebase.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-11-22 08:16:06 -08:00
Danylo Piliaiev 25a00b449f glsl: Add varyings to "zero-init of uninitialized vars" workaround
Varyings are similar to already handled cases. And "glsl_zero_init"
name of the workaround already looks like it should include varyings.

The issue was observed in GiMark subtest from GpuTest.

Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-11-22 15:25:56 +00:00
Alyssa Rosenzweig 4c43b354c3 pan/midgard: Use lower_tex_without_implicit_lod
Just a bit of cleanup. lower_tex can do this lowering for us, which
should also eliminate some special cases (one less thing to fix if we
ever need texturing in tess/geom/etc, perhaps?)

Closes #2133

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
2019-11-22 08:38:57 -05:00
Christian Gmeiner 47c7c4263c etnaviv: use a more self-explanatory param name
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-11-22 10:47:13 +00:00
Christian Gmeiner a949fa9d5d etnaviv: drop not used config_out function param
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-11-22 10:47:13 +00:00
Samuel Pitoiset 6f7ec6ee39 gitlab-ci: reduce the number of scons build
It seems overkill to me to build scons 7x for every pipeline.
Scons is now build with the oldest llvm version in scons-old-llvm
and with the newest llvm version in scons.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-11-22 10:39:21 +01:00
Alyssa Rosenzweig 2e14fe6490 panfrost: Add lcra.c to Android.mk
This was forgotten.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
2019-11-22 05:07:19 +00:00
Alyssa Rosenzweig bda2bb31b1 pan/midgard: Enable LOD lowering only on buggy chips
T720 and earlier need this workaround, so check the quirk before
lowering.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
2019-11-22 05:07:19 +00:00
Alyssa Rosenzweig 68c2c7962a pan/midgard: Describe quirk MIDGARD_BROKEN_LOD
Corresponds to errata #10471, applies to T6xx and T720. Fixed in T760.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
2019-11-22 05:07:19 +00:00
Alyssa Rosenzweig d32d4acf68 pan/midgard: Add LOD bias/clamp lowering
We fetch the info with the new intrinsic and lower with ALU ops for txl
instructions, which seemingly correspond to "TEXGRD" instructions (what
we call textureLod).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
2019-11-22 05:07:19 +00:00
Alyssa Rosenzweig 4e07e7b232 pan/midgard: Implement load_sampler_lod_paramaters_pan
We can stuff this information in as parametrized system values, like we
currently do texture size and SSBO addresses.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
2019-11-22 05:07:19 +00:00
Alyssa Rosenzweig deaebc82a7 nir: Add load_sampler_lod_paramaters_pan intrinsic
This loads in the <min_lod, max_lod, lod_bias> settings for a given
sampler, which is necessary for lowering clamps/biases on certain
Midgard chips.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
2019-11-22 05:07:19 +00:00
Markus Wick b1156ecdf2 mapi/glapi: Generate sizeof() helpers instead of fixed sizes.
Generating a source code with a fixed size leads to issues with plattform dependent types.
We either hard code 4 or 8 bytes there, and both are wrong on the other plattform.
So this patch solves this issue by generating eg sizeof(GLsizeiptr), which is valid both
on 32 and on 64 bit plattforms.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-11-21 22:52:55 -05:00
Ian Romanick e51eda99df intel/fs: Disable conditional discard optimization on Gen4 and Gen5
The CMP instruction on Gen4 and Gen5 generates one bit (the LSB) of
valid data and 31 bits of junk.  Results of comparisons that are used as
Boolean values need to have a fixup applied to generate the proper 0/~0
values.

Calling fs_visitor::nir_emit_alu with need_dest=false prevents the fixup
code from being generated.  This results in a sequence like:

        cmp.l.f0.0(16)  g8<1>F          g14<8,8,1>F     0x0F  /* 0F */
        ...
        cmp.l.f0.0(16)  g4<1>F          g6<8,8,1>F      0x0F  /* 0F */
(+f0.1) or.z.f0.1(16) null<1>UD g4<8,8,1>UD     g8<8,8,1>UD

instead of

        cmp.l.f0.0(16)  g8<1>F          g14<8,8,1>F     0x0F  /* 0F */
        ...
        cmp.l.f0.0(16)  g4<1>F          g6<8,8,1>F      0x0F  /* 0F */
        or(16) g4<1>UD g4<8,8,1>UD     g8<8,8,1>UD
(+f0.1) and.z.f0.1(16) null<1>UD g4<8,8,1>UD     1UD

I examined a couple of the shaders hurt by this change, and ALL of them
would have been affected by this bug. :(

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1836
Fixes: 0ba9497e66 ("intel/fs: Improve discard_if code generation")

Iron Lake
total instructions in shared programs: 8122757 -> 8122957 (<.01%)
instructions in affected programs: 8307 -> 8507 (2.41%)
helped: 0
HURT: 100
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.84% max: 6.67% x̄: 2.81% x̃: 2.76%
95% mean confidence interval for instructions value: 2.00 2.00
95% mean confidence interval for instructions %-change: 2.58% 3.03%
Instructions are HURT.

total cycles in shared programs: 188510100 -> 188510376 (<.01%)
cycles in affected programs: 76018 -> 76294 (0.36%)
helped: 0
HURT: 55
HURT stats (abs)   min: 2 max: 12 x̄: 5.02 x̃: 4
HURT stats (rel)   min: 0.07% max: 3.75% x̄: 0.86% x̃: 0.56%
95% mean confidence interval for cycles value: 4.33 5.71
95% mean confidence interval for cycles %-change: 0.60% 1.12%
Cycles are HURT.

GM45
total instructions in shared programs: 4994403 -> 4994503 (<.01%)
instructions in affected programs: 4212 -> 4312 (2.37%)
helped: 0
HURT: 50
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.84% max: 6.25% x̄: 2.76% x̃: 2.72%
95% mean confidence interval for instructions value: 2.00 2.00
95% mean confidence interval for instructions %-change: 2.45% 3.07%
Instructions are HURT.

total cycles in shared programs: 128928750 -> 128928982 (<.01%)
cycles in affected programs: 67442 -> 67674 (0.34%)
helped: 0
HURT: 47
HURT stats (abs)   min: 2 max: 12 x̄: 4.94 x̃: 4
HURT stats (rel)   min: 0.09% max: 3.75% x̄: 0.75% x̃: 0.53%
95% mean confidence interval for cycles value: 4.19 5.68
95% mean confidence interval for cycles %-change: 0.50% 1.00%
Cycles are HURT.
2019-11-21 16:40:50 -08:00
Dylan Baker bba44ef176 docs: update calendar, add news item and link release notes for 19.2.6 2019-11-21 16:34:00 -08:00
Dylan Baker 3531d74e82 docs: Add SHA256 sum for 19.2.6 2019-11-21 16:32:35 -08:00
Dylan Baker f8070577a4 docs: Add release notes for 19.2.6 2019-11-21 16:32:34 -08:00
Marek Olšák 0b1452ffdd nir/serialize: do ctx = {0} instead of manual initializations
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-21 18:49:57 -05:00
Marek Olšák ff71fae440 nir: strip as we serialize to remove the nir_shader_clone call
Serializing stripped NIR is faster now.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-21 18:49:57 -05:00
Christian Gmeiner 8acaab1aa7 etnaviv: add drm-shim
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-11-21 22:56:04 +00:00
Eric Engestrom 609a6ae23e vk_util: drop duplicate formats in vk_format_map[]
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-11-21 22:52:40 +00:00
Jonathan Marek 773d640efa turnip: implement UBWC
This enables UBWC for everything except 3D textures.

It breaks many image_to_image copies but those aren't important and it can
be worked around later (image_to_image copy needs to be done in two steps,
decode from the source format and then encode to the destination format).

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-11-21 22:21:57 +00:00
Jonathan Marek 91fd83d142 freedreno/regs: update UBWC related bits
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-11-21 22:21:57 +00:00
Vinson Lee 6613a4a029 swr: Fix build with llvm-10.0.
Fix build error after llvm-10.0 commit 1dfede3122ee ("Move
CodeGenFileType enum to Support/CodeGen.h").

../src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp: In member function ‘void JitManager::DumpAsm(llvm::Function*, const char*)’:
../src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp:428:45: error: ‘CGFT_AssemblyFile’ is not a member of ‘llvm::TargetMachine’
             *pMPasses, filestream, nullptr, TargetMachine::CGFT_AssemblyFile);
                                             ^

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Jan Zielinski <jan.zielinski@intel.com>
2019-11-21 13:20:08 -08:00
Rhys Perry 29d131d619 aco: fix copy+paste error
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-21 20:28:57 +00:00
Rhys Perry d1b9deeea8 aco: improve waitcnt insertion around loops
Do this by repeating processing of loops until no progress is made.

Totals from affected shaders:
SGPRS: 162576 -> 162576 (0.00 %)
VGPRS: 145228 -> 145228 (0.00 %)
Spilled SGPRs: 668 -> 668 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 15778640 -> 15771336 (-0.05 %) bytes
LDS: 146 -> 146 (0.00 %) blocks
Max Waves: 6087 -> 6087 (0.00 %)

v2: use block_kind_loop_header/block_kind_loop_exit to repeat at the end
    of loops instead of at each continue

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-21 20:28:57 +00:00
Rob Clark 1a8c49d76c freedreno/perfctrs/fdperf: periodically restore counters
When GPU is idle and suspends, the currently selected countables
will all reset to the first one.  So periodically restore the selected
countables.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-11-21 20:01:03 +00:00
Rob Clark 5a13507164 freedreno/perfcntrs: add fdperf
Port from the envytools tree, but converted to use the .c tables for
describing the perfcounter groups/countables, rather than using rnndec
to get this at runtime from the register xml.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-11-21 20:01:03 +00:00
Rob Clark b2338a5b00 freedreno/perfcntrs/a6xx: remove RBBM counters
Currently this are getting blocked by the kernel.. these counters don't
seem to be the most useful ones, and to use them we'd have to somehow
probe the kernel by submitting cmdstream to write the selector regs and
see if that triggers a GPU fault.  So let's just skip them.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-11-21 20:01:03 +00:00
Rob Clark 6a517b3079 freedreno/perfctrs/a2xx: move CP to be first group
fdperf expects this, to find the ALWAYS_COUNT counter

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-11-21 20:01:03 +00:00
Rob Clark e35c4e6ad2 freedreno/perfcntrs: add accessor to get per-gen tables
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-11-21 20:01:03 +00:00
Rob Clark b21f03ae7e freedreno/perfcntrs: move to shared location
This should eventually be useful for VK_KHR_performance_query as well.
And in the more near term, for fdperf.

Attempt to not break android build is best-effort and untested.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-11-21 20:01:03 +00:00
Rob Clark 6727114cba freedreno/perfcntrs: remove gallium dependencies
Prep work to move to a shared location.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-11-21 20:01:02 +00:00
Rob Clark 3fb6aaf42e freedreno/perfcntrs: small cleanup
When we had one gen supporting performance counters, it made sense to
have these builder macros in the .c file with the table.  But time has
come to de-duplicate.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-11-21 20:01:02 +00:00
Dave Airlie cce07ea835 nir: fix deref offset builder
Use the correct bit size

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-11-22 04:37:41 +10:00
Dave Airlie 7325f6ac98 vtn/opencl: add clz support
This is needed for OpenCL

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-11-22 04:37:41 +10:00
Dave Airlie e3b21dfcb1 nouveau: request ufind_msb64 lowering in the frontend.
This passes the piglit CL builtin-ulong-clz-1.0.generated.cl
test.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-11-22 04:37:41 +10:00
Dave Airlie d0d96053e6 nir: add 64-bit ufind_msb lowering support. (v2)
This adds the option to lower 64-bit ufind_msb opcodes.

v2: use split_x/y removes component loops (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-11-22 04:37:37 +10:00
Dave Airlie 12913bcf86 spirv/nir/opencl: handle some multiply instructions.
This adds support for some missing 24-bit and hi multiply
variants.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-11-22 04:37:25 +10:00
Dave Airlie 5375c30234 spirv: get the correct type for function returns.
This needs to be derived from the address format, not always 1/32.

Suggested by Jason

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-11-22 04:37:25 +10:00
Dave Airlie b62a925ad1 spirv: don't store 0 to cs.ptr_size for non kernel stages.
cs is a union so storing this there is wrong.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-11-22 04:37:25 +10:00
Jonathan Marek 1496e1164f util: add missing R8G8B8A8_SRGB format to vk_format_map
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-11-21 17:46:27 +00:00
Elie Tournier 72b44d148d docs: fix ascii html representation
v2 (Eric): Use more readable ascii version

Signed-off-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-11-21 16:51:18 +00:00
Elie Tournier 64d7bd96b8 Docs: remove duplicate meson docs for windows
This block is duplicated, we already have the windows instruction above.

Signed-off-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-11-21 16:51:18 +00:00
Eric Anholt dd76a6f198 ci: Move freedreno's parallelism to the runner instead of gitlab-ci jobs.
I set the runners to concurrency=1, so they serve only one gitlab-ci job
at at time.  Swap over to using the parallel runner now to keep the
runners busy, more efficiently than spawning many docker containers and
downloading artifacts multiple times, and producing easier-to-understand
results for browsing on the web.

This bumps the a306 runners to 4x parallel instead of 2x like before, but
cheza gles3 drops from 6 to 4.  Current rough timings of the jobs (if no
container download):

db410c-gles2: 5:00
a630-gles2: 1:30
a630-gles3: 6:00
a630-gles31: 5:30

a630-gles3 is a bit longer than I like, but it should come back down once
I can sort out the NIR algebraic rewinding.
2019-11-21 05:48:17 -08:00
Iago Toral Quiroga c573b50179 glsl: add missing initialization of the location path field
This was apparently missed in 67b32190f3, which added support
for ARB_shading_language_include to #line, including the 'path'
field for the location.

Fixes crashes in CTS with all drivers as they attempt to access
an uninitialized path string during parsing.

Fixes: 67b32190f3 ("glsl: add ARB_shading_language_include support to #line")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2132
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jose Maria Casanova <jmcasanova@igalia.com>
2019-11-21 12:55:15 +01:00
Rhys Perry 1a0500cd04 docs: update features.txt for RADV
[skip ci]

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-11-21 11:00:50 +00:00
Michel Dänzer 32618ee719 gitlab-ci: Directly use host-mapped directory for ccache
Use hardcoded /cache/mesa/ccache for the cache, so it will be shared by
all jobs of all Mesa projects running on the same runner host. This
should increase the hit rate and decrease the worst case storage used.

Further benefits of directly using a host-mapped directory:

* Saves up to ~1 minute per job for restoring and saving the cache
  contents via the GitLab CI cache mechanism
* Cache contents generated by failed jobs are no longer lost
* Jobs running in parallel on the same runner host can get hits from
  each other

Also enable compression, so the default maximum cache size of 5G might
be sufficient.

v2:
* Move CCACHE_DIR variable to the .build-linux template

Suggested-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net> # v1
2019-11-21 10:13:43 +01:00
Samuel Pitoiset 0d1085ac4a gitlab-ci: remove now useless meson-swr-glvnd build job
All things are already part of meson-main.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
2019-11-21 09:35:05 +01:00