The adjusted polynomial coefficients come from the numerical
minimization of the L2 norm of the relative error. The old
coefficients would give a maximum relative error of about 15000 ULP in
the neighborhood around acos(x) = 0; the new ones give a relative
error below 2000 ULP in the same neighborhood.
SKL has a workaround which requires either some unusual programming of
buffer 3 or never using buffer 0 at all. Since we don't actually use multiple
constant buffers, it's easier to just not use 0.
Only SKL requires this workaround, but there is no harm in applying it to all
platforms. The big change here is that buffer #0 is normally relative to dynamic
state base (depending upon ISTPM), whereas buffers 1-3 are GPU virtual addresses.
Remove all the fine-grained cleanup in
anv_device_init_meta_clear_state(). Instead, if anything fails during
initialization, simply call anv_device_finish_meta_clear_state() and let
it clean up the partially initialized anv_meta_state::clear.
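The cleanup pattern described above can be sketched as follows. The struct and function names are hypothetical stand-ins (not the real anv types), and `malloc` stands in for pipeline creation; the point is only that a finish function which tolerates partially initialized state lets every failure path share one exit.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for anv_meta_state::clear; fields are illustrative. */
struct clear_state {
   void *color_pipeline;
   void *depth_pipeline;
};

/* Safe on partially initialized state: free(NULL) is a no-op. */
static void
clear_state_finish(struct clear_state *s)
{
   free(s->color_pipeline);
   free(s->depth_pipeline);
   s->color_pipeline = NULL;
   s->depth_pipeline = NULL;
}

/* `fail_second` is a test hook that simulates a failure midway through. */
static bool
clear_state_init(struct clear_state *s, bool fail_second)
{
   /* Zero first so finish() sees NULL for anything not yet created. */
   s->color_pipeline = NULL;
   s->depth_pipeline = NULL;

   s->color_pipeline = malloc(16);
   if (!s->color_pipeline)
      goto fail;

   s->depth_pipeline = fail_second ? NULL : malloc(16);
   if (!s->depth_pipeline)
      goto fail;

   return true;

fail:
   clear_state_finish(s);
   return false;
}
```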
This handles multisample color images that have a floating-point or
normalized format and have a single array layer.
This does not yet handle integer formats or multisample array images.
As far as I can tell, this patch sets all pipeline multisample state
except:
- alpha to coverage
- alpha to one
- the dispatch count for per-sample dispatch
The dEQP "precision" test tries to verify that the reference functions

    float sinh(float a) { return ((exp(a) - exp(-a)) / 2); }
    float cosh(float a) { return ((exp(a) + exp(-a)) / 2); }
    float tanh(float a) { return (sinh(a) / cosh(a)); }

produce the same values as the built-ins. We simplified away the
multiplication by 0.5 in the numerator and denominator, and apparently
this causes them not to match for exactly 1 out of 13,632 values.
So, put it back in, fixing the test, but making our code generation
(and precision?) worse.
The extent previously was supposed to match the mip at a given level
under the assumption that the base address would be that of the mip
as well.
Now however, the base address only matches the offset of the
containing tile. Therefore, enlarge the extent to match that of
phys_slice0, so that we don't draw/fetch in out of bounds territory.
This solution isn't perfect because the base address isn't always at
the first tile, therefore the assumed valid memory region by the HW
contains some number of invalid tiles on two edges.
Use a custom VkBufferImageCopy with the user-provided struct as
the base. A few fields are modified when the iview is uncompressed
and the underlying image is compressed.
When creating an uncompressed ImageView on a compressed Image, the
SurfaceFormat is updated to match the ImageView's. The surface
dimensions must also change so that the HW sees the same size image
instead of a 4x larger one.
Fixes the following error which results from running many VulkanCTS
compressed tests in one shot:
ResourceError (vk.queueSubmit(queue, 1, &submitInfo, *m_fence):
VK_ERROR_OUT_OF_DEVICE_MEMORY at
vktPipelineImageSamplingInstance.cpp:921)
Makes all compressed format tests with a height > 1 pass.
Aligns with the formulas presented in the Vulkan spec concerning CopyBufferToImage.
18.4 Copying Data Between Buffers and Images
This won't conflict with valid API usage, because:
1) Users are not allowed to create an uncompressed ImageView with a
compressed Image.
see: VkSpec - 11.5 Image Views - VkImageViewCreateInfo's Valid Usage box
2) If users create a differently formatted compressed ImageView with a
compressed Image, the block dimensions will still match.
see: VkSpec - 28.3.1.5 Format Compatibility Classes - Table 28.5
For an uncompressed ImageView of a compressed Image, the
dimensions and offsets are all divided by the appropriate
block dimensions.
We are not yet using an uncompressed ImageView for a
compressed Image, but will do so in a future commit.
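The division by block dimensions can be sketched as follows. The type and function names are hypothetical, and rounding up at the edges is an assumption of this sketch (so that a partial block still counts as one element); the commit itself only says the values are divided.

```c
#include <assert.h>
#include <stdint.h>

struct extent2d { uint32_t width, height; };

/* Round-up division: a partial block at the edge still counts as one. */
static uint32_t
div_round_up(uint32_t n, uint32_t d)
{
   assert(d > 0);
   return (n + d - 1) / d;
}

/* Convert an extent in pixels to an extent in compression blocks. */
static struct extent2d
extent_in_blocks(struct extent2d px, uint32_t block_w, uint32_t block_h)
{
   struct extent2d el = {
      div_round_up(px.width, block_w),
      div_round_up(px.height, block_h),
   };
   return el;
}
```

For a format with 4x4 blocks, a 10x6 pixel region becomes 3x2 elements under this rounding.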
Enable ETC support for BDW+. In Vulkan, an array lookup on
surface_format[] is used to determine HW support for certain
formats. In contrast, Mesa dynamically populates an array
which reports this information.
3D surfaces in Skylake are stored with ISL_DIM_LAYOUT_GEN4_2D. Any
delta in the logical z offset causes an equivalent delta in the
surface's array layer.
Test isl_surf_get_image_intratile_offset_el() in the tests:
test_bdw_2d_r8g8b8a8_unorm_512x512_array01_samples01_noaux_tiley0
test_bdw_2d_r8g8b8a8_unorm_1024x1024_array06_samples01_noaux_tiley0
When calculating row pitch, the row's width in samples must be divided
by the format's block width. The commit below accidentally removed the
division.
commit eea2d4d059
Author: Chad Versace <chad.versace@intel.com>
Date: Tue Jan 5 14:28:28 2016 -0800
Subject: isl: Don't align phys_slice0_sa.width twice
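A minimal sketch of the calculation, with a hypothetical function name and parameters. It assumes the width in samples is already aligned to the block width and ignores any pitch alignment or tiling requirements that the real code applies.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bytes per row for a surface whose width is given
 * in samples and whose format packs block_w samples into one block. */
static uint32_t
row_pitch_bytes(uint32_t width_sa, uint32_t block_w, uint32_t block_bytes)
{
   assert(block_w > 0 && width_sa % block_w == 0);
   /* Dropping this division over-counts the pitch by a factor of block_w. */
   return (width_sa / block_w) * block_bytes;
}
```

For uncompressed formats the block width is 1 and the division is a no-op, which is why only compressed formats are affected when it goes missing.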
The if/then/else block was bogus, as it can only take a scalar
condition, and we need to select component-wise. The GLSL IR
implementation of atan2 handles this by looping over components,
but I decided to try and do it vector-wise, and messed up.
For now, just bcsel. It means that we do the atan1 math even if
all components hit the quick case, but it works, and presumably
at least one component will hit the expensive path anyway.
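The difference between a scalar branch and a component-wise select can be illustrated in plain C. The `vec4`, `bvec4`, and `bcsel4` names are made up for this sketch and are not NIR builder calls; both candidate vectors are computed up front and each lane picks its own result.

```c
#include <stdbool.h>

struct vec4 { float v[4]; };
struct bvec4 { bool v[4]; };

/* Per-component select: lane i takes a.v[i] if cond.v[i], else b.v[i]. */
static struct vec4
bcsel4(struct bvec4 cond, struct vec4 a, struct vec4 b)
{
   struct vec4 r;
   for (int i = 0; i < 4; i++)
      r.v[i] = cond.v[i] ? a.v[i] : b.v[i];
   return r;
}
```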
We were botching this for negative numbers: floor of a negative number
rounds away from zero. Additionally, both results are supposed to retain the
sign of the original.
To fix this, just take the abs of both values, then put the sign back.
There's probably a better way to do this, but this works for now.