Changes since v3:
- correctly guard the x86 backend behind ARCH_X86_64
- remove an unnecessary restriction on the packed shuffle solver that prevented it from being used for e.g. ya8 -> gray downconversion
- fix typo (max_elp -> max_ulp)
- add new checkasm test to checkasm.mak